
Ceph: recovering an OSD that is down

During a routine inspection today, I found that one OSD in the Ceph cluster was down.

Viewed from the dashboard, the OSDs panel reports one OSD down; clicking through to the details shows which node the down OSD is on.

Checking the OSD status from the command line

①. Check the cluster status:

[root@sjyt-ceph01 ~]# ceph -s
  cluster:
    id:     240a5732-02e5-11eb-8f5a-000c2945a4b1
    health: HEALTH_WARN
            Degraded data redundancy: 3972/11916 objects degraded (33.333%), 64 pgs degraded, 65 pgs undersized
            65 pgs not deep-scrubbed in time
            65 pgs not scrubbed in time

  services:
    mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 8d)
    mgr: ceph02.zopypt(active, since 10w), standbys: ceph03.ucynxg, ceph01.suwmox
    mds: cephfs:1 {0=cephfs.ceph02.axdsbo=up:active} 4 up:standby
    osd: 3 osds: 2 up (since 5w), 2 in (since 5w)

  data:
    pools:   3 pools, 65 pgs
    objects: 3.97k objects, 1.8 GiB
    usage:   6.0 GiB used, 2.0 TiB / 2.0 TiB avail
    pgs:     3972/11916 objects degraded (33.333%)
             64 active+undersized+degraded
             1  active+undersized

  io:
    client:   596 B/s wr, 0 op/s rd, 0 op/s wr
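The same warning can be drilled into from the CLI instead of the dashboard. A minimal sketch of read-only checks (the exact output varies per cluster):

    ceph health detail      # lists each failing health check, including which OSD is down
    ceph osd tree down      # prints only the CRUSH subtree that contains down OSDs

Both commands only read cluster state and are safe to run at any time.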
           

②. Check the OSD tree:

[root@sjyt-ceph01 ~]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME             STATUS  REWEIGHT  PRI-AFF
-1         3.00000  root default
-3         1.00000      host sjyt-ceph01
 0    hdd  1.00000          osd.0           down         0  1.00000
-5         1.00000      host sjyt-ceph02
 1    hdd  1.00000          osd.1             up   1.00000  1.00000
-7         1.00000      host sjyt-ceph03
 2    hdd  1.00000          osd.2             up   1.00000  1.00000
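Before restarting anything, it is worth checking why osd.0 stopped. A hedged sketch, assuming the cephadm/podman deployment shown later in this article, run on the node that hosts osd.0:

    # read the recent log of the containerized OSD daemon (fsid taken from ceph -s above)
    cephadm logs --fsid 240a5732-02e5-11eb-8f5a-000c2945a4b1 --name osd.0 | tail -n 50
    # equivalently, read the journal of the systemd unit directly
    journalctl -u ceph-240a5732-02e5-11eb-8f5a-000c2945a4b1@osd.0.service -n 100 --no-pager

These commands only read logs and change nothing in the cluster.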
           

Resolution:

Another way to handle this is described in the reference: Ceph: recovering an OSD that is down.
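The referenced alternative is not reproduced in this article. Generally, when a restart does not bring an OSD back, the usual pattern is to remove the failed OSD from the cluster and recreate it. A rough, hedged sketch of that general pattern (not necessarily what the reference describes, and destructive: data on the OSD is given up and rebuilt from the remaining replicas):

    ceph osd out osd.0              # stop mapping new data to the OSD
    ceph osd crush remove osd.0     # remove it from the CRUSH map
    ceph auth del osd.0             # delete its cephx key
    ceph osd rm osd.0               # remove it from the OSD map
    # afterwards the disk can be wiped and the OSD redeployed, e.g. through the cephadm orchestrator

In this case a simple service restart turned out to be enough, as shown below.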

①. Restart the OSD service on the failed node

[root@sjyt-ceph01 ~]# systemctl status ceph-240a5732-02e5-11eb-8f5a-000c2945a4b1@osd.0.service
● ceph-240a5732-02e5-11eb-8f5a-000c2945a4b1@osd.0.service - Ceph osd.0 for 240a5732-02e5-11eb-8f5a-000c2945a4b1
   Loaded: loaded (/etc/systemd/system/ceph-240a5732-02e5-11eb-8f5a-000c2945a4b1@.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Mon 2021-02-01 19:24:37 CST; 1 months 5 days ago
  Process: 320045 ExecStopPost=/bin/bash /var/lib/ceph/240a5732-02e5-11eb-8f5a-000c2945a4b1/osd.0/unit.poststop (code=exited, status=0/SUCCESS)
  Process: 320033 ExecStop=/bin/podman stop ceph-240a5732-02e5-11eb-8f5a-000c2945a4b1-osd.0 (code=exited, status=125)
  Process: 153844 ExecStart=/bin/bash /var/lib/ceph/240a5732-02e5-11eb-8f5a-000c2945a4b1/osd.0/unit.run (code=exited, status=0/SUCCESS)
  Process: 153833 ExecStartPre=/bin/podman rm ceph-240a5732-02e5-11eb-8f5a-000c2945a4b1-osd.0 (code=exited, status=1/FAILURE)
 Main PID: 153844 (code=exited, status=0/SUCCESS)

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
[root@sjyt-ceph01 ~]# systemctl start ceph-240a5732-02e5-11eb-8f5a-000c2945a4b1@osd.0.service
[root@sjyt-ceph01 ~]# systemctl status ceph-240a5732-02e5-11eb-8f5a-000c2945a4b1@osd.0.service
● ceph-240a5732-02e5-11eb-8f5a-000c2945a4b1@osd.0.service - Ceph osd.0 for 240a5732-02e5-11eb-8f5a-000c2945a4b1
   Loaded: loaded (/etc/systemd/system/ceph-240a5732-02e5-11eb-8f5a-000c2945a4b1@.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2021-03-09 10:19:07 CST; 1s ago
  Process: 320045 ExecStopPost=/bin/bash /var/lib/ceph/240a5732-02e5-11eb-8f5a-000c2945a4b1/osd.0/unit.poststop (code=exited, status=0/SUCCESS)
  Process: 320033 ExecStop=/bin/podman stop ceph-240a5732-02e5-11eb-8f5a-000c2945a4b1-osd.0 (code=exited, status=125)
  Process: 2770303 ExecStartPre=/bin/podman rm ceph-240a5732-02e5-11eb-8f5a-000c2945a4b1-osd.0 (code=exited, status=1/FAILURE)
 Main PID: 2770314 (bash)
    Tasks: 13 (limit: 23968)
   Memory: 31.2M
   CGroup: /system.slice/system-ceph\x2d240a5732\x2d02e5\x2d11eb\x2d8f5a\x2d000c2945a4b1.slice/ceph-240a5732-02e5-11eb-8f5a-000c2945a4b1@osd.0.service
           ├─2770314 /bin/bash /var/lib/ceph/240a5732-02e5-11eb-8f5a-000c2945a4b1/osd.0/unit.run
           └─2770413 /bin/podman run --rm --net=host --ipc=host --privileged --group-add=disk --name ceph-240a5732-02e5-11eb-8f5a-000c2945a4b1-osd.0 -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e NODE_NAME=sjyt
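Because this is a cephadm-managed cluster (note the podman container in the unit), the restart can also be driven through the orchestrator from any admin node instead of systemctl on the OSD host. A minimal sketch, assuming the orchestrator backend is enabled (it is by default on cephadm deployments):

    ceph orch daemon restart osd.0      # cephadm restarts the daemon on whichever host runs it
    ceph orch ps | grep osd.0           # confirm the daemon is reported as running again

Either way, the end result is the same: the osd.0 container is started again on sjyt-ceph01.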
           

②. Check the OSD status again:

[root@sjyt-ceph01 ~]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME             STATUS  REWEIGHT  PRI-AFF
-1         3.00000  root default
-3         1.00000      host sjyt-ceph01
 0    hdd  1.00000          osd.0             up   1.00000  1.00000
-5         1.00000      host sjyt-ceph02
 1    hdd  1.00000          osd.1             up   1.00000  1.00000
-7         1.00000      host sjyt-ceph03
 2    hdd  1.00000          osd.2             up   1.00000  1.00000
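If the full tree is not needed, a one-line summary gives the same confirmation:

    ceph osd stat       # prints something like: 3 osds: 3 up (since ...), 3 in (since ...)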
           

③. Check the cluster status again:

[root@sjyt-ceph01 ~]# ceph -s
  cluster:
    id:     240a5732-02e5-11eb-8f5a-000c2945a4b1
    health: HEALTH_WARN
            Degraded data redundancy: 2654/11916 objects degraded (22.273%), 39 pgs degraded, 39 pgs undersized
            64 pgs not deep-scrubbed in time
            64 pgs not scrubbed in time

  services:
    mon: 3 daemons, quorum sjyt-ceph01,sjyt-ceph02,sjyt-ceph03 (age 8d)
    mgr: sjyt-ceph02.zopypt(active, since 10w), standbys: sjyt-ceph03.ucynxg, sjyt-ceph01.suwmox
    mds: cephfs:1 {0=cephfs.sjyt-ceph02.axdsbo=up:active} 4 up:standby
    osd: 3 osds: 3 up (since 8m), 3 in (since 8m); 39 remapped pgs

  data:
    pools:   3 pools, 65 pgs
    objects: 3.97k objects, 1.8 GiB
    usage:   9.4 GiB used, 3.0 TiB / 3.0 TiB avail
    pgs:     1.538% pgs not active
             2654/11916 objects degraded (22.273%)
             38 active+undersized+degraded+remapped+backfill_wait
             25 active+clean
             1  active+undersized+degraded+remapped+backfilling
             1  peering

  io:
    client:   1.5 KiB/s wr, 0 op/s rd, 0 op/s wr
    recovery: 2.7 MiB/s, 1 keys/s, 1 objects/s
           

Once the OSD is back up, data starts recovering (backfilling) onto it and the cluster gradually returns to a healthy state.
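Backfill progress can be followed until the cluster reaches HEALTH_OK again. A short sketch of read-only checks, plus one optional tuning knob (only an assumption that recovery speed matters more than client latency here):

    ceph -s                                 # overall health and recovery throughput
    ceph pg stat                            # one-line PG summary: degraded / backfilling counts
    # optional: allow more concurrent backfills per OSD (the Octopus default is 1);
    # revert it once recovery finishes
    ceph config set osd osd_max_backfills 2

The "pgs not scrubbed in time" warnings normally clear on their own once the OSD has been up long enough for scrubbing to catch up.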