During routine maintenance, some operations require restarting a storage cell. The steps below show how to do this without affecting running workloads.
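1) First, a concept: the DISK_REPAIR_TIME attribute. It defines how long ASM waits after a disk goes offline before dropping it. If the disk comes back online within this window, no ASM rebalance is triggered. The default is 3.6 hours, so estimate the maintenance window first and raise the value if the work will take longer.
(a) Log in to the ASM instance and check the current value of DISK_REPAIR_TIME:
    SQL> select dg.name, a.value from v$asm_diskgroup dg, v$asm_attribute a where dg.group_number=a.group_number and a.name='disk_repair_time';
(b) Change the value if a longer window is needed:
    SQL> ALTER DISKGROUP DATA SET ATTRIBUTE 'DISK_REPAIR_TIME'='8.5H';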
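Whether the current DISK_REPAIR_TIME covers the planned maintenance window is simple arithmetic, and a small helper can make the check explicit. The following is a minimal sketch, not part of any Oracle tooling: the function names `repair_time_to_minutes` and `window_fits` are hypothetical, and it assumes the value is a number followed by `h` or `m` (a bare number is treated as hours).

```shell
# Convert a DISK_REPAIR_TIME value such as "3.6h" or "90m" into whole minutes.
# Assumption: the value is a number followed by 'h' or 'm' (any case);
# a bare number is treated as hours.
repair_time_to_minutes() {
    awk -v v="$1" 'BEGIN {
        unit = tolower(substr(v, length(v), 1))
        if (unit == "h" || unit == "m") {
            num = substr(v, 1, length(v) - 1)
        } else {
            num = v
            unit = "h"
        }
        # Reject anything that is not a plain decimal number.
        if (num !~ /^[0-9]+(\.[0-9]+)?$/) exit 1
        # Adding 0.5 before truncation rounds to the nearest minute.
        printf "%d\n", (unit == "h") ? num * 60 + 0.5 : num + 0.5
    }'
}

# Succeed (exit 0) when the planned window, given in minutes, fits inside
# the repair time. Usage: window_fits <disk_repair_time> <planned_minutes>
window_fits() {
    repair_minutes=$(repair_time_to_minutes "$1") || return 2
    [ "$2" -le "$repair_minutes" ]
}
```

For example, `window_fits 3.6h 300` fails because a 300-minute window exceeds the 216 minutes allowed by the default, while `window_fits 8.5H 480` succeeds.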
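2) Run the following command to check the grid disk status. Only take a grid disk offline once its mirror copies are confirmed healthy; otherwise data may be lost. A value of 'Yes' for asmdeactivationoutcome means the grid disk can be taken offline safely.
    cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
3) If any grid disk reports a value other than 'Yes' (for example asmdeactivationoutcome='No'), wait a while and query again. Continue only once every grid disk reports 'Yes'.
Taking the disks offline while the status is abnormal will cause the ASM disk group to dismount and the database to crash.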
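The check in steps 2 and 3 can be scripted by filtering the command output. This is a rough sketch under stated assumptions: the function names `unsafe_griddisks` and `safe_to_deactivate` are made up for illustration, and the input is assumed to be whitespace-separated columns in the order name, asmmodestatus, asmdeactivationoutcome, one grid disk per line.

```shell
# Read the output of
#   cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
# on stdin and print the names of grid disks whose outcome is not "Yes".
# Assumption: columns are whitespace-separated; a missing third column
# is treated as unsafe.
unsafe_griddisks() {
    awk 'NF && $3 != "Yes" { print $1 }'
}

# Succeed (exit 0) only if at least one grid disk was listed and every
# listed grid disk reports asmdeactivationoutcome = Yes.
safe_to_deactivate() {
    awk 'NF { total++; if ($3 != "Yes") bad++ }
         END { exit ((total > 0 && bad == 0) ? 0 : 1) }'
}
```

On a cell, the command output would be piped straight into `safe_to_deactivate`, and the check repeated after a pause (for instance with `sleep 60`) until it succeeds. Empty input counts as a failure, so a cellcli error is not mistaken for a healthy state.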
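4) Run the command below to inactivate all grid disks on this cell. It may take 10 minutes or longer.
This step is critical: every grid disk must be offline before the cell is restarted. Inactivating a grid disk automatically takes the corresponding disk offline in the ASM instance.
    cellcli -e alter griddisk all inactive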
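5) Confirm that all grid disks are offline before going further:
(a) Once the grid disks are offline, the command below should show asmmodestatus=OFFLINE or asmmodestatus=UNUSED, together with asmdeactivationoutcome=Yes, for every grid disk. Only in this state is it safe to restart the cell.
    cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
(b) List the grid disks and confirm that they are all in the inactive state:
    cellcli -e list griddisk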
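The condition in step 5(a) can also be tested mechanically before the reboot. The sketch below is illustrative only: `check_ready_for_reboot` is a hypothetical name, and the input is assumed to have the same whitespace-separated three-column layout as the command in step 5(a).

```shell
# Read the output of
#   cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
# on stdin. A grid disk is ready when asmmodestatus is OFFLINE or UNUSED
# and asmdeactivationoutcome is Yes. Lines that are not ready are echoed
# with a "not ready: " prefix; the exit status is 0 only if at least one
# grid disk was listed and all of them are ready.
check_ready_for_reboot() {
    awk '
        NF {
            total++
            mode_ok = ($2 == "OFFLINE" || $2 == "UNUSED")
            if (!mode_ok || $3 != "Yes") {
                print "not ready: " $0
                bad++
            }
        }
        END { exit ((total > 0 && bad == 0) ? 0 : 1) }
    '
}
```

Printing the offending lines, rather than only returning a status, shows which grid disk is still blocking the reboot.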
6) The cell can now be shut down or rebooted with standard Linux commands, run as root.
(a) The following command will shut down Oracle Exadata Storage Server immediately:
    # shutdown -h now
(All related storage services stop automatically when the cell shuts down.)
(b) The following command will reboot Oracle Exadata Storage Server immediately and force fsck on reboot:
    # shutdown -F -r now
7) After the cell comes back up, reactivate the grid disks manually:
    cellcli -e alter griddisk all active
8) Check that the grid disks are 'active':
    cellcli -e list griddisk
9) Verify the grid disk status:
(a) Verify that all grid disks have been brought online successfully:
    cellcli -e list griddisk attributes name,asmmodestatus
(b) Each grid disk goes through a 'SYNCING' state before it becomes 'ONLINE'. Wait until asmmodestatus is ONLINE for all grid disks. The following is an example of the output while the resync is still in progress:
    DATA_CD_00_dm01cel01 ONLINE
    DATA_CD_01_dm01cel01 SYNCING
    DATA_CD_02_dm01cel01 OFFLINE
    DATA_CD_03_dm01cel01 OFFLINE
    DATA_CD_04_dm01cel01 OFFLINE
    DATA_CD_05_dm01cel01 OFFLINE
    DATA_CD_06_dm01cel01 OFFLINE
    DATA_CD_07_dm01cel01 OFFLINE
    DATA_CD_08_dm01cel01 OFFLINE
    DATA_CD_09_dm01cel01 OFFLINE
    DATA_CD_10_dm01cel01 OFFLINE
    DATA_CD_11_dm01cel01 OFFLINE
(c) Oracle ASM synchronization is complete only when asmmodestatus is ONLINE for every grid disk.
(Please note: this operation uses Fast Mirror Resync, which does not trigger an ASM rebalance. The resync restores only the extents that would have been written while the disk was offline.)
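Progress of the resync in step 9 can be followed by counting grid disks per asmmodestatus. This is a small sketch, assuming two whitespace-separated columns (name, asmmodestatus) as in the sample output above; the function names `asm_status_summary` and `all_online` are hypothetical.

```shell
# Read the output of
#   cellcli -e list griddisk attributes name,asmmodestatus
# on stdin and print a one-line count per status.
asm_status_summary() {
    awk 'NF { count[$2]++; total++ }
         END {
             other = total - count["ONLINE"] - count["SYNCING"] - count["OFFLINE"]
             printf "ONLINE=%d SYNCING=%d OFFLINE=%d OTHER=%d\n", count["ONLINE"], count["SYNCING"], count["OFFLINE"], other
         }'
}

# Succeed (exit 0) only if at least one grid disk was listed and every
# listed grid disk has asmmodestatus ONLINE.
all_online() {
    awk 'NF { total++; if ($2 != "ONLINE") bad++ }
         END { exit ((total > 0 && bad == 0) ? 0 : 1) }'
}
```

Fed the twelve sample lines from step 9(b), `asm_status_summary` reports one grid disk ONLINE, one SYNCING and ten OFFLINE, and `all_online` fails, which is the signal to keep waiting before touching the next cell.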
10) Before taking the grid disks of another cell offline, make sure the previous cell has finished resynchronizing. If it has not, the asmdeactivationoutcome check on the next cell will fail, as in this example output:
    CellCLI> list griddisk attributes name where asmdeactivationoutcome != 'Yes'
    DATA_CD_00_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
    DATA_CD_01_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
    DATA_CD_02_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
    DATA_CD_03_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
    DATA_CD_04_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
    DATA_CD_05_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
    DATA_CD_06_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
    DATA_CD_07_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
    DATA_CD_08_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
    DATA_CD_09_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
    DATA_CD_10_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
    DATA_CD_11_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"