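In day-to-day maintenance, some operations require restarting an Exadata storage cell. How can this be done without affecting the business? It takes the following steps.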
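1) First, one concept needs explaining: the DISK_REPAIR_TIME disk group attribute. When a disk is taken offline for maintenance, ASM does not drop it, and therefore does not trigger a rebalance, as long as the disk comes back online within DISK_REPAIR_TIME. The default is 3.6 hours, so confirm how long the maintenance will take; if it may run longer, set a longer value before you start.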
(a) Log in to the ASM instance and check the current value of DISK_REPAIR_TIME:
    SQL> select dg.name, a.value from v$asm_diskgroup dg, v$asm_attribute a
         where dg.group_number = a.group_number and a.name = 'disk_repair_time';
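(b) Change the value as needed, for example to 8.5 hours on disk group DATA (the attribute is set per disk group, so repeat this for each disk group that has grid disks on the cell):
    SQL> ALTER DISKGROUP DATA SET ATTRIBUTE 'DISK_REPAIR_TIME'='8.5H';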
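To check whether the current setting covers the planned window, the value returned by the query in (a) (for example 3.6h, i.e. 216 minutes) can be converted to minutes and compared with the expected downtime. The following is a minimal bash sketch; the helper names `repair_time_minutes` and `covers_window` are hypothetical, and it assumes the attribute value is a number optionally followed by m/M (minutes) or h/H (hours), with hours assumed when no unit is given.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for sizing DISK_REPAIR_TIME against a maintenance window.

# repair_time_minutes VALUE
# Prints VALUE (e.g. 3.6h, 8.5H, 90m) converted to minutes.
# Assumption: a trailing m/M means minutes; h/H or no unit means hours.
repair_time_minutes() {
  awk -v v="$1" 'BEGIN {
    unit = tolower(substr(v, length(v), 1))
    num = v
    if (unit == "h" || unit == "m") {
      num = substr(v, 1, length(v) - 1)   # strip the unit suffix
    } else {
      unit = "h"                          # no suffix: treat as hours
    }
    if (unit == "h") mins = num * 60
    else mins = num + 0
    printf "%g\n", mins
  }'
}

# covers_window VALUE PLANNED_MINUTES
# Exits 0 if DISK_REPAIR_TIME is at least as long as the planned window.
covers_window() {
  local have
  have=$(repair_time_minutes "$1")
  awk -v have="$have" -v want="$2" 'BEGIN { if (have + 0 >= want + 0) exit 0; exit 1 }'
}
```

For example, with the default of 3.6h and a planned five-hour window, `covers_window 3.6h 300 || echo "DISK_REPAIR_TIME too short"` prints the warning, because 216 minutes is less than 300.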
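2) Check the grid disk status with the following command. A grid disk may be taken offline only if the mirror copies of its data are healthy; otherwise data will be lost. A value of 'Yes' in asmdeactivationoutcome means that grid disk can be taken offline.
    cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome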
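3) If even one grid disk returns an asmdeactivationoutcome other than 'Yes', wait a while and query again. Continue only when every grid disk reports 'Yes'.
Taking the disks offline while any of them is not in this state will cause the ASM disk group to be dismounted and the databases to crash.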
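Steps 2 and 3 amount to a polling loop: repeat the query until every grid disk reports 'Yes'. A minimal bash sketch, assuming the cellcli output has one grid disk per line with the columns name, asmmodestatus and asmdeactivationoutcome separated by whitespace. A non-Yes outcome may be a quoted message containing spaces, so only the first word of the third column is compared. The function names and the 60-second poll interval are hypothetical choices, not part of cellcli.

```bash
#!/usr/bin/env bash
# deactivation_ok: reads the output of
#   cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
# on stdin, prints the name of every grid disk whose outcome is not Yes,
# and exits 0 only if at least one disk was listed and all of them are Yes.
deactivation_ok() {
  awk '
    NF == 0 { next }                  # ignore blank lines
    { seen++ }
    $3 != "Yes" { print $1; bad++ }   # outcome is not Yes: not safe yet
    END { if (seen == 0 || bad > 0) exit 1; exit 0 }
  '
}

# wait_until_deactivation_ok: re-runs the check on this cell until it passes.
wait_until_deactivation_ok() {
  until cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome \
        | deactivation_ok; do
    echo "Some grid disks cannot be deactivated yet; retrying in 60 s" >&2
    sleep 60
  done
}
```

Running `wait_until_deactivation_ok` on the cell returns only once the check in step 3 passes, at which point step 4 can begin.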
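4) Run the following command to inactivate all grid disks on this cell. This can take 10 minutes or longer.
This step is critical: before restarting the cell, make sure all grid disks have been taken offline successfully. Inactivating a grid disk automatically takes the corresponding disk offline in the ASM instance.
    cellcli -e alter griddisk all inactive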
5) After making sure the grid disks are all offline, do the following:
(a) Run the command below. Once the grid disks are offline, every one of them should show asmmodestatus=OFFLINE or asmmodestatus=UNUSED, and asmdeactivationoutcome=Yes. Only in this state is it safe to restart the cell.
    cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
(b) List the grid disks and confirm that they are inactive:
    cellcli -e list griddisk
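The condition in step 5(a) can be checked mechanically before running shutdown. A minimal bash sketch under the same output-format assumption as before (name, asmmodestatus and asmdeactivationoutcome per line); `ready_for_shutdown` is a hypothetical name.

```bash
#!/usr/bin/env bash
# ready_for_shutdown: reads the output of
#   cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
# on stdin and exits 0 only if every grid disk is OFFLINE or UNUSED with
# asmdeactivationoutcome Yes; names of disks that fail the test are printed.
ready_for_shutdown() {
  awk '
    NF == 0 { next }
    { seen++ }
    !(($2 == "OFFLINE" || $2 == "UNUSED") && $3 == "Yes") { print $1; bad++ }
    END { if (seen == 0 || bad > 0) exit 1; exit 0 }
  '
}

# Example, on the cell that is about to be restarted:
#   cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome \
#     | ready_for_shutdown && echo "safe to shut down this cell"
```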
6) You can now restart the cell using Linux commands:
(a) The following command shuts down the Oracle Exadata Storage Server immediately (run as root):
    # shutdown -h now
(When the cell shuts down, all related storage services are stopped automatically.)
(b) The following command reboots the Oracle Exadata Storage Server immediately and forces an fsck on reboot:
    # shutdown -F -r now
7) After the cell has restarted, reactivate the grid disks manually:
    cellcli -e alter griddisk all active
8) Check that the grid disks are 'active':
    cellcli -e list griddisk
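The check in step 8 can be scripted the same way. The sketch below assumes that `cellcli -e list griddisk` prints one grid disk per line with two whitespace-separated columns, name and status, and exits 0 only when every grid disk reports the expected status. `all_griddisks_status` is a hypothetical helper; passing `inactive` instead of `active` covers step 5(b) as well.

```bash
#!/usr/bin/env bash
# all_griddisks_status EXPECTED
# Reads `cellcli -e list griddisk` output (assumed columns: name, status) on
# stdin, prints "<name> is <status>" for each disk not in the EXPECTED state,
# and exits 0 only if at least one disk was listed and all of them match.
all_griddisks_status() {
  awk -v want="$1" '
    NF == 0 { next }
    { seen++ }
    $2 != want { print $1 " is " $2; bad++ }
    END { if (seen == 0 || bad > 0) exit 1; exit 0 }
  '
}

# Example:
#   cellcli -e list griddisk | all_griddisks_status active
```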
9) Verify the grid disk status:
(a) Verify that all grid disks have come back online:
    cellcli -e list griddisk attributes name,asmmodestatus
(b) The output will show some disks in the 'SYNCING' state. Each disk goes to SYNCING first and changes to ONLINE only after its resync completes; wait until asmmodestatus is ONLINE for all grid disks. Example output while the resync is still in progress:
    DATA_CD_00_dm01cel01 ONLINE
    DATA_CD_01_dm01cel01 SYNCING
    DATA_CD_02_dm01cel01 OFFLINE
    DATA_CD_03_dm01cel01 OFFLINE
    DATA_CD_04_dm01cel01 OFFLINE
    DATA_CD_05_dm01cel01 OFFLINE
    DATA_CD_06_dm01cel01 OFFLINE
    DATA_CD_07_dm01cel01 OFFLINE
    DATA_CD_08_dm01cel01 OFFLINE
    DATA_CD_09_dm01cel01 OFFLINE
    DATA_CD_10_dm01cel01 OFFLINE
    DATA_CD_11_dm01cel01 OFFLINE
(c) Oracle ASM synchronization is complete only once every grid disk shows asmmodestatus ONLINE.
(Note: this uses the ASM Fast Mirror Resync operation, which does not trigger an ASM rebalance. Resync restores only the extents that were written while the disk was offline.)
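Waiting for the resync in step 9 can also be scripted. A minimal bash sketch, assuming the output of `cellcli -e list griddisk attributes name,asmmodestatus` has one grid disk per line with two whitespace-separated columns; `resync_progress` and `wait_for_resync` are hypothetical names, and the 60-second interval is arbitrary. Fed the example output from step 9(b), it would report "ONLINE 1, SYNCING 1, other 10 (of 12)" and keep waiting.

```bash
#!/usr/bin/env bash
# resync_progress: reads `name asmmodestatus` lines on stdin, prints a one-line
# summary, and exits 0 only if every listed grid disk is ONLINE.
resync_progress() {
  awk '
    NF == 0 { next }
    { total++ }
    $2 == "ONLINE"  { online++ }
    $2 == "SYNCING" { syncing++ }
    END {
      printf "ONLINE %d, SYNCING %d, other %d (of %d)\n", online, syncing, total - online - syncing, total
      if (total == 0 || online + 0 != total) exit 1
      exit 0
    }
  '
}

# wait_for_resync: polls this cell until all grid disks are ONLINE again.
wait_for_resync() {
  until cellcli -e list griddisk attributes name,asmmodestatus | resync_progress; do
    sleep 60
  done
}
```

Only after `wait_for_resync` returns on this cell is it appropriate to start step 2 on the next cell.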
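10) Before taking another cell offline, make sure the previous cell has finished resyncing. If it has not, the deactivation check on the next cell fails; the following is an example of that failing output: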
    CellCLI> list griddisk attributes name,asmdeactivationoutcome where asmdeactivationoutcome != 'Yes'
    DATA_CD_00_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
    DATA_CD_01_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
    DATA_CD_02_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
    DATA_CD_03_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
    DATA_CD_04_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
    DATA_CD_05_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
    DATA_CD_06_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
    DATA_CD_07_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
    DATA_CD_08_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
    DATA_CD_09_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
    DATA_CD_10_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
    DATA_CD_11_dm01cel02 "Cannot de-activate due to other offline disks in the diskgroup"