
Oracle 11g RAC: emergency repair of a forcibly dismounted ASM disk group

It has been a while since I last wrote anything about Oracle. I suppose that is one reason I will never become a real expert; I suffer from classic procrastination and laziness.

Today's example is another real-world case. At lunch I noticed a missed call from a colleague, and professional instinct told me something was wrong in the data center. I called back. The first call: a disk in the storage array had failed, and an NBU backup job had not completed. That did not sound serious, so I asked them to notify the vendor for a repair. Just as I finished eating, a second call came in: now a dual-controller (active-active) storage array was raising alarms, and the status of the database RAC cluster could no longer be seen. I silently prayed we had not been hit by the recent wave of Bitcoin ransomware; I had just had every vendor's engineers run checks the previous Tuesday. If it were ransomware, there would be nothing I could do.

I had planned to rest at noon, but this had to be handled right away; I still had three C# lab sessions in the afternoon. Not a peaceful weekend. On site it was just as I had guessed. A colleague had been doing the midday inspection; when I asked about recent rounds, he said nothing had been done on Saturday, but Monday through Friday had all been checked with no problems found. I opened a terminal to the DB server, and sure enough something was wrong:

[12:59:29][root: ~]#ckrac   <-- a small script I wrote myself
[12:59:31]CRS-4535: Cannot communicate with Cluster Ready Services
[12:59:31]CRS-4000: Command Status failed, or completed with errors.
           

First impression: the cluster was down.
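The ckrac helper itself is not reproduced in this post; purely as an illustration, a minimal sketch of such a wrapper might look like the following. The crsctl path and the failure-code list are my assumptions, not the real script.

```shell
#!/bin/sh
# Hypothetical sketch of a ckrac-style health check; the real script is
# not shown in this post. CRSCTL path and error codes are assumptions.
CRSCTL=/u01/app/11.2.0/grid/bin/crsctl

check_crs_output() {
    # Classify `crsctl check crs` output by looking for known error codes.
    if printf '%s\n' "$1" | grep -qE 'CRS-4535|CRS-4000'; then
        echo "CRS problem detected"
    else
        echo "CRS looks healthy"
    fi
}

# On a live node you would feed it real output:
#   check_crs_output "$($CRSCTL check crs 2>&1)"
```

Keeping the check as a pure text filter makes it trivial to test away from the cluster.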

I immediately checked the database:

su - oracle
sqlplus / as sysdba
select * from v$instance;
           

Good news: the instance was still alive, so the data was essentially safe.

Next, I tried to start the cluster:

[13:07:48][root: /u01/app/11.2.0/grid/bin]#./crsctl start cluster
[::]CRS-: Attempting to start 'ora.crsd' on 'rac01'
[::]CRS-: Start of 'ora.crsd' on 'rac01' succeeded
           

Next came the logs. I checked the node's instance alert log first and saw nothing wrong:

[13:13:33][root: /u01/app/11.2.0/grid/bin]# cd /u01/app/oracle/diag/rdbms/gbf1/GBF11/trace
[13:13:52][root: /u01/app/oracle/diag/rdbms/gbf1/GBF11/trace]#tail -500  alert_GBF11.log |more
           

I was a bit puzzled. It was clearly a cluster-level problem, but which one?

Time to look at the clusterware logs. This is what you get for not doing your homework and keeping notes: I had to find the file first.

[13:19:33][root: ~]#find /u01 -name 'crsd.log'
[13:19:45]/u01/app/11.2.0/grid/log/rac01/crsd/crsd.log

[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: COMMCRS  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: COMMNS  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: CSSCLNT  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: GIPCLIB  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: GIPCXBAD  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: GIPCLXPT  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: GIPCUNDE  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: GIPC  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: GIPCGEN  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: GIPCTRAC  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: GIPCWAIT  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: GIPCXCPT  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: GIPCOSD  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: GIPCBASE  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: GIPCCLSA  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: GIPCCLSC  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: GIPCEXMP  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: GIPCGMOD  0
[13:20:42]2016-11-20 13:23:50.303: [    CRSD][3134007024] Logging level for Module: GIPCHEAD  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: GIPCMUX  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: GIPCNET  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: GIPCNULL  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: GIPCPKT  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: GIPCSMEM  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: GIPCHAUP  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: GIPCHALO  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: GIPCHTHR  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: GIPCHGEN  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: GIPCHLCK  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: GIPCHDEM  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: GIPCHWRK  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSMAIN  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: clsdmt  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: clsdms  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSUI  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSCOMM  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSRTI  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSPLACE  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSAPP  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSRES  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSTIMER  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSEVT  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSD  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CLUCLS  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CLSVER  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CLSFRAME  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSPE  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSSE  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSRPT  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSOCR  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: UiServer  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: AGFW  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: SuiteTes  1
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSSHARE  1
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSSEC  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSCCL  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: CRSCEVT  0
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: AGENT  1
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: OCRAPI  1
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: OCRCLI  1
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: OCRSRV  1
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: OCRMAS  1
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: OCRMSG  1
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: OCRCAC  1
[13:20:42]2016-11-20 13:23:50.304: [    CRSD][3134007024] Logging level for Module: OCRRAW  1
[13:20:42]2016-11-20 13:23:50.305: [    CRSD][3134007024] Logging level for Module: OCRUTL  1
[13:20:42]2016-11-20 13:23:50.305: [    CRSD][3134007024] Logging level for Module: OCROSD  1
[13:20:42]2016-11-20 13:23:50.305: [    CRSD][3134007024] Logging level for Module: OCRASM  1
[13:20:42]2016-11-20 13:23:50.305: [ CRSMAIN][3134007024] Checking the OCR device
[13:20:42]2016-11-20 13:23:50.305: [ CRSMAIN][3134007024] Sync-up with OCR
[13:20:42]2016-11-20 13:23:50.305: [ CRSMAIN][3134007024] Connecting to the CSS Daemon
[13:20:42]2016-11-20 13:23:50.305: [ CRSMAIN][3134007024] Getting local node number
[13:20:42]2016-11-20 13:23:50.306: [ CRSMAIN][3134007024] Initializing OCR
[13:20:42]2016-11-20 13:23:50.307: [ CRSMAIN][3127560512] Policy Engine is not initialized yet!
[13:20:42][   CLWAL][3134007024]clsw_Initialize: OLR initlevel [70000]
[13:20:42]2016-11-20 13:23:50.622: [  OCRASM]
****[3134007024]proprasmo: Error in open/create file in dg [OCRVDISK]**** 
[13:20:42][  OCRASM][3134007024]SLOS : SLOS: cat=8, opn=kgfoOpen01, dep=15056, loc=kgfokge
[13:20:42]
[13:20:42]2016-11-20 13:23:50.623: [  OCRASM][3134007024]ASM Error Stack : 
[13:20:42]2016-11-20 13:23:50.667: [  OCRASM][3134007024]proprasmo: kgfoCheckMount returned [6]
[13:20:42]2016-11-20 13:23:50.667: [  OCRASM][3134007024]proprasmo: The ASM disk group OCRVDISK is not found or not mounted
[13:20:42]2016-11-20 13:23:50.667: [  OCRRAW][3134007024]proprioo: Failed to open [+OCRVDISK]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
[13:20:42]2016-11-20 13:23:50.667: [  OCRRAW][3134007024]proprioo: No OCR/OLR devices are usable
[13:20:42]2016-11-20 13:23:50.667: [  OCRASM][3134007024]proprasmcl: asmhandle is NULL
[13:20:42]2016-11-20 13:23:50.668: [    GIPC][3134007024] gipcCheckInitialization: possible incompatible non-threaded init from [prom.c : 690], original from [clsss.c : 5343]
[13:20:42]2016-11-20 13:23:50.668: [ default][3134007024]clsvactversion:4: Retrieving Active Version from local storage.
[13:20:42]2016-11-20 13:23:50.670: [ CSSCLNT][3134007024]clssgsgrppubdata: group (ocr_rac-cluster01) not found
[13:20:42]
[13:20:42]2016-11-20 13:23:50.670: [  OCRRAW][3134007024]proprio_repairconf: Failed to retrieve the group public data. CSS ret code [20]
[13:20:42]2016-11-20 13:23:50.672: [  OCRRAW][3134007024]proprioo: Failed to auto repair the OCR configuration.
[13:20:42]2016-11-20 13:23:50.672: [  OCRRAW][3134007024]proprinit: Could not open raw device 
[13:20:42]2016-11-20 13:23:50.672: [  OCRASM][3134007024]proprasmcl: asmhandle is NULL
[13:20:42]2016-11-20 13:23:50.675: [  OCRAPI][3134007024]a_init:16!: Backend init unsuccessful : [26]
[13:20:42]2016-11-20 13:23:50.675: [  CRSOCR][3134007024] OCR context init failure.  Error: PROC-26: Error while accessing the physical storage
[13:20:42]
[13:20:42]2016-11-20 13:23:50.675: [    CRSD][3134007024] Created alert : (:CRSD00111:) :  Could not init OCR, error: PROC-26: Error while accessing the physical storage
[13:20:42]
[13:20:42]2016-11-20 13:23:50.675: [    CRSD][3134007024][PANIC] CRSD exiting: Could not init OCR, code: 26
[13:20:42]2016-11-20 13:23:50.675: [    CRSD][3134007024] Done.
           

From this we can see that the [+OCRVDISK] disk group has a problem.
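A crsd.log is huge; one way to cut straight to the decisive lines is to filter on the components and error codes seen above. This is only a sketch, and the pattern list is an assumption based on this particular incident.

```shell
#!/bin/sh
# Sketch: filter the decisive lines out of a crsd.log-style file.
# The pattern list is an assumption based on the errors in this incident.
filter_crsd_errors() {
    grep -E 'OCRASM|OCRRAW|PROC-26|PANIC' "$1"
}

# e.g. filter_crsd_errors /u01/app/11.2.0/grid/log/rac01/crsd/crsd.log
```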

su - grid
sqlplus / as sysasm
SQL> select GROUP_NUMBER,NAME,TYPE,ALLOCATION_UNIT_SIZE,STATE from v$asm_diskgroup;

GROUP_NUMBER NAME     TYPE   ALLOCATION_UNIT_SIZE STATE
------------ -------- ------ -------------------- -----------
            BAK01  EXTERN                     MOUNTED
            DATA01 EXTERN                     MOUNTED
            GBF1_ARC EXTERN                     MOUNTED
            GBF2_ARC EXTERN                     MOUNTED
            OCRVDISK                                DISMOUNTED
            YWDB_ARC EXTERN                     MOUNTED
 rows selected.
alter diskgroup OCRVDISK mount;
           

The query above further confirms it: OCRVDISK is DISMOUNTED while every other disk group is MOUNTED. That disk group is definitely the problem!
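If you need to repeat this check across nodes, the DISMOUNTED rows can be picked out of the v$asm_diskgroup listing mechanically. A sketch, assuming the layout shown above (NAME as the first non-blank field, STATE as the last column):

```shell
#!/bin/sh
# Sketch: report disk groups whose STATE reads DISMOUNTED, reading a
# v$asm_diskgroup listing (as printed by sqlplus) on stdin. Assumes the
# layout above: NAME is the first field and STATE is the last.
report_dismounted() {
    awk '$NF == "DISMOUNTED" { print $1 }'
}

# e.g. report_dismounted < diskgroup_listing.txt
```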

Checking the ASM alert log:

cd /u01/app/grid/diag/asm/+asm/+ASM1/trace
#tail - alert_+ASM1.log 
[::]Thu Nov  :: 
[::]WARNING: dirty detached from domain 
[::]NOTE: cache dismounted group / (OCRVDISK) 
[::]SQL> alter diskgroup OCRVDISK dismount force /* ASM SERVER:2886369426 */   <-- look here
[::]Thu Nov  :: 
[::]NOTE: cache deleting context for group OCRVDISK /
[::]GMON dismounting group  at  for pid , osid 
[::]NOTE: Disk OCRVDISK_0000 in mode  marked for de-assignment
[::]NOTE: Disk OCRVDISK_0001 in mode  marked for de-assignment
[::]NOTE: Disk OCRVDISK_0002 in mode  marked for de-assignment
[::]SUCCESS: diskgroup OCRVDISK was dismounted
[::]SUCCESS: alter diskgroup OCRVDISK dismount force /* ASM SERVER:2886369426 */
[::]SUCCESS: ASM-initiated MANDATORY DISMOUNT of group OCRVDISK
[::]Thu Nov  :: 
[::]NOTE: diskgroup resource ora.OCRVDISK.dg is offline
[::]ASM Health Checker found  new failures
[::]Thu Nov  :: 
[::]Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27147.trc:
[::]ORA-: ASM diskgroup was forcibly dismounted
[::]Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27147.trc:
[::]ORA-: ASM diskgroup was forcibly dismounted
[::]Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27147.trc:
[::]ORA-: ASM diskgroup was forcibly dismounted
[::]Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27147.trc:
[::]ORA-: ASM diskgroup was forcibly dismounted
[::]Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27147.trc:
[::]ORA-: ASM diskgroup was forcibly dismounted
[::]Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27147.trc:
[::]ORA-: ASM diskgroup was forcibly dismounted
[::]Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27147.trc:
[::]ORA-: ASM diskgroup was forcibly dismounted
[::]Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27147.trc:
[::]ORA-: ASM diskgroup was forcibly dismounted
[::]Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27147.trc:
[::]ORA-: ASM diskgroup was forcibly dismounted
[::]Thu Nov  :: 
[::]Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27147.trc:
[::]ORA-: ASM diskgroup was forcibly dismounted
[::]WARNING: requested mirror side  of virtual extent  logical extent  offset  is not allocated; I/O request failed
[::]WARNING: requested mirror side  of virtual extent  logical extent  offset  is not allocated; I/O request failed
[::]Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27147.trc:
<-- trace further through this file
[::]ORA-: ASM diskgroup was forcibly dismounted
[::]ORA-: ASM diskgroup was forcibly dismounted
[::]Thu Nov  :: 
[::]SQL> alter diskgroup OCRVDISK check /* proxy */ 
[::]ORA-: not all alterations performed
[::]ORA-: diskgroup "OCRVDISK" does not exist or is not mounted
[::]ERROR: alter diskgroup OCRVDISK check /* proxy */
[::]Thu Nov  :: 
[::]NOTE: client exited []
[::]Thu Nov  :: 
[::]NOTE: [crsd.bin@rac01 (TNS V1-V3) ] opening OCR file
[::]Thu Nov  :: 
[::]NOTE: [crsd.bin@rac01 (TNS V1-V3) ] opening OCR file
[::]Thu Nov  :: 
[::]NOTE: [crsd.bin@rac01 (TNS V1-V3) ] opening OCR file
[::]Thu Nov  :: 
[::]NOTE: [crsd.bin@rac01 (TNS V1-V3) ] opening OCR file
[::]Thu Nov  :: 
[::]NOTE: [crsd.bin@rac01 (TNS V1-V3) ] opening OCR file
[::]Thu Nov  :: 
[::]NOTE: [crsd.bin@rac01 (TNS V1-V3) ] opening OCR file
[::]Thu Nov  :: 
[::]NOTE: [crsd.bin@rac01 (TNS V1-V3) ] opening OCR file
[::]Thu Nov  :: 
[::]NOTE: [crsd.bin@rac01 (TNS V1-V3) ] opening OCR file
[::]Thu Nov  :: 
[::]NOTE: [crsd.bin@rac01 (TNS V1-V3) ] opening OCR file
[::]Thu Nov  :: 
[::]NOTE: [crsd.bin@rac01 (TNS V1-V3) ] opening OCR file
[::]Sun Nov  :: 
[::]NOTE: [crsd.bin@rac01 (TNS V1-V3) ] opening OCR file
[::]Sun Nov  :: 
[::]NOTE: [crsd.bin@rac01 (TNS V1-V3) ] opening OCR file
[::]Sun Nov  :: 
[::]NOTE: [crsd.bin@rac01 (TNS V1-V3) ] opening OCR file
[::]Sun Nov  :: 
[::]NOTE: [crsd.bin@rac01 (TNS V1-V3) ] opening OCR file
[::]Sun Nov  :: 
[::]NOTE: [crsd.bin@rac01 (TNS V1-V3) ] opening OCR file
[::]Sun Nov  :: 
[::]NOTE: [crsd.bin@rac01 (TNS V1-V3) ] opening OCR file
[::]Sun Nov  :: 
[::]NOTE: [crsd.bin@rac01 (TNS V1-V3) ] opening OCR file
[::]Sun Nov  :: 
[::]NOTE: [crsd.bin@rac01 (TNS V1-V3) ] opening OCR file
[::]Sun Nov  :: 
[::]NOTE: [crsd.bin@rac01 (TNS V1-V3) ] opening OCR file
[::]Sun Nov  :: 
[::]NOTE: [crsd.bin@rac01 (TNS V1-V3) ] opening OCR file
[::]Sun Nov  :: 
[::]NOTE: [crsd.bin@rac01 (TNS V1-V3) ] opening OCR file
[::]Sun Nov  :: 
[::]NOTE: [ocrcheck.bin@rac01 (TNS V1-V3) ] opening OCR file
[::]Sun Nov  :: 
[::]NOTE: No asm libraries found in the system
[::]MEMORY_TARGET defaulting to 
[::]* instance_number obtained from CSS = , checking for the existence of node .. 
[::]* node  does not exist. instance_number =  
[::]Starting ORACLE instance (normal)
           

Digging further into the +ASM1_ora_27147.trc file:

#tail -  +ASM1_ora_27147.trc
[::]Trace file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_27147.trc
[::]Oracle Database g Enterprise Edition Release .. - bit Production
[::]With the Real Application Clusters and Automatic Storage Management options
[::]ORACLE_HOME = /u01/app/./grid
[::]System name:    Linux
[::]Node name:      rac01
[::]Release:        .-..el5uek
[::]Version:        #1 SMP Thu Jan  :: PST 
[::]Machine:        x86_64
[::]VM name:        VMWare Version: 
[::]Instance name: +ASM1
[::]Redo thread mounted by this instance:  <none>
[::]Oracle process number: 
[::]Unix process pid: , image: [email protected] (TNS V1-V3)
[::]
[::]
[::]*** -- ::
[::]*** SESSION ID:() -- ::
[::]*** CLIENT ID:() -- ::
[::]*** SERVICE NAME:() -- ::
[::]*** MODULE NAME:([email protected] (TNS V1-V3)) -- ::
[::]*** ACTION NAME:() -- ::
[::] 
[::]WARNING:failed xlate  
[::]ORA-: ASM diskgroup was forcibly dismounted
[::]WARNING:failed xlate  
[::]ORA-: ASM diskgroup was forcibly dismounted
[::]WARNING:failed xlate  
[::]ORA-: ASM diskgroup was forcibly dismounted
[::]WARNING:failed xlate  
[::]ORA-: ASM diskgroup was forcibly dismounted
[::]WARNING:failed xlate  
[::]ORA-: ASM diskgroup was forcibly dismounted
[::]WARNING:failed xlate  
[::]ORA-: ASM diskgroup was forcibly dismounted
[::]WARNING:failed xlate  
[::]ORA-: ASM diskgroup was forcibly dismounted
[::]WARNING:failed xlate  
[::]ORA-: ASM diskgroup was forcibly dismounted
[::]WARNING:failed xlate  
[::]ORA-: ASM diskgroup was forcibly dismounted
[::]
[::]*** -- ::
[::]Received ORADEBUG command (#1) 'CLEANUP_KFK_FD' from process 'Unix process pid: 20811, image: <none>'
[::]
[::]*** -- ::
[::]Finished processing ORADEBUG command (#1) 'CLEANUP_KFK_FD'
[::]
[::]*** -- ::
[::]WARNING:failed xlate  
[::]ORA-: ASM diskgroup was forcibly dismounted
[::]ksfdrfms:Mirror Read file=+OCRVDISK. fob=x9d804848 bufp=x7f0543d2fa00 blkno= nbytes=
[::]WARNING:failed xlate  
[::]WARNING: requested mirror side  of virtual extent  logical extent  offset  is not allocated; I/O request failed
[::]ksfdrfms:Read failed from mirror side= logical extent number= dskno=
[::]WARNING:failed xlate  
[::]WARNING: requested mirror side  of virtual extent  logical extent  offset  is not allocated; I/O request failed
[::]ksfdrfms:Read failed from mirror side= logical extent number= dskno=
[::]ORA-: ASM diskgroup was forcibly dismounted
[::]ORA-: ASM diskgroup was forcibly dismounted
           

Then tracing the +ASM1_gmon_27098.trc file:

#tail -500  +ASM1_gmon_27098.trc
[::]Trace file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_gmon_27098.trc
[::]Oracle Database g Enterprise Edition Release  - bit Production
[::]With the Real Application Clusters and Automatic Storage Management options
[::]ORACLE_HOME = /u01/app//grid
[::]System name:    Linux
[::]Node name:      rac01
[::]Release:        -.el5uek
[::]Version:        #1 SMP Thu Jan 3 18:31:38 PST 2013
[::]Machine:        x86_64
[::]VM name:        VMWare Version: 
[::]Instance name: +ASM1
[::]Redo thread mounted by this instance:  <none>
[::]Oracle process number: 
[::]Unix process pid: , image: [email protected] (GMON)
[::]
[::]
[::]*** -- ::
[::]*** SESSION ID:() -- ::
[::]*** CLIENT ID:() -- ::
[::]*** SERVICE NAME:() -- ::
[::]*** MODULE NAME:() -- ::
[::]*** ACTION NAME:() -- ::
[::] 
[::]
[::]*** TRACE FILE RECREATED AFTER BEING REMOVED ***
[::]
[::]WARNING: Waited  secs for write IO to PST disk  in group 
[::]WARNING: Waited  secs for write IO to PST disk  in group 
[::]NOTE: Set to be offline flag for disk OCRVDISK_0000 only locally: flag 
[::]NOTE: Set to be offline flag for disk OCRVDISK_0001 only locally: flag 
[::]----- Abridged Call Stack Trace -----
[::]
[::]*** -- ::
[::]ksedsts()+<-kfdpGc_doTobeoflnAsync()+<-kfdpGc_checkTobeofln()+<-kfdpGc_timeout()+<-kfdp_timeoutBg()+<-ksbcti()+<-ksbabs()+<-ksbrdp()+<-opirip()+<-opidrv()+<-sou2o()+<-opimai_real()+<-ssthrdmain()+<-main()+<-__libc_start_main()+
[::] 
[::]----- End of Abridged Call Stack Trace -----
[::]GMON checking disk modes for group  at  for pid , osid 
[::]  dsk = /, mask = , op = clear
[::]  dsk = /, mask = , op = clear
[::]POST (justCheck) res =  
[::]=============== PST ==================== 
[::]grpNum:     
[::]state:      
[::]callCnt:    
[::](lockvalue) valid= ver= ndisks= flags= from inst= (I am ) last=
[::]--------------- HDR -------------------- 
[::]next:     
[::]last:     
[::]pst count:        
[::]pst locations:        
[::]incarn:           
[::]dta size:         
[::]version:          
[::]ASM version:      = 
[::]contenttype:     
[::]partnering pattern:      [ ]
[::]--------------- LOC MAP ---------------- 
[::]--------------- DTA -------------------- 
[::]--------------- HBEAT ------------------ 
[::]kfdpHbeat_dump: state=, inst=, ts=, 
[::]        rnd=.
[::]kfk io-queue:    
[::]kfdpHbeatCB_dump: at  with ts=// :: iop=, grp=, disk=/, isWrite= Hbeat #100 state=2 iostate=4
[::]kfdpHbeatCB_dump: at  with ts=// :: iop=, grp=, disk=/, isWrite= Hbeat #100 state=2 iostate=4
[::]kfdpHbeatCB_dump: at  with ts=// :: iop=, grp=, disk=/, isWrite= Hbeat #100 state=2 iostate=4
[::]GMON updating disk modes for group  at  for pid , osid 
[::]  dsk = /, mask = , op = clear
[::]  dsk = /, mask = , op = clear
[::]POST res =  
[::]=============== PST ==================== 
[::]grpNum:     
[::]state:      
[::]callCnt:    
[::](lockvalue) valid= ver= ndisks= flags= from inst= (I am ) last=
[::]--------------- HDR -------------------- 
[::]next:     
[::]last:     
[::]pst count:        
[::]pst locations:        
[::]incarn:           
[::]dta size:         
[::]version:          
[::]ASM version:      = 
[::]contenttype:     
[::]partnering pattern:      [ ]
[::]--------------- LOC MAP ---------------- 
[::]--------------- DTA -------------------- 
[::]--------------- HBEAT ------------------ 
[::]kfdpHbeat_dump: state=, inst=, ts=, 
[::]        rnd=.
[::]kfk io-queue:    
[::]kfdpHbeatCB_dump: at  with ts=// :: iop=, grp=, disk=/, isWrite= Hbeat #100 state=2 iostate=4
[::]kfdpHbeatCB_dump: at  with ts=// :: iop=, grp=, disk=/, isWrite= Hbeat #100 state=2 iostate=4
[::]kfdpHbeatCB_dump: at  with ts=// :: iop=, grp=, disk=/, isWrite= Hbeat #100 state=2 iostate=4
[::]NOTE: kfdp_updateInt: forceDismount grp 
[::]NOTE: GMON: failed to update modes: triggering force dismount of group 
[::]GMON dismounting group  at  for pid , osid 
[::]NOTE: kfdp_doDismount: dismount grp 
[::]
[::]*** -- ::
[::]NOTE: kfdpUtil_freeSlMsg: ksvtst for grp:  KSV status 
           

This turned out to be an Oracle 11g bug; the recovery procedure was as follows:

Step 1. Mount the problem ASM disk group

su - grid
sqlplus / as sysasm
[::]SQL> alter diskgroup OCRVDISK mount;
[::]
[::]Diskgroup altered.
           

Step 2. Restore the services

cd /u01/app/11.2.0/grid/bin 
#./crsctl start crs
[::]CRS-: Oracle High Availability Services is already active
[::]CRS-: Command Start failed, or completed with errors.
           

Since High Availability Services was already active and the full start failed, I brought the cluster up this way instead:

./crsctl start res ora.crsd -init
[::]CRS-: Attempting to start 'ora.crsd' on 'rac01'
[::]CRS-: Start of 'ora.crsd' on 'rac01' succeeded
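The logic of those last two steps boils down to one decision: when Oracle High Availability Services is already online but CRSD is unreachable, `crsctl start crs` fails, so restart only the ora.crsd resource. A sketch of that decision as a text filter (the message strings are taken from the output in this post):

```shell
#!/bin/sh
# Sketch of the recovery decision above: if HAS is up but CRSD is not
# reachable, restarting the whole stack fails, so restart only ora.crsd.
next_crs_action() {
    if printf '%s\n' "$1" | grep -qE 'High Availability Services is (already active|online)' &&
       printf '%s\n' "$1" | grep -q 'Cannot communicate with Cluster Ready Services'; then
        echo "crsctl start res ora.crsd -init"
    else
        echo "crsctl start crs"
    fi
}

# e.g. next_crs_action "$(./crsctl check crs 2>&1)"
```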
           

Final checks:

[::][root: /u01/app//grid/bin]#./crsctl check crs
[::]CRS-: Oracle High Availability Services is online
[::]CRS-: Cannot communicate with Cluster Ready Services
[::]CRS-: Cluster Synchronization Services is online
[::]CRS-: Event Manager is online
[::][root: /u01/app//grid/bin]#./crsctl check cluster -all
[::]**************************************************************
[::]rac01:
[::]CRS-: Cannot communicate with Cluster Ready Services
[::]CRS-: Cluster Synchronization Services is online
[::]CRS-: Event Manager is online
[::]**************************************************************
[::]rac02:
[::]CRS-: Cannot communicate with Cluster Ready Services
[::]CRS-: Cluster Synchronization Services is online
[::]CRS-: Event Manager is online
[::]**************************************************************
[::][root: /u01/app//grid/bin]#./crs_stat -t
[::]Name           Type           Target    State     Host        
[::]------------------------------------------------------------
[::]ora.BAK01.dg   ora....up.type ONLINE    ONLINE    rac01       
[::]ora.DATA01.dg  ora....up.type ONLINE    ONLINE    rac01       
[::]ora...._ARC.dg ora....up.type ONLINE    ONLINE    rac01       
[::]ora...._ARC.dg ora....up.type ONLINE    ONLINE    rac01       
[::]ora....ER.lsnr ora....er.type ONLINE    ONLINE    rac01       
[::]ora....N1.lsnr ora....er.type ONLINE    OFFLINE               
[::]ora....DISK.dg ora....up.type ONLINE    ONLINE    rac01       
[::]ora...._ARC.dg ora....up.type ONLINE    ONLINE    rac01       
[::]ora.asm        ora.asm.type   ONLINE    ONLINE    rac01       
[::]ora.cvu        ora.cvu.type   ONLINE    OFFLINE               
[::]ora.gbf1.db    ora....se.type ONLINE    ONLINE    rac01       
[::]ora.gbf2.db    ora....se.type ONLINE    ONLINE    rac01       
[::]ora.gsd        ora.gsd.type   OFFLINE   OFFLINE               
[::]ora....network ora....rk.type ONLINE    ONLINE    rac01       
[::]ora.oc4j       ora.oc4j.type  ONLINE    OFFLINE               
[::]ora.ons        ora.ons.type   ONLINE    ONLINE    rac01       
[::]ora....SM1.asm application    ONLINE    ONLINE    rac01       
[::]ora....lsnr application    ONLINE    ONLINE    rac01       
[::]ora.rac01.gsd  application    OFFLINE   OFFLINE               
[::]ora.rac01.ons  application    ONLINE    ONLINE    rac01       
[::]ora.rac01.vip  ora....t1.type ONLINE    ONLINE    rac01       
[::]ora.rac02.vip  ora....t1.type ONLINE    OFFLINE               
[::]ora....ry.acfs ora....fs.type ONLINE    ONLINE    rac01       
[::]ora.scan1.vip  ora....ip.type ONLINE    OFFLINE               
[::]ora.ywdb.db    ora....se.type ONLINE    ONLINE    rac01       
[::][root: /u01/app//grid/bin]#ckrac