天天看点

配置dg broker的问题分析及修复

最近从同事那儿接手了一套新环境,备库因为服务器问题已经下架,重新配了一台服务器,所以需要搭一套备库,主库已经配置好了,而且同事已经把在主库把dg broker配好了。

使用dgmgrl来验证,只有主库孤零零的在那儿。

DGMGRL> show configuration;

Configuration - dg_test

  Protection Mode: MaxPerformance

  Databases:

    stest  - Primary database

Fast-Start Failover: DISABLED

Configuration Status:

SUCCESS

备库搭建的过程也还算顺利,中间也经历了一些小插曲,不过问题最后都得到解决。duplicate同步之后,开始启用dg broker,因为主库的dg broker配置已经有了,所以直接添加备库。

数据库db_name为test,主库的unique_name为stest,备库为s2test

DGMGRL> add database s2test as

  connect identifier is s2test

  maintained as physical;

想必添加完之后就开始enable database就大功告成,没想到还是有问题。

DGMGRL> show database verbose s2test

Database - s2test

Database Status:

。。。

DGM-17016: failed to retrieve status for database "s2tlest"

ORA-16664: unable to receive the result from a database

这个时候借助oerr来看看错误信息

$ oerr ora 16664

16664, 0000, "unable to receive the result from a database"

// *Cause:  During execution of a command, a database in the Data Guard

//          broker configuration failed to return a result.

// *Action: Check Data Guard broker logs for the details of the failure.

//          Ensure network communication is working properly amongst the

//          members of the configuration. Fix any possible network problems

//          and reissue the command.

dg broker的错误解释

$ oerr dgm 17016

17016, 00000, "failed to retrieve status for database \"%s\""

// *Cause:  DGMGRL could not retrieve the StatusReport property from the given

//          database and thus could not report the database status.

// *Action: See accompanying messages for details.

看来没有得到更多的信息,就从备库的dg broker日志中查看,

######dataguard log

Warning: Property 'LogArchiveFormat' has inconsistent values:METADATA='%t_%s_%r.dbf', SPFILE='(missing)', DATABASE='%t_%s_%r.dbf'

Failed to send message to site stest. Error code is ORA-16501.

10/10/2015 14:04:36

Warning: Property 'ArchiveLagTarget' has inconsistent values:METADATA='0', SPFILE='', DATABASE='0'

Warning: Property 'LogArchiveMaxProcesses' has inconsistent values:METADATA='4', SPFILE='', DATABASE='4'

Warning: Property 'LogArchiveMinSucceedDest' has inconsistent values:METADATA='1', SPFILE='', DATABASE='1'

SPFILE is missing value for property 'LogArchiveTrace' with sid='test'

Warning: Property 'LogArchiveTrace' has inconsistent values:METADATA='0', SPFILE='(missing)', DATABASE='0'

SPFILE is missing value for property 'LogArchiveFormat' with sid='test'

从日志来看备库是连接主库有问题,

$ oerr  ora 16501

16501, 00000, "the Data Guard broker operation failed"

// *Cause:  The Data Guard broker operation failed.

如此来看主库连接备库有问题,备库说连接主库有问题,查看了好几遍网络配置,都没有发现任何问题。

在主库中使用dgmgrl来查看更多的细节信息。

DGMGRL> show database verbose stest;

Database - stest

  Role:            PRIMARY

  Intended State:  TRANSPORT-ON

  Instance(s):

    test

  Properties:

    DGConnectIdentifier             = 'test01'

    ObserverConnectIdentifier       = ''

    LogXptMode                      = 'ASYNC'

    DelayMins                       = '0'

    Binding                         = 'optional'

    MaxFailure                      = '0'

    MaxConnections                  = '1'

    ReopenSecs                      = '300'

    NetTimeout                      = '30'

    RedoCompression                 = 'DISABLE'

    LogShipping                     = 'ON'

    PreferredApplyInstance          = ''

    ApplyInstanceTimeout            = '0'

    ApplyParallel                   = 'AUTO'

    StandbyFileManagement           = 'AUTO'

    ArchiveLagTarget                = '0'

    LogArchiveMaxProcesses          = '2'

    LogArchiveMinSucceedDest        = '1'

    DbFileNameConvert               = '/U01/app/oracle/oradata/test, /U01/app/oracle/oradata/test'

    LogFileNameConvert              = '/U01/app/oracle/oradata/test, /U01/app/oracle/oradata/test'

    FastStartFailoverTarget         = ''

    InconsistentProperties          = '(monitor)'

    InconsistentLogXptProps         = '(monitor)'

    SendQEntries                    = '(monitor)'

    LogXptStatus                    = '(monitor)'

    RecvQEntries                    = '(monitor)'

    SidName                         = 'test'

    StaticConnectIdentifier         = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=test01.cyou.com)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=stest_DGMGRL)(INSTANCE_NAME=test)(SERVER=DEDICATED)))'

    StandbyArchiveLocation          = '/U01/app/oracle/arch'

    AlternateLocation               = ''

    LogArchiveTrace                 = '0'

    LogArchiveFormat                = '%t_%s_%r.dbf'

    TopWaitEvents                   = '(monitor)'

查看备库的

  Role:            PHYSICAL STANDBY

  Intended State:  APPLY-ON

  Transport Lag:   (unknown)

  Apply Lag:       (unknown)

  Real Time Query: OFF

    DGConnectIdentifier             = 's2test'

    LogArchiveMaxProcesses          = '4'

    DbFileNameConvert               = ''

    LogFileNameConvert              = ''

    StaticConnectIdentifier         = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=stest.cyou.com)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=s2test_DGMGRL)(INSTANCE_NAME=test)(SERVER=DEDICATED)))'

    StandbyArchiveLocation          = '/U01/app/oracle/product/11.2.3/db_1/dbs/arch'

DGM-17016: failed to retrieve status for database "s2test"

最后查看mos,发现一篇文章比较贴近,不过描述的问题原因还是有些差别。

Data Guard Standby Database Broker Configuration error DGM-17016: failed to retrieve status for database (文档 ID 1631552.1)

进一步排除,排除了防火墙的影响,甚至考虑重启一下数据库,结果重启库的时候还确实有问题

SQL> startup

ORACLE instance started.

Total System Global Area 1.5734E+10 bytes

Fixed Size                  2243832 bytes

Variable Size            1375732488 bytes

Database Buffers         1.4328E+10 bytes

Redo Buffers               27852800 bytes

Database mounted.

ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","ALTER

DATABASE OPEN","SQLA","tmp")

不过这个和dg broker的问题是没有关系的,是sga配置的shared pool太小,process配置了3000,结果就导致启动的时候shared pool过小起不来了。

备库数据库启动了,但是dg broker的验证还是照样有问题,这个时候就仔细比对show database verbose中的信息

发现主库的信息有一点特别。

特别就特别在这个地方,这个地方最好应该是和db_unique_name一致,结果设置成了test01,备库中网络服务中也没有配置这个服务名

简单修改一下,修改为db_unique_name

DGMGRL> edit database stest set property DGConnectIdentifier ='stest';

Property "dgconnectidentifier" updated

再次查看就没有问题了。

    s2test - Physical standby database

可能之前的同事都是习惯使用db_name来作为主库的dg配置,结果自己还纠结了半天,排除了不少的因素,最后发现竟然是这么一个细小的地方,修改为db_unique_name就可以了,可见对于这些小的细节上最好还是能够有一个统一的标准规范,这样也好规范大家,避免这类问题带来的困扰。