天天看点

Oracle DG 备库恢复--gap

数据库服务的过程中,经常有客户因备库服务器宕机导致备库无法同步数据的问题,向我们咨询数据恢复服务,最近遇到有3例。

如果客户主库的归档日志没有删除,那么备库启动后会自动同步最近的日志文件,保持与主库的同步;很明显,上面客户归档日志随着备份完成而清理,导致备库无法自动恢复。

下面模拟该场景的故障恢复。

(一)测试环境主备

# 主、备库版本
SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
PL/SQL Release 11.2.0.4.0 - Production
CORE    11.2.0.4.0      Production
TNS for Linux: Version 11.2.0.4.0 - Production
NLSRTL Version 11.2.0.4.0 - Production

#主库测试环境,产生较多归档日志
sys@ORCL> create table dgr tablespace tbs_baiyang as select * from all_objects;
sys@ORCL> alter system switch logfile;

#备库同步状态查询
SQL> select archived,max(sequence#),applied from v$archived_log group by archived,applied;
ARC MAX(SEQUENCE#) APPLIED
--- -------------- ---------
YES             81 IN-MEMORY
YES             80 YES

SQLL> select count(*) from dgr;

  COUNT(*)
----------
    84431

# 目前主备库同步正常           

(二)模拟故障环境

# 关闭备库
SQL> alter database recover managed standby database cancel;
SQL> shutdown immediate

# 主库执行大量更新操作,以产生大量归档日志
sys@ORCL> archive log list
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            /u01/app/oracle/oradata/orcl/archivelog
Oldest online log sequence     83
Next log sequence to archive   85
Current log sequence           85

sys@ORCL> insert into dgr select * from dgr;

84431 rows created.

sys@ORCL> insert into dgr select * from dgr;

168862 rows created.

sys@ORCL> insert into dgr select * from dgr;

337724 rows created.

sys@ORCL> update dgr set object_id = 11;

675448 rows updated.

sys@ORCL> commit;

Commit complete.


sys@ORCL> archive log list;
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            /u01/app/oracle/oradata/orcl/archivelog
Oldest online log sequence     93
Next log sequence to archive   95
Current log sequence           95

# 备份数据库
RMAN> backup database;

# 删除新生成归档日志文件
cd /u01/app/oracle/oradata/orcl/archivelog
$mkdir bak
$mv 1_9* bak/

# 打开备库
SQL>  startup mount

SQL> select current_scn,database_role,open_mode from v$database;

CURRENT_SCN DATABASE_ROLE    OPEN_MODE
----------- ---------------- --------------------
    3730534 PHYSICAL STANDBY MOUNTED

# 开启日志应用报错    
SQL> alter database recover managed standby database disconnect from session using current logfile;
alter database recover managed standby database disconnect from session using current logfile
*
ERROR at line 1:
ORA-01153: an incompatible media recovery is active

#查看下目前的日志应用状态,仍有新的归档日志传输到备库,但无法应用
SQL>  select archived,max(sequence#),applied from v$archived_log group by archived,applied;

ARC MAX(SEQUENCE#) APPLIED
--- -------------- ---------
YES             97 NO
YES             89 YES

# 查看下是否有gap产生,少了5个日志文件,正式刚才删除的
SQL> select * from v$archive_gap;

   THREAD# LOW_SEQUENCE# HIGH_SEQUENCE#
---------- ------------- --------------
         1            90             94
 # 以上就是模拟故障发生的步骤           

(三)恢复前的准备

以RMAN增量备份的方式恢复备库的过程中,需要重建备库的控制文件,势必需要了解当前备库的数据库文件、日志文件归属;如果主备文件目录保持一致,那么可以跳过这一步

# 日志目录
SQL> set linesize 300
SQL> col MEMBER for a60
SQL> select type,member from v$logfile;

TYPE    MEMBER
------- ------------------------------------------------------------
ONLINE  /u01/app/oracle/oradata/standby/redo03.log
ONLINE  /u01/app/oracle/oradata/standby/redo02.log
ONLINE  /u01/app/oracle/oradata/standby/redo01.log
STANDBY /u01/app/oracle/oradata/standby/standbylog/redo11.log
STANDBY /u01/app/oracle/oradata/standby/standbylog/redo12.log
STANDBY /u01/app/oracle/oradata/standby/standbylog/redo13.log
STANDBY /u01/app/oracle/oradata/standby/standbylog/redo14.log

# 数据文件目录
SQL> select name from v$datafile;

NAME
--------------------------------------------------------------------------------
/u01/app/oracle/oradata/standby/datafile/system01.dbf
/u01/app/oracle/oradata/standby/datafile/sysaux01.dbf
/u01/app/oracle/oradata/standby/datafile/undotbs01.dbf
/u01/app/oracle/oradata/standby/datafile/users01.dbf
/u01/app/oracle/oradata/standby/datafile/tbs_baiyang.dbf
/u01/app/oracle/oradata/standby/datafile/lxx2.dbf

# 查询当前备库的SCN
SQL> select current_scn,database_role,open_mode from v$database;

CURRENT_SCN DATABASE_ROLE    OPEN_MODE
----------- ---------------- --------------------
    3730534 PHYSICAL STANDBY MOUNTED

# 主库再进行一些更新操作
sys@ORCL> update dgr set object_id = 11;

675448 rows updated.

sys@ORCL> commit;

Commit complete.
           

(四)增量备份主库,及备份主库控制文件

# 增量备份
RMAN> run{
allocate channel c1 device type disk;
backup incremental from scn 3730534 database format '/oradata/ora_scn_%U.bak'; 
release channel c1;
}

RMAN> BACKUP CURRENT CONTROLFILE FOR STANDBY FORMAT '/oradata/ctl_20181213.bak';

# 将备份集传到备库
scp *.bak 172.16.8.123:/oradata/           

(五)恢复备库

# 首先使用增量备份恢复数据库
RMAN> catalog start with '/oradata/';
RMAN> recover database noredo;
……
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:07

Finished recover at 13-DEC-18

# 关闭并重新备库至nomount状态,此时控制文件需要恢复(为什么)
SQL> startup nomount

# 使用主库控制文件备份恢复备库控制文件
RMAN> RESTORE STANDBY CONTROLFILE FROM '/oradata/ctl_20181213.bak';
Starting restore at 13-DEC-18
using channel ORA_DISK_1

channel ORA_DISK_1: restoring control file
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
output file name=/u01/app/oracle/oradata/standby/control01.ctl
output file name=/u01/app/oracle/fast_recovery_area/standby/control02.ctl
Finished restore at 13-DEC-18

#启动数据库到mount状态
SQL> alter database mount;           

(六)控制文件恢复后,数据和日志文件路径要是和备库现有不同,可以使用在线命令rename数据文件,重建日志文件,主要是standby 日志

……
 ALTER DATABASE RENAME file '/u01/app/oracle/oradata/orcl/datafile/tbs_baiyang.dbf' to  '/u01/app/oracle/oradata/standby/datafile/tbs_baiyang.dbf';
……
SQL> alter database add standby logfile group 24 '/u01/app/oracle/oradata/standby/standby24.log' size 50m;
……
SQL> alter database drop standby logfile group 14;
……           

(七)这时打开数据库、打开日志实时应用进程

SQL> alter database open;
SQL> alter database recover managed standby database disconnect from session using current logfile;

SQL>  select archived,max(sequence#),applied from v$archived_log group by archived,applied;

ARC MAX(SEQUENCE#) APPLIED
--- -------------- ---------
YES            102 IN-MEMORY
YES            101 YES


SQL> select count(*) from dgr;

  COUNT(*)
----------
    675448
    
  
SQL> select count(*) from dgr where object_id=11 ;

  COUNT(*)
----------
    675448
               

为避免发生类似情况,(1)归档文件保留策略要尽可能的长,最好不要随着备份完成而删除(2)可以配置归档的备份策略,备份后可以删除归档,有需要从备份归档中还原