A Data Guard failover is performed in an emergency (an actual DR situation): the standby is activated as the new Primary database with a RESETLOGS after an incomplete recovery.
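For reference, with the Broker in place the failover itself is a single DGMGRL command issued against the surviving standby (a minimal sketch; connection details omitted, the alert.log below shows the result):
DGMGRL> failover to TMSBY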
Here's the alert.log from the new Primary database:
Tue Oct 02 16:07:34 2018
Data Guard Broker: A failover will now be attempted
Tue Oct 02 16:07:34 2018
ALTER DATABASE FAILOVER TO TMSBY
...
Tue Oct 02 16:07:40 2018
ALTER DATABASE SWITCHOVER TO PRIMARY (TMSBY1)
RESETLOGS after incomplete recovery UNTIL CHANGE 14297961467966 time 10/02/2018 16:02:10
...
Tue Oct 02 16:07:40 2018
Setting recovery target incarnation to 3
Tue Oct 02 16:07:41 2018
Switchover: Complete - Database mounted as primary
A few hours later the old Primary DB (TMPRD) was fixed, started in MOUNT mode and reinstated via the Broker, as shown below.
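Bringing the old Primary up in MOUNT mode is plain SQL*Plus on the TMPRD host (a minimal sketch; the connection details are assumed):
SQL> -- on the TMPRD instance, SYSDBA connection assumed
SQL> startup mount
From there the Broker handles the reinstatement: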
DGMGRL> show configuration
Configuration - tmdg
Protection Mode: MaxPerformance
Members:
TMSBY - Primary database
TMPRD - Physical standby database (disabled)
ORA-16661: the standby database needs to be reinstated
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS (status updated 26 seconds ago)
DGMGRL> reinstate database TMPRD
Reinstating database "TMPRD", please wait...
Reinstatement of database "TMPRD" succeeded
Reinstatement completed successfully, but the new standby (TMPRD) was not applying logs. The alert.log confirmed that it was waiting for an archivelog that had only been created on the previous standby site (TMSBY):
Media Recovery Waiting for thread 2 sequence 358 branch(resetlogs_id) 987409798
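The gap can also be confirmed from SQL*Plus on the new standby using the standard views (a quick sketch; output obviously depends on the environment):
SQL> -- on the new standby (TMPRD)
SQL> select thread#, low_sequence#, high_sequence# from v$archive_gap;
SQL> select process, status, thread#, sequence# from v$managed_standby;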
But that archivelog had already been backed up and deleted via RMAN. A straightforward restore on TMSBY should have worked, but it failed with:
RMAN> restore archivelog sequence 358 thread 2;
Starting restore at 03-OCT-18 08:40
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=232 instance=TMSBY1 device type=DISK
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 10/03/2018 08:40:15
RMAN-20242: specification does not match any archived log in the repository
Remember that the failover forced a RESETLOGS, so the database incarnation changed. The trick is to restore the archivelog from the previous incarnation:
RMAN> list incarnation;
List of Database Incarnations
DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1 1 TMPRD 645377540 PARENT 1594143 31-AUG-18 10:37
2 2 TMPRD 645377540 PARENT 5368114 21-SEP-18 08:29
3 3 TMPRD 645377540 CURRENT 14297961467967 02-OCT-18 16:07
RMAN> restore archivelog sequence 358 thread 2 Incarnation 2;
Starting restore at 03-OCT-18 08:44
using channel ORA_DISK_1
channel ORA_DISK_1: starting archived log restore to default destination
channel ORA_DISK_1: restoring archived log
archived log thread=2 sequence=358
channel ORA_DISK_1: reading from backup piece +DG_FRA/TMSBY/BACKUPSET/2018_10_02/annnf0_TMSBY1_arch_0.337.988488123
channel ORA_DISK_1: piece handle=+DG_FRA/TMSBY/BACKUPSET/2018_10_02/annnf0_TMSBY1_arch_0.337.988488123 tag=TMSBY1_ARCH
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:15
Finished restore at 03-OCT-18 08:45
Within a few minutes the restored archivelog was picked up by the FAL process, transferred and applied on the new standby (TMPRD). Alert.log contents:
RFS[3]: Opened log for thread 2 sequence 358 dbid 645377540 branch 987409798
...
Archived Log entry 1627 added for thread 2 sequence 358 rlc 987409798 ID 0x0 dest 2:
...
Media Recovery Log +DG_ARCHIVE/TMPRD/ARCHIVELOG/2018_10_03/thread_2_seq_358.49304.988533935
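As a final sanity check, the APPLIED column in V$ARCHIVED_LOG can be queried on the new standby (sketch only):
SQL> -- on the new standby (TMPRD)
SQL> select thread#, sequence#, applied from v$archived_log where thread#=2 and sequence#=358;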
All in all I was impressed with how smooth the failover and reinstatement via the Broker were, but even then, not everything can be automated and there's always some hiccup where the DBA has to roll up their sleeves and dig in to get things back to normal again.