Friday, September 16, 2011

Background Managed Standby Recovery process not detected

We have had 2 instances where the dg broker did not start MAnaged Standby Recovery of the physical standby database. The first time was after a switchover and the next time was after hardware maintenance. The physical standby came up successfully and the dg broker started up, but Managed Standby Recovery never started. On both occasions I noticed these entries in the alert.log file:

...
Mon Sep 12 14:03:32 2011
Starting Data Guard Broker (DMON)
..
OCISessionBegin with PasswordVerifier succeeded
ALTER SYSTEM SET ..
ALTER SYSTEM SET ..
ALTER SYSTEM SET ..
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE  THROUGH ALL SWITCHOVER DISCONNECT  USING CURRENT LOGFILE
...
Mon Sep 12 14:08:06 2011
Background MRP initialization phase wait timeout   **
Background Managed Standby Recovery process not detected  **
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE  THROUGH ALL SWITCHOVER DISCONNECT  USING CURRENT LOGFILE


I bounced the database a few times but the same error persisted. The only way I could solve this was to “kick start” the recovery by manually recovering a few archivelogs:

SQL>ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION
*
ERROR at line 1:
ORA-01153: an incompatible media recovery is active

But
SQL>ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
only hangs

shutdown immediate also only hangs
shutdown abort did it.


SQL> startup mount

Start manual recovery before the DG Broker starts Managed Standby Recovery

SQL> RECOVER standby DATABASE  UNTIL CANCEL;

ORA-00279: change 10662691458828 generated at 09/13/2011 01:18:04 needed for thread 1
ORA-00289: suggestion : +PCASDGF/abcprd/archivelog/2011_09_13/thread_1_seq_82723.528.742958295
ORA-00280: change 10662691458828 for thread 1 is in sequence #82723


Specify log: {=suggested | filename | AUTO | CANCEL}
ORA-00279: change 10662691458829 generated at 09/13/2011 01:18:13 needed for thread 1
ORA-00289: suggestion : +PCASDGF/abcprd/archivelog/2011_09_13/thread_1_seq_82724.2112.742961677
ORA-00280: change 10662691458829 for thread 1 is in sequence #82724
ORA-00278: log file '+PCASDGF/abcprd/archivelog/2011_09_13/thread_1_seq_82723.528.742958295' no longer needed for this recovery


Specify log: {=suggested | filename | AUTO | CANCEL}

Hit enter, to manually recover a few logs and then

ORA-00279: change 10663204118583 generated at 09/13/2011 07:00:27 needed for thread 1
ORA-00289: suggestion : +PCASDGF/abcprd/archivelog/2011_09_13/thread_1_seq_82745.2365.742980297
ORA-00280: change 10663204118583 for thread 1 is in sequence #82745
ORA-00278: log file '+PCASDGF/abcprd/archivelog/2011_09_13/thread_1_seq_82744.2938.742978831' no longer needed for this recovery


Specify log: {=suggested | filename | AUTO | CANCEL}
CANCEL
Media recovery cancelled.

SQL>ALTER DATABASE RECOVER MANAGED STANDBY DATABASE THROUGH ALL SWITCHOVER DISCONNECT  USING CURRENT LOGFILE

Database altered.


Just to be sure I did another bounce a few minutes later and this time the DG Broker did everything it was supposed to and Managed Standby Recovery was started automatically.

Logon to Active Data Guard fails with ORA-16000

We have a reporting user used by executives for real time reporting on one of our OLTP databases. Some of the queries were getting resource intensive so we decided to move those queries to our Active Data Guard instance.

But then the reports started failing with
ORA-00604: error occurred at recursive SQL level 1
ORA-16000: database open for read-only access
on database logon. 

I found an old bug in MOS that pointed me in the right direction. After a failed login on Production, the same userid will get this error on the Active Data Guard instance, until the “logon counter” gets reset by a successful logon on production.

On Active Data Guard:
SQL >select * from v$version;

BANNER
------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
...

SQL> select OPEN_MODE, DATABASE_ROLE from v$database;

OPEN_MODE  DATABASE_ROLE
---------- ----------------
READ ONLY  PHYSICAL STANDBY

SQL> conn infoweb/correctpwd
Connected.

On Production:

SQL >select * from v$version;

BANNER
------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
...

SYS@casprd>select OPEN_MODE, DATABASE_ROLE from v$database;

OPEN_MODE  DATABASE_ROLE
---------- ----------------
READ WRITE PRIMARY

now login as infoweb with a wrong password on Production:

SQL>connect infoweb/wrongpwd
ERROR:
ORA-01017: invalid username/password; logon denied

Warning: You are no longer connected to ORACLE.

Back to Active Data Guard, logon as infoweb with the correct password:

SQL> conn infoweb/correctpwd
ERROR:
ORA-00604: error occurred at recursive SQL level 1
ORA-16000: database open for read-only access

Warning: You are no longer connected to ORACLE.

Back to Production, logon as infoweb with the correct password to reset the logon counter:

SQL> conn infoweb/correctpwd
Connected.

Back to Active Data Guard, logon as infoweb with the correct password:

SQL> conn infoweb/correctpwd
Connected.

This is similar to a 10.2.0.4 bug, Bug 5847453  Failed logon counter with read only DB throws ORA-16000 on failed logon [ID 5847453.8]. The note states that the issue was fixed in 10.2.0.5 and 11.1.0.6.