Wednesday, September 5, 2012

ORA-16714 in a RAC environment

I had to setup a physical standby in a RAC environment. The primary database is a two node RAC cluster but the standby environment is a single instance database.
During setup and configuration of the dataguard broker, we had the following error . ” ORA-16608: one or more sites have warnings”.

In DGMGRL, the command line interface of the broker you check the configuration status using “show configuration”, the broker would give you the current setup at the end of the show configuration output, the broker tries to query the current status from all servers involved, this includes all instances in the RAC configuration on the primary and also the physical standby.

DGMGRL> show configuration;
Configuration - ssodg
  Protection Mode: MaxPerformance
  Databases:
    ssoprd - Primary database
      Warning: ORA-16792: configurable property value is inconsistent with database setting
    ssodrp - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
WARNING


2. So the problem is for a init.ora parameter on one (or both) RAC nodes. We need to drill down, so “show database ssoprd” would tell us which database has this error and also show which property (or init.ora parameter) it is complaining about.


DGMGRL> show database ssoprd;

Database - ssoprd

  Role:            PRIMARY
  Intended State:  TRANSPORT-ON
  Instance(s):
    ssoprd1
    ssoprd2
      Warning: ORA-16714: the value of property StandbyFileManagement is inconsistent with the database setting
      Warning: ORA-16714: the value of property ArchiveLagTarget is inconsistent with the database setting
      Warning: ORA-16714: the value of property LogArchiveMaxProcesses is inconsistent with the database setting
      Warning: ORA-16714: the value of property LogArchiveMinSucceedDest is inconsistent with the database setting
      Warning: ORA-16714: the value of property LogArchiveTrace is inconsistent with the database setting
      Warning: ORA-16714: the value of property LogArchiveFormat is inconsistent with the database setting

Database Status:
WARNING


3. So just one node in the RAC cluster has a problem and that is ssoprd2. I logged into that node and did a “show parameter standby_file_management” and it showed a value of AUTO. Back in DGMGRL I did “show database verbose ssoprd”. The verbose option shows you DG configured properties for a db. The StandbyFileManagement property was also set to AUTO, so what is the problem. The command “SHOW DATABASE ssoprd 'InconsistentProperties';” gave me more info.
It showed me the 3 different places where a property/parameter can be set MEMORY_VALUE,   SPFILE_VALUE and BROKER_VALUE. As confirmed earlier the database MEMORY_VALUE had and standby_file_management =AUTO and the BROKER_VALUE had StandbyFileManagement=AUTO, the problem was that the SPFILE_VALUE parameter for node ssoprd2 had a value of NULL.

4. Now why would one node in a RAC cluster have a different SPFILE value when the SPFILE is kept in a shared location on ASM? I looked in the $ORACLE_HOME/dbs directory of node ssoprd2 and found a spfilessoprd2.ora file! Even though the contents was ‘*.SPFILE='+DG_DATA/ssoprd/spfilessoprd.ora' which is the shared location it is still a non-shared spfile in the local $ORACLE_HOME! Just to confirm I looked into $ORACLE_HOME/dbs on the other node did not find a SPFILE there. So the solution was simply to shutdown node 2, remove the spfile from $ORACLE_HOME/dbs and then to start the database on node 2. Note that each node had a initssoprd?.ora file in the $ORACLE_HOME with the contents *.SPFILE='+DG_DATA/ssoprd/spfilessoprd.ora'

5. This fixed the issue, it was the DGB not able to synchronize the value of these various parameters/properties between the various instances in the cluster. In a RAC environment its a best practice to maintain the SPFILE in a shared location such as ASM storage.
The SHOW CONFIGURATION command from the DGMGRL returned error free.

DGMGRL> show configuration;Configuration - ssodg  Protection Mode: MaxPerformance  Databases:    ssoprd - Primary database    ssodrp - Physical standby database Fast-Start Failover: DISABLED Configuration Status:SUCCESS DGMGRL> show database ssoprd;Database - ssoprd  Role:            PRIMARY  Intended State:  TRANSPORT-ON  Instance(s):    ssoprd1    ssoprd2 Database Status:SUCCESS

Thursday, January 26, 2012

Resolving Gaps in Data Guard Apply Using Incremental RMAN Backup

I found quite a few blog entries on the web and off course the Oracle Documentation also helped. The reason I want to create a post is that I encountered 2 problems that I first had to overcome before the Physical Standby was in sync again. A sort overview of the problem and the steps to overcome them are:

On the standby get the current scn which will be used in your RMAN backup script as a starting point. 

SQL>select current_scn FROM V$DATABASE;

     CURRENT_SCN
----------------
11133157117269


 On the primary run the RMAN script with the supplied current_scn number from the standby 

run {
allocate channel c1 type disk format '/dump/abcprd/%U.rmb';
backup as compressed backupset skip readonly incremental from scn 11133157117269 database;

And this is were I got my first unexpected problem. The RMAN backup gave the following warning message:

RMAN-06755: WARNING: datafile 408: incremental-start SCN is too recent; using checkpoint SCN 9733080640801 instead

File 408 is a Read Only Tablespace with a much older scn number. V$SESSION_LONGOPS showed that the backup will take longer than 24 hours to complete, so I immediately stopped it and reset the Read Only Tablespace's scn number with the aim that it will shorten the duration of the RMAN backup. 
SQL> alter tablespace SA_ORD_RO read write ;
SQL> alter tablespace SA_ORD_RO read only ;

I restarted the RMAN backup, it did not give the warning again, but still took 16 hours to complete. All files created by the backup were copied to the standby server. I also created a new standby controlfile:
RMAN> backup current controlfile for standby format '/usr/users/oracle/ForStandbyCTRL.bck';

On the standby server I cataloged and then restored the controlfile 
RMAN> catalog start with ' /dump/backup/';
RMAN> restore standby controlfile from '/dump/backup/ForStandbyCTRL.bck';


And the recover script:
RMAN> catalog start with ' /dump/abcprd/';
RMAN> recover database noredo;

But the Incremental backup failed almost immediately with the error

RMAN-06094: datafile 569 must be restored

It turns out that datafile 569 was created on the primary after the gap occured, but before the rman incremental was run. So the controlfile was aware of the datafile, but the file was not on the standby server. I also need to do a datafile backup for the newly created datafile. So back to Primary:
RMAN> backup datafile 569;
scp the file to the standby server and restore it on the standby server:
RMAN> restore datafile 569;

This time I could start the RMAN> recover database noredo; again and it completed successfully.


DB link from Oracle to SQL Server

We had a need for a DB link from an Oracle database to a SQL Server database. I initially feared that this might be a huge task, but it turned out to be quite a easy setup. Thanks Oracle for making it almost seamless!

There are basically 4 steps involved 

1.) Install & configure the Gateway

The Oracle Gateway can be found in the Oracle Database 11g Release 2 Enterprise Edition software. Installation is straight forward. You basically have to make 2 choices during installation. What Gateway you want to install (SQL Server in our case) and which ORACLE_HOME you want to use. You can create a separate Gateway home or just use the current ORACLE_HOME, which is what we did.

After installation you will find a new directory in ORACLE_HOME called dg4msql. Configure the Gateway by creating a agent init.ora file in $ORACLE_HOME/dg4msql/admin. The name of the file is important, as the name you use will be used in your listener as well. We kept it to the default initdg4msql.ora. The information you need to supply in the agent init.ora file are SQL Server Database Server Host Name, SQL Server Database Server Port number and SQL Server Database. The information all goes into one parameter called HS_FDS_CONNECT_INFO.

Ex:
# This is a customized agent init file that contains the HS parameters
# that are needed for the Database Gateway for Microsoft SQL Server

#
# HS init parameters
#
# HS_FDS_CONNECT_INFO=[server_name]:port//mssql_db
HS_FDS_CONNECT_INFO=[ABC02.company.co.za]:1433//MSCIM
# alternate connect format is hostname/serverinstance/databasename
HS_FDS_TRACE_LEVEL=OFF
HS_FDS_RECOVERY_ACCOUNT=RECOVER
HS_FDS_RECOVERY_PWD=RECOVER

2.) Listener & names lookup
We used port 1524 to setup the gateway

LISTENER =
  (ADDRESS_LIST =
    (ADDRESS = (PROTOCOL = tcp)(HOST = abc01.company.co.za)(PORT = 1521))
    (ADDRESS = (PROTOCOL = tcp)(HOST = localhost)(PORT = 1524))
  )

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = dg4msql)
      (ORACLE_HOME = /opt/apps/oracle/product/11.2.0/dbhome_1)
      (PROGRAM = dg4msql)
    )
  )

And Local Naming

MSCIM =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = tcp)(HOST = localhost)(PORT = 1524))
    (CONNECT_DATA =
      (SID = dg4msql)
    )
    (HS = OK)
  )

HS=OK - indicates this connect descriptor is using heterogeneous Service.

3.) Database Link
Once the SQL Server team gave us a username and password we could just create a DB link

CREATE PUBLIC DATABASE LINK MSCIM_DBLINK CONNECT TO 'mssql_username' IDENTIFIED BY 'mssql_password' USING 'MSCIM'

4.) Data dictionary translation support
Oracle has also supplied a script in $ORACLE_HOME/ dg4msql/admin/ called dg4msql_cvw.sql that must be run on the SQL Server side. The script creates views on SQL Server for usage by the Oracle Data Dictionary. 4 views called ALL_IND_COLUMNS, USER_IND_COLUMNS, ALL_CONS_COLUMNS, and USER_CONS_COLUMNS will be created on the SQL Server db making it easier for someone familiar with Oracle Data Dictionary views to query the SQL Server database.

Friday, September 16, 2011

Background Managed Standby Recovery process not detected

We have had 2 instances where the dg broker did not start MAnaged Standby Recovery of the physical standby database. The first time was after a switchover and the next time was after hardware maintenance. The physical standby came up successfully and the dg broker started up, but Managed Standby Recovery never started. On both occasions I noticed these entries in the alert.log file:

...
Mon Sep 12 14:03:32 2011
Starting Data Guard Broker (DMON)
..
OCISessionBegin with PasswordVerifier succeeded
ALTER SYSTEM SET ..
ALTER SYSTEM SET ..
ALTER SYSTEM SET ..
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE  THROUGH ALL SWITCHOVER DISCONNECT  USING CURRENT LOGFILE
...
Mon Sep 12 14:08:06 2011
Background MRP initialization phase wait timeout   **
Background Managed Standby Recovery process not detected  **
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE  THROUGH ALL SWITCHOVER DISCONNECT  USING CURRENT LOGFILE


I bounced the database a few times but the same error persisted. The only way I could solve this was to “kick start” the recovery by manually recovering a few archivelogs:

SQL>ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT FROM SESSION
*
ERROR at line 1:
ORA-01153: an incompatible media recovery is active

But
SQL>ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
only hangs

shutdown immediate also only hangs
shutdown abort did it.


SQL> startup mount

Start manual recovery before the DG Broker starts Managed Standby Recovery

SQL> RECOVER standby DATABASE  UNTIL CANCEL;

ORA-00279: change 10662691458828 generated at 09/13/2011 01:18:04 needed for thread 1
ORA-00289: suggestion : +PCASDGF/abcprd/archivelog/2011_09_13/thread_1_seq_82723.528.742958295
ORA-00280: change 10662691458828 for thread 1 is in sequence #82723


Specify log: {=suggested | filename | AUTO | CANCEL}
ORA-00279: change 10662691458829 generated at 09/13/2011 01:18:13 needed for thread 1
ORA-00289: suggestion : +PCASDGF/abcprd/archivelog/2011_09_13/thread_1_seq_82724.2112.742961677
ORA-00280: change 10662691458829 for thread 1 is in sequence #82724
ORA-00278: log file '+PCASDGF/abcprd/archivelog/2011_09_13/thread_1_seq_82723.528.742958295' no longer needed for this recovery


Specify log: {=suggested | filename | AUTO | CANCEL}

Hit enter, to manually recover a few logs and then

ORA-00279: change 10663204118583 generated at 09/13/2011 07:00:27 needed for thread 1
ORA-00289: suggestion : +PCASDGF/abcprd/archivelog/2011_09_13/thread_1_seq_82745.2365.742980297
ORA-00280: change 10663204118583 for thread 1 is in sequence #82745
ORA-00278: log file '+PCASDGF/abcprd/archivelog/2011_09_13/thread_1_seq_82744.2938.742978831' no longer needed for this recovery


Specify log: {=suggested | filename | AUTO | CANCEL}
CANCEL
Media recovery cancelled.

SQL>ALTER DATABASE RECOVER MANAGED STANDBY DATABASE THROUGH ALL SWITCHOVER DISCONNECT  USING CURRENT LOGFILE

Database altered.


Just to be sure I did another bounce a few minutes later and this time the DG Broker did everything it was supposed to and Managed Standby Recovery was started automatically.

Logon to Active Data Guard fails with ORA-16000

We have a reporting user used by executives for real time reporting on one of our OLTP databases. Some of the queries were getting resource intensive so we decided to move those queries to our Active Data Guard instance.

But then the reports started failing with
ORA-00604: error occurred at recursive SQL level 1
ORA-16000: database open for read-only access
on database logon. 

I found an old bug in MOS that pointed me in the right direction. After a failed login on Production, the same userid will get this error on the Active Data Guard instance, until the “logon counter” gets reset by a successful logon on production.

On Active Data Guard:
SQL >select * from v$version;

BANNER
------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
...

SQL> select OPEN_MODE, DATABASE_ROLE from v$database;

OPEN_MODE  DATABASE_ROLE
---------- ----------------
READ ONLY  PHYSICAL STANDBY

SQL> conn infoweb/correctpwd
Connected.

On Production:

SQL >select * from v$version;

BANNER
------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
...

SYS@casprd>select OPEN_MODE, DATABASE_ROLE from v$database;

OPEN_MODE  DATABASE_ROLE
---------- ----------------
READ WRITE PRIMARY

now login as infoweb with a wrong password on Production:

SQL>connect infoweb/wrongpwd
ERROR:
ORA-01017: invalid username/password; logon denied

Warning: You are no longer connected to ORACLE.

Back to Active Data Guard, logon as infoweb with the correct password:

SQL> conn infoweb/correctpwd
ERROR:
ORA-00604: error occurred at recursive SQL level 1
ORA-16000: database open for read-only access

Warning: You are no longer connected to ORACLE.

Back to Production, logon as infoweb with the correct password to reset the logon counter:

SQL> conn infoweb/correctpwd
Connected.

Back to Active Data Guard, logon as infoweb with the correct password:

SQL> conn infoweb/correctpwd
Connected.

This is similar to a 10.2.0.4 bug, Bug 5847453  Failed logon counter with read only DB throws ORA-16000 on failed logon [ID 5847453.8]. The note states that the issue was fixed in 10.2.0.5 and 11.1.0.6.

Tuesday, March 8, 2011

ACCOUNT_STATUS = EXPIRED(GRACE)

Users are identified within the app, so all DB connections are done through 1 schema. We created a profile for this schema where the password will never expire. App support has a manual process to change the password every 60 days, but Oracle must never be allowed to expire the password automatically.

The problem is that I found the ACCOUNT_STATUS for the main schema to be EXPIRED(GRACE) and that the EXPIRY_DATE is 3 days away.

This schemas profile was changed by accident, and rectified within a few hours. But the damage was done the ACCOUNT_STATUS still showed EXPIRED(GRACE).

I found an excellent blog post by Robert Geier (http://blog.contractoracle.com/2009/11/analysis-of-oracle-password-expiry.html) to explain the situation. Here’s a abbreviation of his explanation: password expiry is controlled by the PASSWORD_LIFE_TIME profile limit, is activated at login, and is reset by a password change. Changing the profile will not change aud$.ASTATUS.

SQL> select USERNAME, PROFILE, ACCOUNT_STATUS from dba_users where username = ‘QWERTY’;

USERNAME PROFILE ACCOUNT_STATUS

------------------ ------------------ --------------------------------

QWERTY QWERTY_NONEXPIRY EXPIRED(GRACE)

So even though the profile was changed back to QWERTY_NONEXPIRY the ACCOUNT_STATUS was still EXPIRED(GRACE). The only way to reset the ACCOUNT_STATUS back to OPEN is to change the password, because only a password change will reset aud$.ASTATUS. But a password change will give a ORA-28007 because of the current profile limits for PASSWORD_REUSE_MAX and PASSWORD_REUSE_TIME gets violated.

SQL> select password from sys.user$ where name='QWERTY';

PASSWORD

-----------------------------

87FE2DD49B2882B2

SQL> alter user QWERTY identified by values '87FE2DD49B2882B2';

alter user QWERTY identified by values '87FE2DD49B2882B2'

*

ERROR at line 1:

ORA-28007: the password cannot be reused

So I had to create a new temporary profile with the required settings to allow me to change the password to be the same as the current password:

SQL> create profile tmtmp limit PASSWORD_REUSE_MAX UNLIMITED PASSWORD_REUSE_TIME UNLIMITED;

SQL> alter user QWERTY profile tmtmp;

SQL> alter user QWERTY identified by values '87FE2DD49B2882B2';

User altered.

SQL> alter user QWERTY profile QWERTY_NONEXPIRY;

SQL> select USERNAME, PROFILE, ACCOUNT_STATUS from dba_users where username = 'QWERTY';

USERNAME PROFILE ACCOUNT_STATUS

------------------ ------------------ --------------------------------

QWERTY QWERTY_NONEXPIRY OPEN

Tuesday, June 22, 2010

Manually Add Datafile

I have a database that runs on RAW devices. Whenever I add a datafile to a tablespace, out UNIX Sys Admins must first create the RAW device on the Primary & Standby Server. The other day they created the RAW device on the standby server but forgot to change the ownership of the RAW device to oracle:dba.
The datafile was successfully added to the primary database, but I found the following error in the alert.log of the standby:

File #323 added to control file as 'UNNAMED00323'.
Originally created as:
'/dev/vx/rdsk/oradg1/casprd_SA_I_CMD10'
Recovery was unable to create the file as:
'/dev/vx/rdsk/oradg1_dg/casprd_SA_I_CMD10'
MRP0: Background Media Recovery terminated with error 1119
Mon Mar 9 23:41:44 2010
Errors in file /opt/apps/oracle/admin/caspp/bdump/caspp_mrp0_15261.trc:
ORA-01119: error in creating database file '/dev/vx/rdsk/oradg1_dg/casprd_SA_I_CMD10'
ORA-27040: file create error, unable to create file
HPUX-ia64 Error: 13: Permission denied
Managed Standby Recovery not using Real Time Apply
Some recovered datafiles maybe left media fuzzy
Media recovery may continue but open resetlogs may fail
Mon Mar 9 23:41:45 2010
Errors in file /opt/apps/oracle/admin/caspp/bdump/caspp_mrp0_15261.trc:
ORA-01119: error in creating database file '/dev/vx/rdsk/oradg1_dg/casprd_SA_I_CMD10'
ORA-27040: file create error, unable to create file
HPUX-ia64 Error: 13: Permission denied

I can only qquery some v$ views when a standby database is mounted. So to fix the problem I did the following:
On production get the relevant details

SQL>select ts# from v$tablespace where name = 'SA_I_CMD';
TS#
----------
17
SQL> col name for a80;
SQL> select FILE#, NAME, bytes/1024/1024 MB from v$datafile where ts#=17;
FILE# NAME MB
---------- ----------------------------------------------------- ----------
34 /dev/vx/rdsk/oradg/casprd_SA_I_CMD03 32500
35 /dev/vx/rdsk/oradg/casprd_SA_I_CMD 31144
36 /dev/vx/rdsk/oradg/casprd_SA_I_CMD02 30712
37 /dev/vx/rdsk/oradg/casprd_SA_I_CMD01 32360
38 /dev/vx/rdsk/oradg/casprd_SA_I_CMD04 32500
39 /dev/vx/rdsk/oradg/casprd_SA_I_CMD05 30000
201 /dev/vx/rdsk/oradg/casprd_SA_I_CMD06 20000
266 /dev/vx/rdsk/oradg1/casprd_SA_I_CMD07 32500
307 /dev/vx/rdsk/oradg1/casprd_SA_I_CMD08 32500
311 /dev/vx/rdsk/oradg1/casprd_SA_I_CMD09 32500
323 /dev/vx/rdsk/oradg1/casprd_SA_I_CMD10 32500

Lets look at the same query on the standby side

SQL> col name for a80
SQL> select FILE#, NAME, bytes/1024/1024 MB from v$datafile where ts#=17;
FILE# NAME MB
---------- ----------------------------------------------- ----------
34 /dev/vx/rdsk/oradg_dg/casprd_SA_I_CMD03 32500
35 /dev/vx/rdsk/oradg_dg/casprd_SA_I_CMD 31144
36 /dev/vx/rdsk/oradg_dg/casprd_SA_I_CMD02 30712
37 /dev/vx/rdsk/oradg_dg/casprd_SA_I_CMD01 32360
38 /dev/vx/rdsk/oradg_dg/casprd_SA_I_CMD04 32500
39 /dev/vx/rdsk/oradg_dg/casprd_SA_I_CMD05 30000
201 /dev/vx/rdsk/oradg_dg/casprd_SA_I_CMD06 20000
266 /dev/vx/rdsk/oradg1_dg/casprd_SA_I_CMD07 32500
307 /dev/vx/rdsk/oradg1_dg/casprd_SA_I_CMD08 32500
311 /dev/vx/rdsk/oradg1_dg/casprd_SA_I_CMD09 32500
323 /opt/apps/oracle/product/10.2.0.4.0/dbs/UNNAMED00323 0

After the UNIX admin corrected the file permissions of the RAW device I had to manually add it as a datafile to the standby database.

SQL>alter system set STANDBY_FILE_MANAGEMENT=MANUAL;
System altered.
SQL>ALTER DATABASE CREATE DATAFILE ‘/opt/apps/oracle/product/10.2.0.4.0/dbs/UNNAMED00323' as '/dev/vx/rdsk/oradg1_dg/casprd_SA_I_CMD10';
Database altered.
SQL>select FILE#, NAME, bytes/1024/1024 MB from v$datafile where ts#=17;
FILE# NAME MB
---------- ----------------------------------------------------- ----------
34 /dev/vx/rdsk/oradg_dg/casprd_SA_I_CMD03 32500
35 /dev/vx/rdsk/oradg_dg/casprd_SA_I_CMD 31144
36 /dev/vx/rdsk/oradg_dg/casprd_SA_I_CMD02 30712
37 /dev/vx/rdsk/oradg_dg/casprd_SA_I_CMD01 32360
38 /dev/vx/rdsk/oradg_dg/casprd_SA_I_CMD04 32500
39 /dev/vx/rdsk/oradg_dg/casprd_SA_I_CMD05 30000
201 /dev/vx/rdsk/oradg_dg/casprd_SA_I_CMD06 20000
266 /dev/vx/rdsk/oradg1_dg/casprd_SA_I_CMD07 32500
307 /dev/vx/rdsk/oradg1_dg/casprd_SA_I_CMD08 32500
311 /dev/vx/rdsk/oradg1_dg/casprd_SA_I_CMD09 32500
323 /dev/vx/rdsk/oradg1_dg/casprd_SA_I_CMD10 32500
11 rows selected.
SQL>alter system set STANDBY_FILE_MANAGEMENT=AUTO;
System altered.
SQL>ALTER DATABASE RECOVER MANAGED STANDBY DATABASE THROUGH ALL SWITCHOVER DISCONNECT USING CURRENT LOGFILE;
Database altered.