Tuesday, October 30, 2018

OS Logfiles via SQL

Due to security restrictions it has become cumbersome to log in to a DB server, or even to connect to the database as SYS.
I've had to adapt quickly, and found the following views and queries useful for reading log files that I would normally have accessed from the OS:

To view the RMAN log for the last RMAN session:

select output
from v$rman_output
where session_recid = (select max(session_recid) from v$rman_status)
order by recid ;

The v$rman_output view is memory-based, so it will not persist through a database restart. The view provides the exact output from RMAN as it would appear on screen or in a logfile.
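
If you need the output of an earlier session rather than the most recent one, v$rman_status can be listed first to find its session_recid; a small sketch (row_type = 'SESSION' should limit the result to one row per RMAN session):

select session_recid,
       operation,
       status,
       to_char(start_time,'DD.MM.YYYY HH24:MI:SS') started
from v$rman_status
where row_type = 'SESSION'
order by session_recid desc;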

To query the alert.log:

select * from
(select record_id,
        to_char(originating_timestamp,'DD.MM.YYYY HH24:MI:SS'),
        message_text
 from x$dbgalertext
 order by RECORD_ID desc)
where rownum <= 20 order by RECORD_ID asc;

but this will fail with "ORA-00942: table or view does not exist" if you are not logged in as SYS, and you can't even grant select access on an X$ table to another user; the grant fails with "ORA-02030: can only select from fixed tables/views". My workaround was to create a view and then grant a select privilege on the view:

create view alert_log as select * from X$DBGALERTEXT;
grant select on alert_log to meyerm;

Then as user meyerm:

select * from
(select record_id,
        to_char(originating_timestamp,'DD.MM.YYYY HH24:MI:SS'),
        message_text
 from sys.alert_log
 order by RECORD_ID desc)
where rownum <= 20 order by RECORD_ID asc;
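
The same view also makes it easy to filter the alert log for errors, which previously needed grep on the server. A sketch using the sys.alert_log view created above, showing only ORA- messages from the last 24 hours:

select to_char(originating_timestamp,'DD.MM.YYYY HH24:MI:SS') logged,
       message_text
from sys.alert_log
where originating_timestamp > sysdate - 1
  and message_text like '%ORA-%'
order by record_id;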

Monday, October 8, 2018

Move datafile to another ASM diskgroup

From Oracle 12.1 you can execute SQL commands from within RMAN, which was great for this exercise as I did not have to jump between RMAN and SQL*Plus every few minutes.
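
For example, a datafile's status can be checked between steps without leaving the RMAN prompt (an illustrative query, not from the original session):

RMAN> select file#, status, name from v$datafile where name like '%ggatetbs%';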

The steps to move a datafile to another ASM diskgroup are basically: take the datafile offline, copy it to the new diskgroup (using RMAN commands), rename the datafile (or switch to the copy), recover it, and bring the datafile online again.

RMAN> alter database datafile '+DG_DATA/ac3duim/datafile/ggatetbs.362.905353545' offline;

RMAN> copy datafile '+DG_DATA/ac3duim/datafile/ggatetbs.362.905353545' to '+DG_C2DATA';

...
output file name=+DG_C2DATA/ac3duim/datafile/ggatetbs.275.923218515 tag=TAG20160922T093514 RECID=1 STAMP=923218516
...

RMAN> alter database rename file '+DG_DATA/ac3duim/datafile/ggatetbs.362.905353545' to '+DG_C2DATA/ac3duim/datafile/ggatetbs.275.923218515';
OR
RMAN> switch datafile '+DG_DATA/ac3duim/datafile/ggatetbs.362.905353545' to copy;

The copy must then be recovered to apply any redo generated while the datafile was offline:

RMAN> recover datafile '+DG_C2DATA/ac3duim/datafile/ggatetbs.275.923218515';

RMAN> alter database datafile '+DG_C2DATA/ac3duim/datafile/ggatetbs.275.923218515' online;

The old datafile can now be deleted (also via RMAN):


RMAN> delete datafilecopy '+DG_DATA/ac3duim/datafile/ggatetbs.362.905353545';
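
A quick check that no datafiles are left behind on the old diskgroup (again runnable straight from RMAN; a generic verification, not from the original session):

RMAN> select file#, name from v$datafile where name like '+DG_DATA/%';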

Wednesday, October 3, 2018

Data Guard Failover surprise

Ran into an interesting scenario after reinstating the previous primary DB following an emergency failover.

A Data Guard failover is done during emergencies (an actual DR situation): the old standby gets activated as a primary database with RESETLOGS after an incomplete recovery.
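
For reference, with the broker configured the failover itself is a single DGMGRL command against the target standby (the generic form; the alert.log below shows the broker driving exactly this):

DGMGRL> connect sys@TMSBY
DGMGRL> failover to TMSBY;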

Here's the alert.log from the new Primary database:

Tue Oct 02 16:07:34 2018
 Data Guard Broker: A failover will now be attempted
Tue Oct 02 16:07:34 2018
ALTER DATABASE FAILOVER TO TMSBY
...
Tue Oct 02 16:07:40 2018
ALTER DATABASE SWITCHOVER TO PRIMARY (TMSBY1)
RESETLOGS after incomplete recovery UNTIL CHANGE 14297961467966 time 10/02/2018 16:02:10
...
Tue Oct 02 16:07:40 2018
Setting recovery target incarnation to 3
Tue Oct 02 16:07:41 2018
Switchover: Complete - Database mounted as primary

A few hours later the old Primary DB (TMPRD) was fixed, started in MOUNT mode and reinstated via the Broker:

DGMGRL> show configuration
Configuration - tmdg
  Protection Mode: MaxPerformance
  Members:
  TMSBY  - Primary database
    TMPRD - Physical standby database (disabled)
      ORA-16661: the standby database needs to be reinstated
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS   (status updated 26 seconds ago)
DGMGRL> reinstate database TMPRD
Reinstating database "TMPRD", please wait...
Reinstatement of database "TMPRD" succeeded

Reinstatement completed successfully, but the new standby (TMPRD) was not applying logs. The alert.log confirmed that it was waiting for an archivelog that had only been created on the previous standby site:

Media Recovery Waiting for thread 2 sequence 358 branch(resetlogs_id) 987409798
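
The same gap is visible from SQL by comparing the highest received and applied sequences per thread (a generic check, not from the original session):

select thread#,
       max(sequence#) highest_received,
       max(case when applied = 'YES' then sequence# end) highest_applied
from v$archived_log
where resetlogs_id = 987409798
group by thread#;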

But that archivelog had already been backed up and deleted via RMAN. A straightforward restore on TMSBY should have worked, but it failed with:

RMAN> restore archivelog sequence 358 thread 2;
Starting restore at 03-OCT-18 08:40
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=232 instance=TMSBY1 device type=DISK
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 10/03/2018 08:40:15
RMAN-20242: specification does not match any archived log in the repository

Remember that the failover forced a RESETLOGS, so the incarnation of the database changed. The trick is to restore the archivelog from the previous incarnation:

RMAN> list incarnation;
List of Database Incarnations
DB Key  Inc Key DB Name  DB ID            STATUS  Reset SCN  Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1       1       TMPRD   645377540        PARENT  1594143    31-AUG-18 10:37
2       2       TMPRD   645377540        PARENT  5368114    21-SEP-18 08:29
3       3       TMPRD   645377540        CURRENT 14297961467967 02-OCT-18 16:07

RMAN> restore archivelog sequence 358 thread 2 Incarnation 2;

Starting restore at 03-OCT-18 08:44
using channel ORA_DISK_1
channel ORA_DISK_1: starting archived log restore to default destination
channel ORA_DISK_1: restoring archived log
archived log thread=2 sequence=358
channel ORA_DISK_1: reading from backup piece +DG_FRA/TMSBY/BACKUPSET/2018_10_02/annnf0_TMSBY1_arch_0.337.988488123
channel ORA_DISK_1: piece handle=+DG_FRA/TMSBY/BACKUPSET/2018_10_02/annnf0_TMSBY1_arch_0.337.988488123 tag=TMSBY1_ARCH
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:15
Finished restore at 03-OCT-18 08:45

And within a few minutes the restored archivelog was picked up by the FAL process, transferred, and applied on the new standby (TMPRD). Alert.log contents:

RFS[3]: Opened log for thread 2 sequence 358 dbid 645377540 branch 987409798
...
Archived Log entry 1627 added for thread 2 sequence 358 rlc 987409798 ID 0x0 dest 2:
...
Media Recovery Log +DG_ARCHIVE/TMPRD/ARCHIVELOG/2018_10_03/thread_2_seq_358.49304.988533935
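
A quick confirmation on the standby that apply has caught up (a generic check):

select name, value, time_computed
from v$dataguard_stats
where name = 'apply lag';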

All in all I was impressed with the smooth failover and reinstatement via the broker, but even so, not everything can be automated; there's always some hiccup where the DBA must roll up their sleeves and dig in to get things back to normal again.