Tuesday, December 23, 2008

Problem setting up Archive Log file management on ASM

I created an ASM diskgroup for the FRA and set the relevant parameters. The aim is to write archive logs, flashback logs and RMAN backups to this FRA.

SQL>select FREE_MB from v$asm_diskgroup where NAME='PRNADGF';

FREE_MB
----------
2,761,992

SQL>show parameter db_recovery_file

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest string
db_recovery_file_dest_size big integer 0

SQL>alter system set db_recovery_file_dest_size=2700000M;

System altered.

SQL>alter system set db_recovery_file_dest='+PRNADGF';

System altered.
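
As a quick sanity check (not part of the original steps), the same settings can be confirmed through V$RECOVERY_FILE_DEST, which should now report the disk group as the destination together with the quota just set:

SQL> select name, space_limit, space_used, space_reclaimable, number_of_files
     from v$recovery_file_dest;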

My problems started when I changed to Archive Log Mode


SQL>archive log list
Database log mode No Archive Mode
Automatic archival Disabled
Archive destination USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence 10
Current log sequence 12

SQL>shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

SQL>startup mount
ORACLE instance started.

Total System Global Area 3156803584 bytes
Fixed Size 2110872 bytes
Variable Size 402655848 bytes
Database Buffers 2734686208 bytes
Redo Buffers 17350656 bytes
Database mounted.
SQL>alter database archivelog;

Database altered.

SQL>alter database open;

Database altered.

SQL>archive log list
Database log mode Archive Mode
Automatic archival Enabled
Archive destination USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence 10
Next log sequence to archive 12
Current log sequence 12


So the archive log files should be written to my FRA on ASM, but when I tested it I got an unexpected error:


SQL>alter system archive log current;

alter system archive log current
*
ERROR at line 1:
ORA-16038: log 3 sequence# 12 cannot be archived
ORA-19502: write error on file "", block number (block size=)
ORA-00312: online log 3 thread 1: '+PRNADG1/rnatst/onlinelog/group_3.278.672669129'


This error message is misleading because there is nothing wrong with the FRA on ASM. I found the real error message in the alert.log file:


ARCH: Encountered disk I/O error 19502
ARCH: Closing local archive destination LOG_ARCHIVE_DEST_1: '/opt/apps/oracle/product/database/11.1.0.6.0/dbs/arch1_12_672668870.dbf' (error 19502)
(rnatst)
Errors in file /opt/apps/oracle/diag/rdbms/rnatst/rnatst/trace/rnatst_ora_21990.trc:


So the archiver also wants to archive to my $ORACLE_HOME/dbs directory. It cannot do that because there is not enough space on that filesystem, but the real question is: why does it want to write to that destination at all?
When I do a show parameter I don't get the answer:


SQL> show parameter log_archive_dest

log_archive_dest string
log_archive_dest_1 string
log_archive_dest_10 string
log_archive_dest_2 string
log_archive_dest_3 string
log_archive_dest_4 string
log_archive_dest_5 string
log_archive_dest_6 string
log_archive_dest_7 string
log_archive_dest_8 string
log_archive_dest_9 string
log_archive_dest_state_1 string enable
log_archive_dest_state_10 string enable
log_archive_dest_state_2 string enable
log_archive_dest_state_3 string enable
log_archive_dest_state_4 string enable
log_archive_dest_state_5 string enable
log_archive_dest_state_6 string enable
log_archive_dest_state_7 string enable
log_archive_dest_state_8 string enable
log_archive_dest_state_9 string enable
log_archive_duplex_dest string
log_archive_min_succeed_dest integer 1


Again I found the answer in the alert.log


Starting ORACLE instance (normal)
Using LOG_ARCHIVE_DEST_1 parameter default value as /opt/apps/oracle/product/database/11.1.0.6.0/dbs/arch
Using LOG_ARCHIVE_DEST_10 parameter default value as USE_DB_RECOVERY_FILE_DEST


It seems to be a DEFAULT destination that does not show up when I use ARCHIVE LOG LIST or SHOW PARAMETER LOG_ARCHIVE_DEST.
It does show up in V$ARCHIVE_DEST though:


SQL> select dest_id, destination, status from V$ARCHIVE_DEST;


DEST_ID DESTINATION STATUS
---------- ------------------------------------------------------- ---------
1 /opt/apps/oracle/product/database/11.1.0.6.0/dbs/arch VALID
2 INACTIVE
3 INACTIVE
4 INACTIVE
5 INACTIVE
6 INACTIVE
7 INACTIVE
8 INACTIVE
9 INACTIVE
10 USE_DB_RECOVERY_FILE_DEST VALID


To solve the problem:


SQL> alter system set log_archive_dest_state_1=DEFER;
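
An alternative I considered but did not use here would be to point LOG_ARCHIVE_DEST_1 explicitly at the FRA instead of deferring it; the LOCATION attribute accepts the USE_DB_RECOVERY_FILE_DEST keyword:

SQL> alter system set log_archive_dest_1='LOCATION=USE_DB_RECOVERY_FILE_DEST';

Deferring the default destination was enough for my purposes.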

And to test

SQL>alter system archive log current;

System altered.

SQL>select dest_id, name from v$archived_log;

DEST_ID NAME
---------- ----------------------------------------------------------------------
10 +PRNADGF/rnatst/archivelog/2008_12_22/thread_1_seq_12.256.674133347

SQL> select dest_id, destination, status from V$ARCHIVE_DEST;


DEST_ID DESTINATION STATUS
---------- ------------------------------------------------------- ---------
1 /opt/apps/oracle/product/database/11.1.0.6.0/dbs/arch DISABLED
2 INACTIVE
3 INACTIVE
4 INACTIVE
5 INACTIVE
6 INACTIVE
7 INACTIVE
8 INACTIVE
9 INACTIVE
10 USE_DB_RECOVERY_FILE_DEST VALID


Lesson learned.
Keep on looking until you can explain the error.

Friday, December 12, 2008

Ora-17505 using RMAN with ASM

I had to create a Data Guard instance on test so that we could evaluate Transparent Application Failover (TAF) for the app. I wanted to test the same procedure to build Data Guard that I will use for Production, and that meant I had to do an RMAN backup to ASM. Our Storage team loaned me LUNs of different sizes for my temporary ASM setup on test.

But my backup command (Backup filesperset 10 database include current controlfile for standby) failed with:

RMAN-03009: failure of backup command on ORA_DISK_13 channel at 10/30/2008 20:21:27
ORA-19510: failed to set size of 5779210 blocks for file "+PCASDGF" (blocksize=8192)
ORA-17505: ksfdrsz:1 Failed to resize file to size 5779210 blocks

This was a surprise because the test db is 7TB and FRA is 9TB.


SQL> SELECT sum(space), sum(bytes) FROM v$asm_file;
and
SQL> SELECT * FROM V$FLASH_RECOVERY_AREA_USAGE;


confirmed that I had more than enough space available. In fact, the backup failed after using only 47% of the available space.

It turns out that the smallest disk in the diskgroup was the bottleneck.


SQL>select group_number, disk_number, total_mb, free_mb from v$asm_disk order by 4;

GROUP_NUMBER DISK_NUMBER TOTAL_MB FREE_MB
----------------------------------------
1 13 86315 70
1 16 17263 17080
1 14 17263 17080
1 129 34522 34168
1 130 34522 34168
1 131 34522 34168
1 19 34522 69052
...

As you can see, disk 13 only had 70 MB of free space left. I removed all the disks of varying sizes and kept only the 69052 MB disks. The total size of the FRA came down to 8493396 MB, but the RMAN backup completed successfully.

Lesson Learned:
ASM spreads file extents evenly across all the disks in a diskgroup. An ORA-17505 error can still be encountered even when the diskgroup as a whole has plenty of free space, because free space that is imbalanced between disks becomes unusable: once a single disk runs out of free space, no further allocation can be done in the disk group, since every file must be spread evenly across all its disks.
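
A quick way to spot this kind of imbalance before a big backup is to compare the smallest and largest FREE_MB per disk group (a minimal sketch against V$ASM_DISK):

SQL> select group_number, count(*) disks, min(free_mb) min_free_mb, max(free_mb) max_free_mb
     from v$asm_disk
     where group_number > 0
     group by group_number;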


Remove ASM disk

One of the big selling points of ASM is the ability to reconfigure the storage online. I had to remove 10 disks from a test +ASM system; the disks had to be redeployed on another server. The steps seemed easy enough until I ran into a problem:


SQL> select d.MOUNT_STATUS, d.MODE_STATUS, d.STATE, d.NAME, d.PATH, d.HEADER_STATUS
from v$asm_disk d, v$asm_diskgroup dg
where d.GROUP_NUMBER=dg.GROUP_NUMBER
and dg.name = 'PCASDGF'
order by 4
/

MOUNT_S MODE_ST STATE NAME PATH HEADER_STATUS
------- ------- -------- -------------------- --------------------------- -------------
...
CACHED ONLINE NORMAL PCASDGF_0132 /dev/oracle/fra/c17t6d7 MEMBER
CACHED ONLINE NORMAL PCASDGF_0133 /dev/oracle/fra/c17t7d0 MEMBER
CACHED ONLINE NORMAL PCASDGF_0134 /dev/oracle/fra/c17t7d1 MEMBER
CACHED ONLINE NORMAL PCASDGF_0135 /dev/oracle/fra/c17t7d2 MEMBER
CACHED ONLINE NORMAL PCASDGF_0136 /dev/oracle/fra/c17t7d3 MEMBER
CACHED ONLINE NORMAL PCASDGF_0137 /dev/oracle/fra/c17t7d4 MEMBER
CACHED ONLINE NORMAL PCASDGF_0138 /dev/oracle/fra/c17t7d5 MEMBER
CACHED ONLINE NORMAL PCASDGF_0139 /dev/oracle/fra/c17t7d6 MEMBER
CACHED ONLINE NORMAL PCASDGF_0140 /dev/oracle/fra/c17t7d7 MEMBER
CACHED ONLINE NORMAL PCASDGF_0141 /dev/oracle/fra/c17t8d5 MEMBER

SQL> alter diskgroup PCASDGF drop disk PCASDGF_0132;
SQL> alter diskgroup PCASDGF drop disk PCASDGF_0133;
SQL> alter diskgroup PCASDGF drop disk PCASDGF_0134;
SQL> alter diskgroup PCASDGF drop disk PCASDGF_0135;
SQL> alter diskgroup PCASDGF drop disk PCASDGF_0136;
SQL> alter diskgroup PCASDGF drop disk PCASDGF_0137;
SQL> alter diskgroup PCASDGF drop disk PCASDGF_0138;
SQL> alter diskgroup PCASDGF drop disk PCASDGF_0139;
SQL> alter diskgroup PCASDGF drop disk PCASDGF_0140;
SQL> alter diskgroup PCASDGF drop disk PCASDGF_0141;


You can happily drop disks in a disk group and ASM will seamlessly migrate the data to the remaining disks in the disk group. The prompt returns immediately, but the job (rebalancing the data) is not yet done. To monitor progress, use the following SQL:


SQL> select * from v$asm_operation
/
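
If more detail is wanted, the same view also exposes progress estimates (a sketch, using the 10.2/11.1 column names):

SQL> select group_number, operation, state, power, sofar, est_work, est_minutes
     from v$asm_operation;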


When the rebalance is done these queries return no rows. The status of the dropped disks is also updated:


SQL> select MOUNT_STATUS, MODE_STATUS, STATE, NAME, PATH, header_status
from v$asm_disk
where name is null
/
MOUNT_S MODE_ST STATE NAME PATH HEADER_STATU
------- ------- -------- ------------------------------ ------------------------------ ------------
CLOSED ONLINE NORMAL /dev/oracle/fra/c17t6d7 FORMER
CLOSED ONLINE NORMAL /dev/oracle/fra/c17t7d0 FORMER
CLOSED ONLINE NORMAL /dev/oracle/fra/c17t7d1 FORMER
CLOSED ONLINE NORMAL /dev/oracle/fra/c17t7d2 FORMER
CLOSED ONLINE NORMAL /dev/oracle/fra/c17t7d3 FORMER
CLOSED ONLINE NORMAL /dev/oracle/fra/c17t7d4 FORMER
CLOSED ONLINE NORMAL /dev/oracle/fra/c17t7d5 FORMER
CLOSED ONLINE NORMAL /dev/oracle/fra/c17t7d6 FORMER
CLOSED ONLINE NORMAL /dev/oracle/fra/c17t7d7 FORMER
CLOSED ONLINE NORMAL /dev/oracle/fra/c17t8d0 FORMER

BUT

# fuser /dev/oracle/fra/c17t6d7
/dev/oracle/fra/c17t6d7: 2924o 18184o 18175o 4129o 618o

The process details are

oracle 2924 1 0 Nov 4 ? 17:53 asm_rbal_+ASM
oracle 18184 1 0 Nov 17 ? 1:16 ora_rbal_casbcva
oracle 18175 1 0 Nov 17 ? 1:34 ora_rvwr_casbcva
oracle 4129 4128 0 10:49:06 ? 0:00 oracle+ASM (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 618 1 0 Nov 25 ? 37:06 oraclecasbcva (LOCAL=NO)

Is it safe to physically remove these disks? +ASM still knows about these disks and Oracle processes are still attached to them. I logged an SR with Oracle Support and got the following feedback:

Bug 7516653 - DROPPED ASM DISK STILL HELD BY ASM PROCESSES
Closed as duplicate for bug:
Bug 7225720 - ASM DOES NOT CLOSE OPEN DESCRIPTORS EVEN AFTER APPLYING THE Patch 4693355

Fixed in 11.2

Please perform the following workaround:
1. create dummy diskgroup using this disk:
SQL> create diskgroup test external redundancy disk ;
2. drop this diskgroup:
SQL> drop diskgroup test;
3. check if the disk is still held by any process.
If the disk is still held then we will have to restart the ASM instance.

So that's what I did


SQL> create diskgroup test
external redundancy
disk '/dev/oracle/fra/c17t6d7','/dev/oracle/fra/c17t7d0',
'/dev/oracle/fra/c17t7d1','/dev/oracle/fra/c17t7d2',
'/dev/oracle/fra/c17t7d3','/dev/oracle/fra/c17t7d4',
'/dev/oracle/fra/c17t7d5','/dev/oracle/fra/c17t7d6',
'/dev/oracle/fra/c17t7d7','/dev/oracle/fra/c17t8d0'
/
SQL> drop diskgroup test
/


Fuser still showed OS processes accessing the devices and I eventually had to bounce +ASM before the devices could be safely removed.


Wednesday, December 10, 2008

SQL workload history with DBA_HIST_SQLSTAT

Using the DBA_HIST_XXX views from the AWR (Automatic Workload Repository), it has become easier for a DBA to track changes in workload metrics over time. Through an SR I logged with Oracle Support, they supplied me with the following SQL statement to track such changes for a single SQL statement.

Just substitute XXX with your SQL_ID.


set trimspool on
set lines 220
set long 10000
set longchunk 10000
set pages 99
set longchunk 100000
set long 10000
set time on
undefine SQL_ID
undefine PLAN_HASH_VALUE
undefine SQL_TEXT

Select distinct dbid, sql_id, plan_hash_value, timestamp
from dba_hist_sql_plan where sql_id='XXX'
order by dbid, timestamp
/

undefine PLAN_HASH_VALUE
accept PLAN_HASH_VALUE prompt 'Please enter PLAN_HASH_VALUE to show Statistics for: '
col iowait_delta format 9999999.99 heading iowaitdelta(ms)
col iowait_total format 9999999.99 heading iowaittotal(ms)
col ELAPSED_TIME_TOTAL format 9999999.99 heading elapsdtimetotal(ms)
col ELAPSED_TIME_DELTA format 9999999.99 heading elapsdtimedelta(ms)
col PLAN_HASH_VALUE heading plan_hashvalue
col CONCURRENCY_WAIT_TOTAL format 9999999.99 heading concwaittotal(ms)
col CONCURRENCY_WAIT_delta format 9999999.99 heading concwaitdelta(ms)
col CLUSTER_WAIT_DELTA format 9999999.99 heading clustwaitdelta(ms)
col CLUSTER_WAIT_TOTAL format 9999999.99 heading clustwaittotal(ms)
col APWAIT_TOTAL format 9999 heading applwaittimetotal(micro)
col APWAIT_DELTA format 9999 heading applwaittimedelta(micro)
col PLSEXEC_TIME_TOTAL format 9999 heading plsqlexectimetotal(micro)
col PLSEXEC_TIME_DELTA format 9999 heading plsqlexectimedelta(micro)
col JAVAEXEC_TIME_DELTA format 9999 heading javaexectimedelta(micro)
col JAVAEXEC_TIME_TOTAL format 9999 heading javaexectimetotal(micro)
col optimizer_cost format 9999 heading optcost
col optimizer_mode format a10 heading optimmode
col kept_versions format 999 heading keptvers
col invalidations_total format 999 heading invalidtot
col invalidations_delta format 999 heading invaliddlt
col parse_calls_total format 99999 heading parsecallstotal
col parse_calls_delta format 99999 heading parsecallsdelta
col executions_total format 999999 heading exectotal
col executions_delta format 999999 heading execdelta
col fetches_total format 9999999 heading fetchestotal
col fetches_delta format 9999999 heading fetchesdelta
col end_of_fetch_count_total format 9999 heading endoffetchcalltotal
col end_of_fetch_count_delta format 9999 heading endoffetchcalldelta
col buffer_gets_total format 99999999 heading buffergetstotal
col buffer_gets_delta format 99999999 heading buffergetsdelta
col disk_reads_total format 999999 heading diskreadstotal
col disk_reads_delta format 9999999 heading diskreadsdelta
col rows_processed_total format 9999999 heading rowsprocessedtotal
col rows_processed_delta format 9999999 heading rowsprocesseddelta
col rows_ex format 999999 heading rowsexec
col snap_id format 99999 heading snapid
col ela_ex format 9999999.99 heading elapsedperexecution
col cwt_ex format 9999999.99 heading cwtperexecution
col instance_number format 99 heading inID

select sql_id, plan_hash_value,dba_hist_sqlstat.snap_id,
to_char(dba_hist_snapshot.BEGIN_INTERVAL_TIME,'dd-mm_hh24:mi') snap_beg,dba_hist_sqlstat.instance_number,invalidations_delta,
parse_calls_delta,executions_delta,fetches_delta,buffer_gets_delta,
disk_reads_delta,rows_processed_delta,elapsed_time_delta/1000 elapsed_time_delta,iowait_delta/1000 iowait_delta,clwait_delta/1000 cluster_wait_delta,ccwait_delta/1000 concurrency_wait_delta,optimizer_mode, optimizer_cost,
substr(optimizer_mode,1,3) opt,
case when executions_delta = 0 then NULL
when rows_processed_delta = 0 then NULL
else(rows_processed_delta/executions_delta)end rows_ex,
case when executions_delta = 0 then NULL
when clwait_delta = 0 then NULL
else(clwait_delta/executions_delta)/1000 end cwt_ex,
case when executions_delta = 0 then NULL
when elapsed_time_delta = 0 then NULL
else(elapsed_time_delta/executions_delta)/1000 end ela_ex
from dba_hist_sqlstat, dba_hist_snapshot
where sql_id='XXX'
and plan_hash_value='&PLAN_HASH_VALUE'
and dba_hist_sqlstat.snap_id=dba_hist_snapshot.snap_id
and dba_hist_sqlstat.instance_number=dba_hist_snapshot.instance_number
order by dba_hist_sqlstat.instance_number,dba_hist_sqlstat.snap_id
/



The view DBA_HIST_SQLSTAT displays historical information about SQL statistics. Each statistic is stored in two separate columns:
metric_TOTAL for the total value of the statistic since instance startup.
metric_DELTA for the change in the statistic's value between the BEGIN_INTERVAL_TIME and END_INTERVAL_TIME stored in the DBA_HIST_SNAPSHOT view.

You can also query DBA_HIST_SQL_PLAN to compare the execution plans if the PLAN_HASH_VALUE has changed.
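
For the actual plan comparison, DBMS_XPLAN.DISPLAY_AWR can be pointed at the same SQL_ID and PLAN_HASH_VALUE (a minimal sketch; substitute XXX as before):

SQL> select * from table(dbms_xplan.display_awr('XXX', &PLAN_HASH_VALUE, null, 'TYPICAL'));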



Friday, December 5, 2008

Delete archive logs from the standby server

I implemented the following script to delete archive logs from a standby server.
You cannot just delete the archive log files, because that would leave stale entries behind in V$ARCHIVED_LOG.


#!/bin/ksh
#**************************************************************
# Here is where we loop through each SID in /etc/oratab . . .
#**************************************************************
cat /etc/oratab | grep '^c.*:.*:[Yy]$' | {
while IFS=":" read ORACLE_SID ORACLE_HOME junk
do
PATH=${ORACLE_HOME}/bin:$PATH
export ORACLE_SID ORACLE_HOME PATH

# Now run the SQL script to generate a rm script
sqlplus -s /nolog <<EOF
connect / as sysdba
set heading off feedback off timing off
spool /usr/users/oracle/cronscripts/logs/rm_arch_${ORACLE_SID}.ksh
select 'rm '||NAME from v\$archived_log
where REGISTRAR='RFS'
and APPLIED='YES'
and DELETED='NO'
and COMPLETION_TIME < (SYSDATE-1);
spool off
exit
EOF


# Now run the generated rm script
chmod 740 /usr/users/oracle/cronscripts/logs/rm_arch_${ORACLE_SID}.ksh
/usr/users/oracle/cronscripts/logs/rm_arch_${ORACLE_SID}.ksh

# Now use RMAN to update V$ARCHIVED_LOG.DELETED
rman <<EOF
connect target /
crosscheck archivelog all;
delete noprompt expired archivelog all;
EOF

done
}



Notes:
columns V$ARCHIVED_LOG.REGISTRAR and APPLIED
If REGISTRAR='RFS' and APPLIED is NO, then the log has arrived at the standby but has not yet been applied.
If REGISTRAR='RFS' and APPLIED is YES, the log has arrived and been applied at the standby database.

column V$ARCHIVED_LOG.DELETED
Indicates whether an RMAN DELETE command has physically deleted the archived log file from disk (YES) or not (NO)

RMAN EXPIRED
Removes only files whose status in the repository is EXPIRED.
RMAN marks backups and copies as expired when you run a CROSSCHECK command and the files are absent or inaccessible.
To determine which files are expired, run a LIST EXPIRED command.

RMAN NOPROMPT
Beginning in Oracle9i, RMAN's default behavior is to prompt for confirmation when you run DELETE EXPIRED.
In prior releases, RMAN did not prompt.
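
To verify that the controlfile records were cleaned up after the CROSSCHECK and DELETE EXPIRED, a simple count per status works (a sketch; in V$ARCHIVED_LOG, STATUS is A=available, D=deleted, X=expired, U=unavailable):

SQL> select status, deleted, count(*)
     from v$archived_log
     group by status, deleted;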


Fast Index Creation

I recently had to create a new index on a fairly large date-range partitioned table. I tried a few permutations on test, but found the following steps to be the fastest:


01:01:03 SQL> alter session set workarea_size_policy=manual

Session altered.

01:13:44 SQL> alter session set hash_area_size=1073741824

Session altered.

01:13:44 SQL> alter session set sort_area_size=1073741824

Session altered.

01:13:44 SQL> select force_logging from v$database

FOR
---
YES

01:13:44 SQL> alter database no force logging

Database altered.

01:13:46 SQL>create index imei_prod.usage_profile_idx1 on imei_prod.usage_profile(msisdn)
01:13:46 2 nologging
01:13:46 3 compress
01:13:46 4 parallel 16
01:13:46 5 local
01:13:46 6 (
01:13:46 7 partition USAGE_199401 tablespace indx nologging compress,
01:13:46 8 partition USAGE_200106 tablespace indx nologging compress,
01:13:46 9 partition USAGE_200202 tablespace indx nologging compress,
01:13:46 10 partition USAGE_200207 tablespace indx nologging compress,
...
01:13:46 56 partition USAGE_200809 tablespace indx nologging compress,
01:13:46 57 partition USAGE_200810 tablespace indx nologging compress,
01:13:46 58 partition USAGE_200811 tablespace indx nologging compress
01:13:46 59 )
01:13:46 60 /

Index created

Elapsed: 01:58:57.99


03:12:44 SQL> SELECT st.sid, se.username, TO_CHAR(se.logon_time,'dd-mon-yy hh24:mi')
03:12:44 2 logtime, se.program, (value/1048576) VALUE
03:12:44 3 FROM gv$sesstat st, gv$statname sn, gv$session se
03:12:44 4 WHERE username = 'MEYERM'
03:12:44 5 and sn.name = 'redo size'
03:12:44 6 AND sn.statistic# = st.statistic#
03:12:44 7 AND st.sid = se.sid
03:12:44 8 /


Redo
Generated
SID USERNAME Logon Time PROGRAM in MB
--------------------------------------------------------------
146 MEYERM 26-nov-08 01:13 sqlplus@pxxxx1 (TNS V1-V3) 116.583

03:12:46 SQL> alter database force logging

Database altered.

03:12:46 SQL> alter index imei_prod.usage_profile_idx1 logging;

Index altered.
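
Because the index was built NOLOGGING with FORCE LOGGING switched off, it is not protected by the archived redo generated during the build, so a fresh backup is advisable afterwards. To see what the LOGGING attribute ended up as at the index and partition level, a query along these lines can be used (a sketch against the standard dictionary views):

SQL> select index_name, logging from dba_indexes
     where owner = 'IMEI_PROD' and index_name = 'USAGE_PROFILE_IDX1';

SQL> select partition_name, logging from dba_ind_partitions
     where index_owner = 'IMEI_PROD' and index_name = 'USAGE_PROFILE_IDX1';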


Wednesday, November 19, 2008

Drop an unmounted ASM diskgroup

We are getting our hands dirty with ASM and will be moving all our databases from raw devices to ASM in the next couple of months. My worst fear came true when a Unix Sys Admin used a disk already allocated to ASM for a local filesystem (luckily this did not happen on a production server!).
When ASM realised this, it immediately issued an ALTER DISKGROUP PCASDG1 DISMOUNT FORCE command.
By the time the Unix Sys Admin returned the disk it was unrecognisable to ASM, as the disk headers had been overwritten. The diskgroup was unrecoverable and unusable.


SQL> select GROUP_NUMBER, NAME, state from v$asm_diskgroup;
GROUP_NUMBER NAME STATE
------------ --------- ----------
0 PCASDG1 DISMOUNTED
2 PCASDGF MOUNTED
3 PCASDGR MOUNTED
3 rows selected.

SQL>alter diskgroup PCASDG1 mount;
alter diskgroup PCASDG1 mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "19" is missing

SQL>drop diskgroup PCASDG1 including contents;
drop diskgroup PCASDG1 including contents
*
ERROR at line 1:
ORA-15039: diskgroup not dropped
ORA-15001: diskgroup "PCASDG1" does not exist or is not mounted

SQL>alter diskgroup PCASDG1 check;
alter diskgroup PCASDG1 check
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15001: diskgroup "PCASDG1" does not exist or is not mounted


I had to recreate the whole diskgroup from scratch. I followed Metalink Note 387103.1 to recreate the diskgroup.
Basically you clean up the ASM disk header of every disk in the diskgroup, which also removes the whole diskgroup from ASM.


SQL> select 'dd if=/dev/zero of='||PATH||' bs=8192 count=100' from v$asm_disk where GROUP_NUMBER=0;

$dd if=/dev/zero of=/dev/oracle/dg/c11t15d2 bs=8192 count=100
100+0 records in
100+0 records out

$dd if=/dev/zero of=/dev/oracle/dg/c11t7d0 bs=8192 count=100
100+0 records in
100+0 records out
etc ...

SQL> select GROUP_NUMBER, NAME, state from v$asm_diskgroup;
GROUP_NUMBER NAME STATE
------------ --------- ----------
2 PCASDGF MOUNTED
3 PCASDGR MOUNTED
2 rows selected.



The diskgroup is now removed. I then used DBCA to recreate the ASM diskgroup; all these disks showed up as candidate disks.
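
As an extra check, V$ASM_DISK can also confirm from the ASM side that the zeroed disks are no longer members of any group; after the dd they should show up with a CANDIDATE header status:

SQL> select path, header_status, group_number
     from v$asm_disk
     where group_number = 0;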




Monday, November 10, 2008

Oracle Data Guard Presentation



I had the privilege to present at the SAOUG (South African Oracle User Group) in Sun City last month. The conference was well organized, the location was great and we had close to 1000 attendees. I presented on Oracle Data Guard.

Thanks to Karen Morton (one of my favorite bloggers) for making me aware of SlideShare. To access my presentation, just click on the slideshare image on the right side and it will take you to my presentation on slideshare. Happy viewing!


Thursday, October 30, 2008

Easy Streams Cookbook Example



We have Call Centres all over the country. The Call Centre's IVR (interactive voice response, a phone technology that allows a computer to detect voice and touch tones during a normal phone call) is used for routing calls to the correct call centre. The subscriber data is stored in one location, and I was asked to replicate that data to two databases in other parts of the country. The main reasons are to have the data closer to the IVR system and to relieve the main production system of all these queries.

Decisions: Use one-directional replication. Capture the changes at the source. Do the instantiation (first copy of data) with DataPump.

Preparation Steps

Enable Supplemental Logging (at source)
Use columns in PK constraints for Supplemental log group
ALTER TABLE "SCHEMA_ABC"."TABLE_ABC" ADD SUPPLEMENTAL LOG GROUP ivr_rep (XXMSISDN) ALWAYS;

This step might wait on "enq: TX - contention" on a busy production system, so it is better to do it under a Change Control after hours.

To confirm Supplemental Log Groups:

COLUMN LOG_GROUP_NAME HEADING 'Log Group' FORMAT A20
COLUMN TABLE_NAME HEADING 'Table' FORMAT A15
COLUMN ALWAYS HEADING 'Conditional or|Unconditional' FORMAT A14
COLUMN LOG_GROUP_TYPE HEADING 'Type of Log Group' FORMAT A20
SELECT LOG_GROUP_NAME, TABLE_NAME, DECODE(ALWAYS,'ALWAYS', 'Unconditional','CONDITIONAL', 'Conditional') ALWAYS, LOG_GROUP_TYPE
FROM DBA_LOG_GROUPS;


Create default tablespace for the Streams Administrator Account to store the queue table. Do these steps on the source and destinations.
CREATE TABLESPACE streams_tbs DATAFILE '/data/oradata/lfivrdb/streams_tbs.dbf' SIZE 7168M;

CREATE USER strmadmin IDENTIFIED BY STRMXXX
PROFILE VODA_COMPLIANT
DEFAULT TABLESPACE streams_tbs QUOTA UNLIMITED ON streams_tbs;
GRANT CONNECT, RESOURCE, DBA to STRMADMIN;
GRANT SELECT ANY DICTIONARY TO STRMADMIN;
GRANT SELECT_CATALOG_ROLE TO STRMADMIN;
EXEC DBMS_STREAMS_AUTH.GRANT_ADMIN_PRIVILEGE('STRMADMIN');


Set specific initialization parameters at the databases participating in the Streams setup
ALTER SYSTEM SET STREAMS_POOL_SIZE=200M SCOPE=BOTH;
ALTER SYSTEM SET AQ_TM_PROCESSES = 1 scope=both;
ALTER SYSTEM SET JOB_QUEUE_PROCESSES=10 scope=both;
ALTER SYSTEM SET "_job_queue_interval"=1 scope=spfile;
(This is not a dynamic parameter and the db will need to be bounced)

Create 2 database links on the source to the 2 destinations.
CONNECT STRMADMIN/STRMXXX
CREATE DATABASE LINK LFIVRDB.VODACOM.CO.ZA CONNECT TO STRMADMIN IDENTIFIED BY STRMXXX USING 'LFIVRDB.VODACOM.CO.ZA';
CREATE DATABASE LINK BEIVRDB.VODACOM.CO.ZA CONNECT TO STRMADMIN IDENTIFIED BY STRMXXX USING 'BEIVRDB.VODACOM.CO.ZA';


STEP 1
Create Streams Queues
One capture queue on the source db and one apply queue on each destination:
connect STRMADMIN/STRMXXX
BEGIN
DBMS_STREAMS_ADM.SET_UP_QUEUE(
queue_table => 'STREAMS_QUEUE_TABLE',
storage_clause => 'TABLESPACE STREAMS_TBS',
queue_name => 'STREAMS_CAPTURE_Q',
queue_user => 'STRMADMIN');
END;
/

And on both dest sites:
connect STRMADMIN/STRMXXX
BEGIN
DBMS_STREAMS_ADM.SET_UP_QUEUE(
queue_table => 'STREAMS_QUEUE_TABLE',
storage_clause => 'TABLESPACE STREAMS_TBS',
queue_name => 'STREAMS_APPLY_Q',
queue_user => 'STRMADMIN');
END;
/
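
To confirm the queues were created on each site (a quick check, not part of the cookbook itself), DBA_QUEUES should list the new queue under the STRMADMIN schema:

SQL> select owner, name, queue_table, enqueue_enabled, dequeue_enabled
     from dba_queues
     where owner = 'STRMADMIN';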


STEP 2
Create Capture Process on Source.
This step might wait on "enq: TX - contention" on a busy production system, so it is better to do it under a Change Control after hours.
connect STRMADMIN/STRMXXX
BEGIN
DBMS_STREAMS_ADM.ADD_TABLE_RULES(
table_name => '"SCHEMA_ABC"."TABLE_ABC"',
streams_type => 'CAPTURE',
streams_name => '"STREAMS_CAPTURE"',
queue_name => '"STRMADMIN"."STREAMS_CAPTURE_Q"',
include_dml => true,
include_ddl => true,
source_database => 'DESTPRD.VODACOM.CO.ZA');
END;
/


STEP 3
Create 2 propagation processes on the Source (one per destination).
connect STRMADMIN/STRMXXX
BEGIN
DBMS_STREAMS_ADM.ADD_TABLE_PROPAGATION_RULES(
table_name => '"SCHEMA_ABC"."TABLE_ABC"',
streams_name => '"STREAMS_PROP_LF"',
source_queue_name => '"STRMADMIN"."STREAMS_CAPTURE_Q"',
destination_queue_name => '"STRMADMIN"."STREAMS_APPLY_Q"@LFIVRDB.VODACOM.CO.ZA',
include_dml => true,
include_ddl => true,
source_database => 'DESTPRD.VODACOM.CO.ZA',
inclusion_rule => true);
END;
/

And to the 2nd dest site:
connect STRMADMIN/STRMXXX
BEGIN
DBMS_STREAMS_ADM.ADD_TABLE_PROPAGATION_RULES(
table_name => '"SCHEMA_ABC"."TABLE_ABC"',
streams_name => '"STREAMS_PROP_BE"',
source_queue_name => '"STRMADMIN"."STREAMS_CAPTURE_Q"',
destination_queue_name => '"STRMADMIN"."STREAMS_APPLY_Q"@BEIVRDB.VODACOM.CO.ZA',
include_dml => true,
include_ddl => true,
source_database => 'DESTPRD.VODACOM.CO.ZA',
inclusion_rule => true);
END;
/
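
At this point the capture process and the two propagations can be verified on the source with the standard Streams dictionary views (a minimal sketch):

SQL> select capture_name, queue_name, status from dba_capture;

SQL> select propagation_name, source_queue_name, destination_queue_name, destination_dblink
     from dba_propagation;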


STEP 4
Create 2 Apply processes. One on each destination.
connect STRMADMIN/STRMXXX
BEGIN
DBMS_STREAMS_ADM.ADD_TABLE_RULES(
table_name => '"SCHEMA_ABC"."TABLE_ABC"',
streams_type => 'APPLY',
streams_name => 'STREAMS_APPLY',
queue_name => 'STRMADMIN.STREAMS_APPLY_Q',
include_dml => true,
include_ddl => true,
source_database => 'DESTPRD.VODACOM.CO.ZA');
END;
/

And 2nd site
connect STRMADMIN/STRMXXX
BEGIN
DBMS_STREAMS_ADM.ADD_TABLE_RULES(
table_name => '"SCHEMA_ABC"."TABLE_ABC"',
streams_type => 'APPLY',
streams_name => 'STREAMS_APPLY',
queue_name => 'STRMADMIN.STREAMS_APPLY_Q',
include_dml => true,
include_ddl => true,
source_database => 'DESTPRD.VODACOM.CO.ZA');
END;
/
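
On each destination, DBA_APPLY should now show the apply process, still DISABLED until it is started in STEP 5 (a quick check, not part of the cookbook):

SQL> select apply_name, queue_name, status from dba_apply;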


Instantiation. This is the initial copy of the table.

Set & Get SCN:
EXEC DBMS_CAPTURE_ADM.PREPARE_TABLE_INSTANTIATION(table_name => 'SCHEMA_ABC.TABLE_ABC');

This step might wait on "enq: TX - contention" on a busy production system, so it is better to do it under a Change Control after hours.

To confirm the SCN number that you will need in a future step:
col scn for 9999999999999999
select * from dba_capture_prepared_tables;


Export table at source
expdp strmadmin/STRMXXX TABLES=SCHEMA_ABC.TABLE_ABC DUMPFILE=a%u.dmp parallel=8

compress and ftp dump files to the 2 destination servers

Import table at each destination
First create the schema owner of the table
create user SCHEMA_ABC identified by STRMXXX default tablespace streams_tbs;
grant connect, resource to SCHEMA_ABC
/

impdp strmadmin/STRMXXX TABLES=SCHEMA_ABC.TABLE_ABC DUMPFILE=a%u.dmp remap_tablespace=TBS_ABC:streams_tbs


Manually instantiating the objects at each destination
connect STRMADMIN/STRMXXX
BEGIN
DBMS_APPLY_ADM.SET_TABLE_INSTANTIATION_SCN(
source_object_name => 'SCHEMA_ABC.TABLE_ABC',
source_database_name => 'DESTPRD.VODACOM.CO.ZA',
instantiation_scn => &iscn);
END;
/
Enter value for iscn:




STEP 5
Finally, start the processes.
On the source:
CONNECT strmadmin/STRMXXX
exec dbms_capture_adm.start_capture(capture_name => 'STREAMS_CAPTURE');


At 1st destination:
CONNECT strmadmin/STRMXXX
exec dbms_apply_adm.start_apply(apply_name => 'STREAMS_APPLY');


At 2nd destination:
CONNECT strmadmin/STRMXXX
exec dbms_apply_adm.start_apply(apply_name => 'STREAMS_APPLY');


Post Installation steps
On the source, force a Streams checkpoint (see Metalink Note 342342.1):
EXEC DBMS_CAPTURE_ADM.SET_PARAMETER('STREAMS_CAPTURE','_CHECKPOINT_FORCE','Y');


On each destination
EXEC DBMS_APPLY_ADM.SET_PARAMETER(apply_name => 'STREAMS_APPLY', parameter => 'WRITE_ALERT_LOG', value => 'Y' );
EXEC DBMS_APPLY_ADM.SET_PARAMETER(apply_name => 'STREAMS_APPLY',parameter => 'DISABLE_ON_ERROR',value => 'N' );
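
Once everything is running, I keep an eye on the capture state on the source and on apply errors at the destinations with queries along these lines (a sketch using the standard Streams views):

SQL> select capture_name, state, total_messages_captured from v$streams_capture;

SQL> select apply_name, source_database, local_transaction_id, error_message
     from dba_apply_error;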

Monday, September 15, 2008

SQL*Loader direct

We had to set up SQL Server replication to an Oracle database. The two tables in question are huge: the small table has 63 million rows and the big one has 1.5 billion rows. The big table has 50 date-range partitions.

I used SQL*Loader direct path, without any indexes in place, to do the initial load from CSV files.

The first thing to do was to prepare the database for Direct Loading by executing the script $ORACLE_HOME/rdbms/admin/catldr.sql.

I also had to set my session’s DATE format to be the same as the format in the CSV file.

$ export NLS_DATE_FORMAT='YYYY-MM-DD HH24:MI:SS'

To get the fastest possible load performance on the 50 partitions I followed these rules (a sketch of the database-side settings follows the list):
There were no indexes on the table
The table was set to NOLOGGING
The database was put in NOARCHIVELOG mode
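
On the database side, the second and third of those rules boil down to something like the following (a minimal sketch; the table name is the one used in the load below, switching archive log mode requires a bounce to MOUNT, and depending on how the table was created the individual partitions may need their LOGGING attribute changed as well):

SQL> alter table usage_profile nologging;

SQL> shutdown immediate
SQL> startup mount
SQL> alter database noarchivelog;
SQL> alter database open;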

And the following SQL Loader options were used:
DIRECT=TRUE to invoke direct path loading. This effectively bypasses most of the normal RDBMS processing.
UNRECOVERABLE to turn off redo logging for the load.
PARALLEL=TRUE and APPEND in order to run multiple load jobs concurrently against the different partitions.

My Korn Shell Script looked like this:

#!/bin/ksh

export ORAENV_ASK=NO;
export ORACLE_SID=xxprd;
. oraenv;

export NLS_DATE_FORMAT='YYYY-MM-DD HH24:MI:SS".000"'

sqlldr xx/xx control=c1.ctl DIRECT=TRUE PARALLEL=TRUE log=c1.log &
sqlldr xx/xx control=c2.ctl DIRECT=TRUE PARALLEL=TRUE log=c2.log &
sqlldr xx/xx control=c3.ctl DIRECT=TRUE PARALLEL=TRUE log=c3.log &
… etc up to 10 jobs

And I created 10 control files; they all looked identical except for the infile:

UNRECOVERABLE LOAD DATA
infile '/dump/exports/XXprd/x200512.csv'
append into table usage_profile
fields terminated by "," TRAILING NULLCOLS
(MONTH_ID, MSISDN, IMSI, IMEI, START_DT, END_DT, EVENT_CNT, STREAM, SUCCESS_CALLS, UNSUCCESS, DURATION, VOLUME_UPLINK, VOLUME_DOWNLINK, VOLUME, SUBSCRIBER_TYPE, SP_CODE)

The job completed in 12 minutes and, 5 jobs later, all 1.5 billion records were loaded!

Wednesday, September 3, 2008

Moving Online Redo Log Files

We found a lot of sessions waiting on "log file sync" and in turn found that LGWR was waiting on "log file parallel write".

We have 2 dedicated disks for online redo logs. They are multiplexed, with members of the same group on different disks. When a log switch occurs, one member of the group being archived gets reads from the ARCH process while both members of the CURRENT group receive write requests from LGWR. We decided to add 2 more disks and split the even-numbered groups from the odd-numbered groups, so that we could separate the reads from the writes.

Current layout:

SQL>select GROUP#, STATUS, TYPE, MEMBER from v$logfile;

G# STATUS TYPE MEMBER
-------------------------
1 ONLINE /dev/vx/rdsk/redodg/casprd_REDO1
1 ONLINE /dev/vx/rdsk/redodg/casprd_REDO1b
2 ONLINE /dev/vx/rdsk/redodg/casprd_REDO2
2 ONLINE /dev/vx/rdsk/redodg/casprd_REDO2b
3 ONLINE /dev/vx/rdsk/redodg/casprd_REDO3
3 ONLINE /dev/vx/rdsk/redodg/casprd_REDO3b
4 ONLINE /dev/vx/rdsk/redodg/casprd_REDO4
4 ONLINE /dev/vx/rdsk/redodg/casprd_REDO4b
5 ONLINE /dev/vx/rdsk/redodg/casprd_REDO5
5 ONLINE /dev/vx/rdsk/redodg/casprd_REDO5b
6 ONLINE /dev/vx/rdsk/redodg/casprd_REDO6
6 ONLINE /dev/vx/rdsk/redodg/casprd_REDO6b

SQL> select GROUP#,SEQUENCE#,BYTES,MEMBERS,ARC,STATUS,FIRST_TIME from v$log;

G# S# BYTES MEMBERS ARC STATUS FIRST_TIME
------------------------------------------
1 45181 2147483648 2 YES INACTIVE 03-SEP-2008 08:00:18
2 45182 2147483648 2 YES INACTIVE 03-SEP-2008 08:45:28
3 45183 2147483648 2 NO CURRENT 03-SEP-2008 09:00:47
4 45178 2147483648 2 YES INACTIVE 03-SEP-2008 06:25:02
5 45179 2147483648 2 YES INACTIVE 03-SEP-2008 07:02:21
6 45180 2147483648 2 YES INACTIVE 03-SEP-2008 07:48:50


We use RAW devices, so after the Unix Admins created the new RAW volumes on the new disks I simply added the members:

Creating Redo Log Members for the even groups on the 2 new disks:

ALTER DATABASE ADD LOGFILE MEMBER '/dev/vx/rdsk/redodg/casprd_REDO2c' TO GROUP 2;
ALTER DATABASE ADD LOGFILE MEMBER '/dev/vx/rdsk/redodg/casprd_REDO2d' TO GROUP 2;
ALTER DATABASE ADD LOGFILE MEMBER '/dev/vx/rdsk/redodg/casprd_REDO4c' TO GROUP 4;
ALTER DATABASE ADD LOGFILE MEMBER '/dev/vx/rdsk/redodg/casprd_REDO4d' TO GROUP 4;
ALTER DATABASE ADD LOGFILE MEMBER '/dev/vx/rdsk/redodg/casprd_REDO6c' TO GROUP 6;
ALTER DATABASE ADD LOGFILE MEMBER '/dev/vx/rdsk/redodg/casprd_REDO6d' TO GROUP 6;


The status of the new log members is shown as INVALID. This is normal; it will change to active (blank) when the member is first used.


SQL> select GROUP#,STATUS,TYPE,MEMBER from v$logfile order by group#, member;

G STATUS TYPE MEMBER
-----------------------------------------
1 ONLINE /dev/vx/rdsk/redodg/casprd_REDO1
1 ONLINE /dev/vx/rdsk/redodg/casprd_REDO1b
2 ONLINE /dev/vx/rdsk/redodg/casprd_REDO2
2 ONLINE /dev/vx/rdsk/redodg/casprd_REDO2b
2 INVALID ONLINE /dev/vx/rdsk/redodg/casprd_REDO2c
2 INVALID ONLINE /dev/vx/rdsk/redodg/casprd_REDO2d
3 ONLINE /dev/vx/rdsk/redodg/casprd_REDO3
3 ONLINE /dev/vx/rdsk/redodg/casprd_REDO3b
4 ONLINE /dev/vx/rdsk/redodg/casprd_REDO4
4 ONLINE /dev/vx/rdsk/redodg/casprd_REDO4b
4 INVALID ONLINE /dev/vx/rdsk/redodg/casprd_REDO4c
4 INVALID ONLINE /dev/vx/rdsk/redodg/casprd_REDO4d
5 ONLINE /dev/vx/rdsk/redodg/casprd_REDO5
5 ONLINE /dev/vx/rdsk/redodg/casprd_REDO5b
6 ONLINE /dev/vx/rdsk/redodg/casprd_REDO6
6 ONLINE /dev/vx/rdsk/redodg/casprd_REDO6b
6 INVALID ONLINE /dev/vx/rdsk/redodg/casprd_REDO6c
6 INVALID ONLINE /dev/vx/rdsk/redodg/casprd_REDO6d

SQL> select GROUP#,SEQUENCE#,BYTES,MEMBERS,ARC,STATUS,FIRST_TIME from v$log;

G# S# BYTES MEMBERS ARC STATUS FIRST_TIME
------------------------------------------------
1 45181 2147483648 2 YES ACTIVE 03-SEP-2008 09:24:18
2 45182 2147483648 4 YES ACTIVE 03-SEP-2008 09:27:28
3 45183 2147483648 2 NO CURRENT 03-SEP-2008 09:30:47
4 45178 2147483648 4 YES INACTIVE 03-SEP-2008 09:17:02
5 45179 2147483648 2 YES INACTIVE 03-SEP-2008 09:19:21
6 45180 2147483648 4 YES INACTIVE 03-SEP-2008 09:22:50



Before dropping Redo Log Members you should be aware of the following rules:
You can drop a redo log member only if it is not part of an active or current group.
Make sure the group to which a redo log member belongs is archived before dropping the member.
If the member you want to drop is the last valid member of the group, you cannot drop the member until the other members become valid.
So after a few log switches (ALTER SYSTEM SWITCH LOGFILE) I was ready to drop the old members.
Note that V$LOG.STATUS shows ACTIVE for groups 1 and 2. This means the log is active but not the current log: it is needed for crash recovery, may be in use for block recovery, and may or may not be archived. This was the result of all the log switches I did. Once their status changed to INACTIVE I continued dropping the old members.
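
A quick check I repeated between switches (a minimal sketch against V$LOG) to see when the even-numbered groups were safe to touch:

SQL> select group#, sequence#, archived, status from v$log;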


ALTER DATABASE DROP LOGFILE MEMBER '/dev/vx/rdsk/redodg/casprd_REDO2';
ALTER DATABASE DROP LOGFILE MEMBER '/dev/vx/rdsk/redodg/casprd_REDO2b';
ALTER DATABASE DROP LOGFILE MEMBER '/dev/vx/rdsk/redodg/casprd_REDO4';
ALTER DATABASE DROP LOGFILE MEMBER '/dev/vx/rdsk/redodg/casprd_REDO4b';
ALTER DATABASE DROP LOGFILE MEMBER '/dev/vx/rdsk/redodg/casprd_REDO6';
ALTER DATABASE DROP LOGFILE MEMBER '/dev/vx/rdsk/redodg/casprd_REDO6b';


Tuesday, July 8, 2008

Using various tools to solve a problem

I was recently called in to take over a performance problem. I quickly determined that only a portion of the database was affected, but it was the most visible part, affecting the business's ability to offer a certain feature to millions of our customers. Some queries that should take less than 5 seconds suddenly took minutes to complete.

Of course the App Vendor claimed that nothing had changed around that particular query in years, so it had to be a database problem.

Looking at OEM's Average Active Sessions graph on the Performance tab, I could clearly see a huge red stripe (red means the wait is Application related). Drilling down, I found waits for an event called "enq: UL - contention" to be the problem. A Metalink search returned no hits, and a Google search returned hundreds of forum threads with DBAs asking "What is enq: UL - contention?", most of them without any replies. The only reference to this event in the manual was in the Oracle Database Reference 10g Release 2 (10.2) (B14237-01), Appendix E: Oracle Enqueue Names. It simply stated "UL, User-defined Locks". To me that meant the app could be using DBMS_LOCK, but the Vendor said they don't, and app support told me that these queries do not write to the database.

I queried V$SESSION for sessions waiting on EVENT='enq: UL - contention' to find the SID of the BLOCKING_SESSION, but the BLOCKING_SESSION kept on changing. A single blocking session would have confirmed an app issue, and the short-term solution would have been to kill that process.
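
The check itself was nothing more than a V$SESSION query along these lines (a sketch; in 10g V$SESSION exposes BLOCKING_SESSION and SECONDS_IN_WAIT directly):

SQL> select sid, serial#, blocking_session, seconds_in_wait
     from v$session
     where event = 'enq: UL - contention';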

Then I logged a priority 1 SR and was asked to provide system state dumps. Eventually BUG 4367986 was identified as a possibility. The patch was applied, but it made no difference!

OK, I clearly had to try something else, so I decided to trace one session and go through the TKPROF output. I found that the app-related SQL statements all executed in less than 5 seconds, but one statement stood out at 60 seconds of elapsed time. That SQL statement was:

begin ctxsys.drvdml.com_sync_index(:idxname, :idxmem, :partname); end;

The ctxsys schema means Oracle Text is in use, and the Vendor then confirmed that Oracle Text indexes were built on an app-related audit table almost 2 months ago. The TKPROF output also showed INSERTs into the app audit table, so we were finally on the right track.

This information was uploaded to the SR and I had to check the SYNC intervals via a select from dr$index; this confirmed that the 2 audit indexes had SYNC ON COMMIT. We changed the sync interval to MANUAL, just to prove that the problem was related to SYNC ON COMMIT. Once we changed it to MANUAL, the wait event disappeared.

exec CTX_DDL.REPLACE_INDEX_METADATA('XX','REPLACE METADATA SYNC (MANUAL)');

The SR engineer then supplied the following script to change the SYNC to every 5 minutes.

exec CTX_DDL.REPLACE_INDEX_METADATA('XX','REPLACE METADATA SYNC (every "SYSDATE+5/1440")');

So the "moral of the story" is that as a DBA you have more than one tool to solve a problem, and you should use as many of them as possible until the root cause of the problem is clear.