Tuesday, December 23, 2008

Problem setting up Archive Log file management on ASM

I created an ASM diskgroup for the flash recovery area (FRA) and set the relevant parameters. The aim is to write archive logs, flashback logs and RMAN backups to this FRA.

SQL>select FREE_MB from v$asm_diskgroup where NAME='PRNADGF';

FREE_MB
----------
2,761,992

SQL>show parameter db_recovery_file

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest string
db_recovery_file_dest_size big integer 0

SQL>alter system set db_recovery_file_dest_size=2700000M;

System altered.

SQL>alter system set db_recovery_file_dest='+PRNADGF';

System altered.

My problems started when I changed the database to archive log mode:


SQL>archive log list
Database log mode No Archive Mode
Automatic archival Disabled
Archive destination USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence 10
Current log sequence 12

SQL>shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

SQL>startup mount
ORACLE instance started.

Total System Global Area 3156803584 bytes
Fixed Size 2110872 bytes
Variable Size 402655848 bytes
Database Buffers 2734686208 bytes
Redo Buffers 17350656 bytes
Database mounted.
SQL>alter database archivelog;

Database altered.

SQL>alter database open;

Database altered.

SQL>archive log list
Database log mode Archive Mode
Automatic archival Enabled
Archive destination USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence 10
Next log sequence to archive 12
Current log sequence 12


So the archive log files should be written to my FRA on ASM, but when I tested it I got an unexpected error:


SQL>alter system archive log current;

alter system archive log current
*
ERROR at line 1:
ORA-16038: log 3 sequence# 12 cannot be archived
ORA-19502: write error on file "", block number (block size=)
ORA-00312: online log 3 thread 1: '+PRNADG1/rnatst/onlinelog/group_3.278.672669129'


This error message is misleading because there is nothing wrong with the FRA on ASM. I found the real error message in the alert.log file:


ARCH: Encountered disk I/O error 19502
ARCH: Closing local archive destination LOG_ARCHIVE_DEST_1: '/opt/apps/oracle/product/database/11.1.0.6.0/dbs/arch1_12_672668870.dbf' (error 19502)
(rnatst)
Errors in file /opt/apps/oracle/diag/rdbms/rnatst/rnatst/trace/rnatst_ora_21990.trc:


So the archiver also wants to archive to my $ORACLE_HOME/dbs directory. It cannot do that because there is not enough space on that filesystem, but the real question is why it wants to write to that destination at all.
When I do a show parameter I don't get the answer:


SQL> show parameter log_archive_dest

log_archive_dest string
log_archive_dest_1 string
log_archive_dest_10 string
log_archive_dest_2 string
log_archive_dest_3 string
log_archive_dest_4 string
log_archive_dest_5 string
log_archive_dest_6 string
log_archive_dest_7 string
log_archive_dest_8 string
log_archive_dest_9 string
log_archive_dest_state_1 string enable
log_archive_dest_state_10 string enable
log_archive_dest_state_2 string enable
log_archive_dest_state_3 string enable
log_archive_dest_state_4 string enable
log_archive_dest_state_5 string enable
log_archive_dest_state_6 string enable
log_archive_dest_state_7 string enable
log_archive_dest_state_8 string enable
log_archive_dest_state_9 string enable
log_archive_duplex_dest string
log_archive_min_succeed_dest integer 1


Again I found the answer in the alert.log


Starting ORACLE instance (normal)
Using LOG_ARCHIVE_DEST_1 parameter default value as /opt/apps/oracle/product/database/11.1.0.6.0/dbs/arch
Using LOG_ARCHIVE_DEST_10 parameter default value as USE_DB_RECOVERY_FILE_DEST


It seems to be a DEFAULT destination that does not show up when I use ARCHIVE LOG LIST or SHOW PARAMETER LOG_ARCHIVE_DEST.
It does show up in V$ARCHIVE_DEST though:


SQL> select dest_id, destination, status from V$ARCHIVE_DEST;


DEST_ID DESTINATION STATUS
---------- ------------------------------------------------------- ---------
1 /opt/apps/oracle/product/database/11.1.0.6.0/dbs/arch VALID
2 INACTIVE
3 INACTIVE
4 INACTIVE
5 INACTIVE
6 INACTIVE
7 INACTIVE
8 INACTIVE
9 INACTIVE
10 USE_DB_RECOVERY_FILE_DEST VALID


To solve the problem:


SQL> alter system set log_archive_dest_state_1=DEFER;

And to test

SQL>alter system archive log current;

System altered.

SQL>select dest_id, name from v$archived_log;

DEST_ID NAME
---------- ----------------------------------------------------------------------
10 +PRNADGF/rnatst/archivelog/2008_12_22/thread_1_seq_12.256.674133347

SQL> select dest_id, destination, status from V$ARCHIVE_DEST;


DEST_ID DESTINATION STATUS
---------- ------------------------------------------------------- ---------
1 /opt/apps/oracle/product/database/11.1.0.6.0/dbs/arch DISABLED
2 INACTIVE
3 INACTIVE
4 INACTIVE
5 INACTIVE
6 INACTIVE
7 INACTIVE
8 INACTIVE
9 INACTIVE
10 USE_DB_RECOVERY_FILE_DEST VALID
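
A cleaner alternative (a sketch of something I did not run at the time, though the LOCATION=USE_DB_RECOVERY_FILE_DEST syntax is documented) is to point destination 1 explicitly at the FRA, so the archiver never falls back to $ORACLE_HOME/dbs:

SQL> alter system set log_archive_dest_1='LOCATION=USE_DB_RECOVERY_FILE_DEST';
SQL> alter system set log_archive_dest_state_1=ENABLE;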


Lesson learned.
Keep on looking until you can explain the error.

Friday, December 12, 2008

ORA-17505 using RMAN with ASM

I had to create a Data Guard instance on test so that we could evaluate Transparent Application Failover (TAF) for the application. I wanted to test the same procedure for building Data Guard that I will use for production, and that meant I had to do an RMAN backup to ASM. Our storage team loaned me LUNs of different sizes for my temporary ASM on test.

But my backup command (Backup filesperset 10 database include current controlfile for standby) failed with:

RMAN-03009: failure of backup command on ORA_DISK_13 channel at 10/30/2008 20:21:27
ORA-19510: failed to set size of 5779210 blocks for file "+PCASDGF" (blocksize=8192)
ORA-17505: ksfdrsz:1 Failed to resize file to size 5779210 blocks

This was a surprise because the test database is 7 TB and the FRA is 9 TB.


SQL> SELECT sum(space), sum(bytes) FROM v$asm_file;
and
SQL> SELECT * FROM V$FLASH_RECOVERY_AREA_USAGE;


confirmed that I had more than enough space available. In fact the backup failed after using only 47% of the available space.

It turns out that the smallest disk in the diskgroup was the bottleneck.


SQL>select group_number, disk_number, total_mb, free_mb from v$asm_disk order by 4;

GROUP_NUMBER DISK_NUMBER TOTAL_MB FREE_MB
------------ ----------- -------- -------
1 13 86315 70
1 16 17263 17080
1 14 17263 17080
1 129 34522 34168
1 130 34522 34168
1 131 34522 34168
1 19 34522 69052
...

As you can see, disk 13 had only 70 MB of free space left. I removed all the disks of varying sizes and kept only the 69052 MB disks. The total size of the FRA came down to 8493396 MB, but the RMAN backup completed successfully.

Lesson Learned:
ASM spreads file extents evenly across all the disks in a disk group. An ORA-17505 error can therefore be raised even when the disk group as a whole has plenty of free space: because every file must be allocated evenly across all disks, a single disk that runs out of free space makes any further allocation in the disk group impossible.
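
A query like the following (a sketch using standard V$ASM_DISK columns; nothing in it is specific to my diskgroup) would have flagged the imbalance before the backup started:

SQL> select group_number, disk_number, total_mb, free_mb,
            round(free_mb/total_mb*100,1) pct_free
from v$asm_disk
where group_number > 0
order by pct_free
/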


Remove ASM disk

One of the big selling points of ASM is the ability to reconfigure storage online. I had to remove 10 disks from a +ASM test system because they had to be redeployed on another server. The steps seemed easy enough until I ran into a problem:


SQL> select d.MOUNT_STATUS, d.MODE_STATUS, d.STATE, d.NAME, d.PATH, d.HEADER_STATUS
from v$asm_disk d, v$asm_diskgroup dg
where d.GROUP_NUMBER=dg.GROUP_NUMBER
and dg.name = 'PCASDGF'
order by 4
/

MOUNT_S MODE_ST STATE NAME PATH HEADER_STATUS
------- ------- -------- -------------------- --------------------------- -------------
...
CACHED ONLINE NORMAL PCASDGF_0132 /dev/oracle/fra/c17t6d7 MEMBER
CACHED ONLINE NORMAL PCASDGF_0133 /dev/oracle/fra/c17t7d0 MEMBER
CACHED ONLINE NORMAL PCASDGF_0134 /dev/oracle/fra/c17t7d1 MEMBER
CACHED ONLINE NORMAL PCASDGF_0135 /dev/oracle/fra/c17t7d2 MEMBER
CACHED ONLINE NORMAL PCASDGF_0136 /dev/oracle/fra/c17t7d3 MEMBER
CACHED ONLINE NORMAL PCASDGF_0137 /dev/oracle/fra/c17t7d4 MEMBER
CACHED ONLINE NORMAL PCASDGF_0138 /dev/oracle/fra/c17t7d5 MEMBER
CACHED ONLINE NORMAL PCASDGF_0139 /dev/oracle/fra/c17t7d6 MEMBER
CACHED ONLINE NORMAL PCASDGF_0140 /dev/oracle/fra/c17t7d7 MEMBER
CACHED ONLINE NORMAL PCASDGF_0141 /dev/oracle/fra/c17t8d5 MEMBER

SQL> alter diskgroup PCASDGF drop disk PCASDGF_0132;
SQL> alter diskgroup PCASDGF drop disk PCASDGF_0133;
SQL> alter diskgroup PCASDGF drop disk PCASDGF_0134;
SQL> alter diskgroup PCASDGF drop disk PCASDGF_0135;
SQL> alter diskgroup PCASDGF drop disk PCASDGF_0136;
SQL> alter diskgroup PCASDGF drop disk PCASDGF_0137;
SQL> alter diskgroup PCASDGF drop disk PCASDGF_0138;
SQL> alter diskgroup PCASDGF drop disk PCASDGF_0139;
SQL> alter diskgroup PCASDGF drop disk PCASDGF_0140;
SQL> alter diskgroup PCASDGF drop disk PCASDGF_0141;
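
In hindsight the drops can also be issued as a single statement, so only one rebalance runs instead of a series of restarted ones (a sketch; the rebalance power clause is optional and only controls how fast the data is moved):

SQL> alter diskgroup PCASDGF drop disk
     PCASDGF_0132, PCASDGF_0133, PCASDGF_0134, PCASDGF_0135, PCASDGF_0136,
     PCASDGF_0137, PCASDGF_0138, PCASDGF_0139, PCASDGF_0140, PCASDGF_0141
     rebalance power 8;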


You can happily drop disks from a disk group and ASM will seamlessly migrate the data to the remaining disks. The prompt returns immediately, but the rebalance (the actual data migration) is not yet done. To monitor progress, use the following SQL:


SQL> select * from v$asm_operation
/
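
A slightly more focused variant (a sketch; these are all documented V$ASM_OPERATION columns) also shows the estimated minutes remaining:

SQL> select group_number, operation, state, power, sofar, est_work, est_minutes
from v$asm_operation
/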


When the rebalance is done the SQL returns no rows. The status of the disks is also updated:


SQL> select MOUNT_STATUS, MODE_STATUS, STATE, NAME, PATH, header_status
from v$asm_disk
where name is null
/
MOUNT_S MODE_ST STATE NAME PATH HEADER_STATU
------- ------- -------- ------------------------------ ------------------------------ ------------
CLOSED ONLINE NORMAL /dev/oracle/fra/c17t6d7 FORMER
CLOSED ONLINE NORMAL /dev/oracle/fra/c17t7d0 FORMER
CLOSED ONLINE NORMAL /dev/oracle/fra/c17t7d1 FORMER
CLOSED ONLINE NORMAL /dev/oracle/fra/c17t7d2 FORMER
CLOSED ONLINE NORMAL /dev/oracle/fra/c17t7d3 FORMER
CLOSED ONLINE NORMAL /dev/oracle/fra/c17t7d4 FORMER
CLOSED ONLINE NORMAL /dev/oracle/fra/c17t7d5 FORMER
CLOSED ONLINE NORMAL /dev/oracle/fra/c17t7d6 FORMER
CLOSED ONLINE NORMAL /dev/oracle/fra/c17t7d7 FORMER
CLOSED ONLINE NORMAL /dev/oracle/fra/c17t8d0 FORMER

BUT

# fuser /dev/oracle/fra/c17t6d7 returns:
/dev/oracle/fra/c17t6d7: 2924o 18184o 18175o 4129o 618o

The process details are

oracle 2924 1 0 Nov 4 ? 17:53 asm_rbal_+ASM
oracle 18184 1 0 Nov 17 ? 1:16 ora_rbal_casbcva
oracle 18175 1 0 Nov 17 ? 1:34 ora_rvwr_casbcva
oracle 4129 4128 0 10:49:06 ? 0:00 oracle+ASM (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 618 1 0 Nov 25 ? 37:06 oraclecasbcva (LOCAL=NO)

Is it safe to physically remove these disks? +ASM still knows about them and Oracle processes are still attached to them. I logged an SR with Oracle Support and got the following feedback:

Bug 7516653 - DROPPED ASM DISK STILL HELD BY ASM PROCESSES
Closed as duplicate for bug:
Bug 7225720 - ASM DOES NOT CLOSE OPEN DESCRIPTORS EVEN AFTER APPLYING THE Patch 4693355

Fixed in 11.2

Please perform the following workaround:
1. create dummy diskgroup using this disk:
SQL> create diskgroup test external redundancy disk ;
2. drop this diskgroup:
SQL> drop diskgroup test;
3. check if the disk is still held by any process.
If the disk is still held then we will have to restart the ASM instance.

So that's what I did


SQL> create diskgroup test
external redundancy
disk '/dev/oracle/fra/c17t6d7','/dev/oracle/fra/c17t7d0',
'/dev/oracle/fra/c17t7d1','/dev/oracle/fra/c17t7d2',
'/dev/oracle/fra/c17t7d3','/dev/oracle/fra/c17t7d4',
'/dev/oracle/fra/c17t7d5','/dev/oracle/fra/c17t7d6',
'/dev/oracle/fra/c17t7d7','/dev/oracle/fra/c17t8d0'
/
SQL> drop diskgroup test
/


fuser still showed OS processes accessing the devices, and I eventually had to bounce +ASM before the devices could be safely removed.


Wednesday, December 10, 2008

SQL workload history with DBA_HIST_SQLSTAT

Using the DBA_HIST_XXX views from the AWR (Automatic Workload Repository), it has become much easier for a DBA to track changes in workload metrics over time. In response to an SR I logged, Oracle Support supplied the following SQL statement to track such changes for a single SQL statement.

Just replace XXX with your SQL_ID.


set trimspool on
set lines 220
set long 10000
set longchunk 10000
set pages 99
set longchunk 100000
set long 10000
set time on
undefine SQL_ID
undefine PLAN_HASH_VALUE
undefine SQL_TEXT

Select distinct dbid, sql_id, plan_hash_value, timestamp
from dba_hist_sql_plan where sql_id='XXX'
order by dbid, timestamp
/

undefine PLAN_HASH_VALUE
accept PLAN_HASH_VALUE prompt 'Please enter PLAN_HASH_VALUE to show Statistics for: '
col iowait_delta format 9999999.99 heading iowaitdelta(ms)
col iowait_total format 9999999.99 heading iowaittotal(ms)
col ELAPSED_TIME_TOTAL format 9999999.99 heading elapsdtimetotal(ms)
col ELAPSED_TIME_DELTA format 9999999.99 heading elapsdtimedelta(ms)
col PLAN_HASH_VALUE heading plan_hashvalue
col CONCURRENCY_WAIT_TOTAL format 9999999.99 heading concwaittotal(ms)
col CONCURRENCY_WAIT_delta format 9999999.99 heading concwaitdelta(ms)
col CLUSTER_WAIT_DELTA format 9999999.99 heading clustwaitdelta(ms)
col CLUSTER_WAIT_TOTAL format 9999999.99 heading clustwaittotal(ms)
col APWAIT_TOTAL format 9999 heading applwaittimetotal(micro)
col APWAIT_DELTA format 9999 heading applwaittimedelta(micro)
col PLSEXEC_TIME_TOTAL format 9999 heading plsqlexectimetotal(micro)
col PLSEXEC_TIME_DELTA format 9999 heading plsqlexectimedelta(micro)
col JAVAEXEC_TIME_DELTA format 9999 heading javaexectimedelta(micro)
col JAVAEXEC_TIME_TOTAL format 9999 heading javaexectimetotal(micro)
col optimizer_cost format 9999 heading optcost
col optimizer_mode format a10 heading optimmode
col kept_versions format 999 heading keptvers
col invalidations_total format 999 heading invalidtot
col invalidations_delta format 999 heading invaliddlt
col parse_calls_total format 99999 heading parsecallstotal
col parse_calls_delta format 99999 heading parsecallsdelta
col executions_total format 999999 heading exectotal
col executions_delta format 999999 heading execdelta
col fetches_total format 9999999 heading fetchestotal
col fetches_delta format 9999999 heading fetchesdelta
col end_of_fetch_count_total format 9999 heading endoffetchcalltotal
col end_of_fetch_count_delta format 9999 heading endoffetchcalldelta
col buffer_gets_total format 99999999 heading buffergetstotal
col buffer_gets_delta format 99999999 heading buffergetsdelta
col disk_reads_total format 999999 heading diskreadstotal
col disk_reads_delta format 9999999 heading diskreadsdelta
col rows_processed_total format 9999999 heading rowsprocessedtotal
col rows_processed_delta format 9999999 heading rowsprocesseddelta
col rows_ex format 999999 heading rowsexec
col snap_id format 99999 heading snapid
col ela_ex format 9999999.99 heading elapsedperexecution
col cwt_ex format 9999999.99 heading cwtperexecution
col instance_number format 99 heading inID

select sql_id, plan_hash_value,dba_hist_sqlstat.snap_id,
to_char(dba_hist_snapshot.BEGIN_INTERVAL_TIME,'dd-mm_hh24:mi') snap_beg,dba_hist_sqlstat.instance_number,invalidations_delta,
parse_calls_delta,executions_delta,fetches_delta,buffer_gets_delta,
disk_reads_delta,rows_processed_delta,elapsed_time_delta/1000 elapsed_time_delta,iowait_delta/1000 iowait_delta,clwait_delta/1000 cluster_wait_delta,ccwait_delta/1000 concurrency_wait_delta,optimizer_mode, optimizer_cost,
substr(optimizer_mode,1,3) opt,
case when executions_delta = 0 then NULL
when rows_processed_delta = 0 then NULL
else(rows_processed_delta/executions_delta)end rows_ex,
case when executions_delta = 0 then NULL
when clwait_delta = 0 then NULL
else(clwait_delta/executions_delta)/1000 end cwt_ex,
case when executions_delta = 0 then NULL
when elapsed_time_delta = 0 then NULL
else(elapsed_time_delta/executions_delta)/1000 end ela_ex
from dba_hist_sqlstat, dba_hist_snapshot
where sql_id='XXX'
and plan_hash_value='&PLAN_HASH_VALUE'
and dba_hist_sqlstat.snap_id=dba_hist_snapshot.snap_id
and dba_hist_sqlstat.instance_number=dba_hist_snapshot.instance_number
order by dba_hist_sqlstat.instance_number,dba_hist_sqlstat.snap_id
/



The view DBA_HIST_SQLSTAT displays historical information about SQL statistics. Each statistic is stored in two separate columns:
metric_TOTAL for the total value of the statistic since instance startup.
metric_DELTA for the change in the statistic's value between the BEGIN_INTERVAL_TIME and END_INTERVAL_TIME stored in the DBA_HIST_SNAPSHOT view.
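
As a minimal illustration of the DELTA columns, the query below (a sketch; 'XXX' is again the SQL_ID placeholder, and nullif just avoids dividing by zero for intervals with no executions) shows buffer gets per execution per snapshot:

select snap_id, instance_number, executions_delta, buffer_gets_delta,
       round(buffer_gets_delta/nullif(executions_delta,0)) gets_per_exec
from dba_hist_sqlstat
where sql_id='XXX'
order by instance_number, snap_id
/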

You can also query DBA_HIST_SQL_PLAN to compare the execution plans if the PLAN_HASH_VALUE has changed.
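
DBMS_XPLAN can format those historical plans straight from the AWR (assuming 10gR2 or later; 'XXX' is the SQL_ID placeholder as before):

SQL> select * from table(dbms_xplan.display_awr('XXX'));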



Friday, December 5, 2008

Delete archive logs from the standby server

I implemented the following script to delete archive logs from a standby server.
You cannot just delete the archive log files, because that would leave stale entries behind in the V$ARCHIVED_LOG view.


#!/bin/ksh
#**************************************************************
# Here is where we loop through each SID in /etc/oratab . . .
#**************************************************************
cat /etc/oratab | grep '^c.*:.*:[Yy]$' | {
while IFS=":" read ORACLE_SID ORACLE_HOME junk
do
PATH=${ORACLE_HOME}/bin:$PATH
export ORACLE_SID ORACLE_HOME PATH

# Now run the SQL script to generate a rm script
sqlplus -s /nolog <<EOF
connect / as sysdba
set heading off feedback off timing off
spool /usr/users/oracle/cronscripts/logs/rm_arch_${ORACLE_SID}.ksh
select 'rm '||NAME from v\$archived_log
where REGISTRAR='RFS'
and APPLIED='YES'
and DELETED='NO'
and COMPLETION_TIME < (SYSDATE-1);
spool off
exit
EOF


# Now run the generated rm script
chmod 740 /usr/users/oracle/cronscripts/logs/rm_arch_${ORACLE_SID}.ksh
/usr/users/oracle/cronscripts/logs/rm_arch_${ORACLE_SID}.ksh

# Now use RMAN to update V$ARCHIVED_LOG.DELETED
rman <<EOF
connect target /
crosscheck archivelog all;
delete noprompt expired archivelog all;
EOF

done
}
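
After a run, a quick sanity check (a sketch; it only looks at logs shipped by RFS) is to confirm that the APPLIED/DELETED flags have moved on:

SQL> select applied, deleted, count(*)
from v$archived_log
where registrar='RFS'
group by applied, deleted
/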



Notes:
Columns V$ARCHIVED_LOG.REGISTRAR and APPLIED:
If REGISTRAR='RFS' and APPLIED is NO, the log has arrived at the standby but has not yet been applied.
If REGISTRAR='RFS' and APPLIED is YES, the log has arrived and been applied at the standby database.

Column V$ARCHIVED_LOG.DELETED:
Indicates whether an RMAN DELETE command has physically deleted the archived log file from disk (YES) or not (NO).

RMAN EXPIRED
Removes only files whose status in the repository is EXPIRED.
RMAN marks backups and copies as expired when you run a CROSSCHECK command and the files are absent or inaccessible.
To determine which files are expired, run a LIST EXPIRED command.

RMAN NOPROMPT
Beginning in Oracle9i, RMAN's default behavior is to prompt for confirmation when you run DELETE EXPIRED.
In prior releases, RMAN did not prompt.


Fast Index Creation

I recently had to create a new index on a fairly large date-range partitioned table. I tried a few permutations on test and found the following steps to be the fastest:


01:01:03 SQL> alter session set workarea_size_policy=manual

Session altered.

01:13:44 SQL> alter session set hash_area_size=1073741824

Session altered.

01:13:44 SQL> alter session set sort_area_size=1073741824

Session altered.

01:13:44 SQL> select force_logging from v$database

FOR
---
YES

01:13:44 SQL> alter database no force logging

Database altered.

01:13:46 SQL>create index imei_prod.usage_profile_idx1 on imei_prod.usage_profile(msisdn)
01:13:46 2 nologging
01:13:46 3 compress
01:13:46 4 parallel 16
01:13:46 5 local
01:13:46 6 (
01:13:46 7 partition USAGE_199401 tablespace indx nologging compress,
01:13:46 8 partition USAGE_200106 tablespace indx nologging compress,
01:13:46 9 partition USAGE_200202 tablespace indx nologging compress,
01:13:46 10 partition USAGE_200207 tablespace indx nologging compress,
...
01:13:46 56 partition USAGE_200809 tablespace indx nologging compress,
01:13:46 57 partition USAGE_200810 tablespace indx nologging compress,
01:13:46 58 partition USAGE_200811 tablespace indx nologging compress
01:13:46 59 )
01:13:46 60 /

Index created

Elapsed: 01:58:57.99


03:12:44 SQL> SELECT st.sid, se.username, TO_CHAR(se.logon_time,'dd-mon-yy hh24:mi')
03:12:44 2 logtime, se.program, (value/1048576) VALUE
03:12:44 3 FROM gv$sesstat st, gv$statname sn, gv$session se
03:12:44 4 WHERE username = 'MEYERM'
03:12:44 5 and sn.name = 'redo size'
03:12:44 6 AND sn.statistic# = st.statistic#
03:12:44 7 AND st.sid = se.sid
03:12:44 8 /


Redo
Generated
SID USERNAME Logon Time PROGRAM in MB
--------------------------------------------------------------
146 MEYERM 26-nov-08 01:13 sqlplus@pxxxx1 (TNS V1-V3) 116.583

03:12:46 SQL> alter database force logging

Database altered.

03:12:46 SQL> alter index imei_prod.usage_profile_idx1 logging;

Index altered.
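
One follow-up worth considering (not shown above): the index was created with parallel 16, and that degree stays on the index definition, so ordinary queries may start getting parallel plans. Resetting it is a one-liner:

SQL> alter index imei_prod.usage_profile_idx1 noparallel;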