Friday, December 12, 2008

Remove ASM disk

One of the big selling points of ASM is the ability to reconfigure storage online. I had to remove 10 disks from a +ASM test system so they could be redeployed on another server. The steps seemed easy enough until I ran into a problem:


SQL> select d.MOUNT_STATUS, d.MODE_STATUS, d.STATE, d.NAME, d.PATH, d.HEADER_STATUS
from v$asm_disk d, v$asm_diskgroup dg
where d.GROUP_NUMBER=dg.GROUP_NUMBER
and dg.name = 'PCASDGF'
order by 4
/

MOUNT_S MODE_ST STATE NAME PATH HEADER_STATUS
------- ------- -------- -------------------- --------------------------- -------------
...
CACHED ONLINE NORMAL PCASDGF_0132 /dev/oracle/fra/c17t6d7 MEMBER
CACHED ONLINE NORMAL PCASDGF_0133 /dev/oracle/fra/c17t7d0 MEMBER
CACHED ONLINE NORMAL PCASDGF_0134 /dev/oracle/fra/c17t7d1 MEMBER
CACHED ONLINE NORMAL PCASDGF_0135 /dev/oracle/fra/c17t7d2 MEMBER
CACHED ONLINE NORMAL PCASDGF_0136 /dev/oracle/fra/c17t7d3 MEMBER
CACHED ONLINE NORMAL PCASDGF_0137 /dev/oracle/fra/c17t7d4 MEMBER
CACHED ONLINE NORMAL PCASDGF_0138 /dev/oracle/fra/c17t7d5 MEMBER
CACHED ONLINE NORMAL PCASDGF_0139 /dev/oracle/fra/c17t7d6 MEMBER
CACHED ONLINE NORMAL PCASDGF_0140 /dev/oracle/fra/c17t7d7 MEMBER
CACHED ONLINE NORMAL PCASDGF_0141 /dev/oracle/fra/c17t8d5 MEMBER

SQL> alter diskgroup PCASDGF drop disk PCASDGF_0132;
SQL> alter diskgroup PCASDGF drop disk PCASDGF_0133;
SQL> alter diskgroup PCASDGF drop disk PCASDGF_0134;
SQL> alter diskgroup PCASDGF drop disk PCASDGF_0135;
SQL> alter diskgroup PCASDGF drop disk PCASDGF_0136;
SQL> alter diskgroup PCASDGF drop disk PCASDGF_0137;
SQL> alter diskgroup PCASDGF drop disk PCASDGF_0138;
SQL> alter diskgroup PCASDGF drop disk PCASDGF_0139;
SQL> alter diskgroup PCASDGF drop disk PCASDGF_0140;
SQL> alter diskgroup PCASDGF drop disk PCASDGF_0141;
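
As an aside, the ten drops could equally have been issued as a single statement with an explicit rebalance power. A sketch, not what I actually ran; the power value 8 and the abbreviated disk list are illustrative:

```sql
-- Drop several disks in one statement and raise the rebalance power
-- (0-11 in 10g) so the data migrates off the dropped disks faster.
alter diskgroup PCASDGF
  drop disk PCASDGF_0132, PCASDGF_0133, PCASDGF_0134
  rebalance power 8
/
```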


You can happily drop disks from a disk group and ASM will seamlessly migrate their data to the remaining disks in the group. The prompt returns immediately, but the job (the rebalance that migrates the data) is not yet done. To monitor progress, use the following SQL.


SQL> select * from v$asm_operation
/


When the job is done the SQL will return no rows. The status of the disks is also updated.
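
While the rebalance is still running, v$asm_operation also exposes progress estimates. A sketch of a more detailed monitoring query, using the columns as documented for 10g:

```sql
-- Show rebalance progress: allocation units moved so far, estimated
-- total work, current rate, and estimated minutes remaining.
select group_number, operation, state, power,
       sofar, est_work, est_rate, est_minutes
from   v$asm_operation
/
```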


SQL> select MOUNT_STATUS, MODE_STATUS, STATE, NAME, PATH, header_status
from v$asm_disk
where name is null
/
MOUNT_S MODE_ST STATE NAME PATH HEADER_STATU
------- ------- -------- ------------------------------ ------------------------------ ------------
CLOSED ONLINE NORMAL /dev/oracle/fra/c17t6d7 FORMER
CLOSED ONLINE NORMAL /dev/oracle/fra/c17t7d0 FORMER
CLOSED ONLINE NORMAL /dev/oracle/fra/c17t7d1 FORMER
CLOSED ONLINE NORMAL /dev/oracle/fra/c17t7d2 FORMER
CLOSED ONLINE NORMAL /dev/oracle/fra/c17t7d3 FORMER
CLOSED ONLINE NORMAL /dev/oracle/fra/c17t7d4 FORMER
CLOSED ONLINE NORMAL /dev/oracle/fra/c17t7d5 FORMER
CLOSED ONLINE NORMAL /dev/oracle/fra/c17t7d6 FORMER
CLOSED ONLINE NORMAL /dev/oracle/fra/c17t7d7 FORMER
CLOSED ONLINE NORMAL /dev/oracle/fra/c17t8d0 FORMER

BUT

# fuser /dev/oracle/fra/c17t6d7
/dev/oracle/fra/c17t6d7: 2924o 18184o 18175o 4129o 618o

The process details are:

oracle 2924 1 0 Nov 4 ? 17:53 asm_rbal_+ASM
oracle 18184 1 0 Nov 17 ? 1:16 ora_rbal_casbcva
oracle 18175 1 0 Nov 17 ? 1:34 ora_rvwr_casbcva
oracle 4129 4128 0 10:49:06 ? 0:00 oracle+ASM (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 618 1 0 Nov 25 ? 37:06 oraclecasbcva (LOCAL=NO)
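
The PIDs that fuser reports can be matched against Oracle server processes by querying v$process in each instance (+ASM and the database). A sketch using the PIDs above:

```sql
-- Map the OS PIDs reported by fuser to Oracle processes; run this
-- in both the +ASM instance and the database instance.
select spid, program, background
from   v$process
where  spid in ('2924', '18184', '18175', '4129', '618')
/
```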

Is it safe to physically remove these disks? +ASM still knows about them and Oracle processes are still attached to them. I logged an SR with Oracle Support and got the following feedback:

Bug 7516653 - DROPPED ASM DISK STILL HELD BY ASM PROCESSES
Closed as duplicate for bug:
Bug 7225720 - ASM DOES NOT CLOSE OPEN DESCRIPTORS EVEN AFTER APPLYING THE Patch 4693355

Fixed in 11.2

Please perform the following workaround:
1. create dummy diskgroup using this disk:
SQL> create diskgroup test external redundancy disk ;
2. drop this diskgroup:
SQL> drop diskgroup test;
3. check if the disk is still held by any process.
If the disk is still held then we will have to restart the ASM instance.

So that's what I did


SQL> create diskgroup test
external redundancy
disk '/dev/oracle/fra/c17t6d7','/dev/oracle/fra/c17t7d0',
'/dev/oracle/fra/c17t7d1','/dev/oracle/fra/c17t7d2',
'/dev/oracle/fra/c17t7d3','/dev/oracle/fra/c17t7d4',
'/dev/oracle/fra/c17t7d5','/dev/oracle/fra/c17t7d6',
'/dev/oracle/fra/c17t7d7','/dev/oracle/fra/c17t8d0'
/
SQL> drop diskgroup test
/


fuser still showed OS processes accessing the devices and I eventually had to bounce +ASM before the devices could be safely removed.
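
For completeness, bouncing +ASM amounted to something like the following, run against the ASM instance as sysdba. A sketch; any databases using the instance must be shut down first:

```sql
-- Bounce the ASM instance to release the stale file descriptors.
-- shutdown immediate fails while client databases are still
-- connected, so stop those first.
shutdown immediate
startup
```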


2 comments:

Anonymous said...

Presume you could still rip this storage away without doing the database any harm though. Nothing will be reading from or writing to it. It's just the file descriptors being held open? What was Oracle's take on this?

Can see why you'd want to bounce instances though even if it was a safe thing to do.

Anonymous said...

8 years later the bug still occurs on my 11.2.0.4.7 database.

Will wait a week to see if my deallocated disk still has fuser showing processes on it.

Said to be corrected in 11.2; it is not, too sad.