Tuesday, April 20, 2010

Long wait times for cursor: pin S and latch: library cache

On our OLTP database we sometimes experienced short bursts (5-15 seconds) of CPU starvation and waits for cursor: pin S or latch: library cache, especially during busy times. CPU utilization is normally around 65%, so we were confident that the waits were causing the CPU starvation and not vice versa. In the past few months we logged multiple SRs with Oracle for the issue, but only made progress with one, where Bug 6904068 - High CPU usage when there are "cursor: pin S" waits [ID 6904068.8] was identified. The problem with the other SRs was a case of wrong scope definition. Even though I would upload all the relevant information (ASH, AWR and OS Watcher stats), Oracle Support would look at the top wait in AWR ("db file sequential read") and suggest that we solve that with some query tuning, in the hope that the mutex/latch problem would also disappear. Even when I convinced them that the problem was not "db file sequential read", they would ask me to take 3x system state dumps when the problem appears. That was an impossible task, as the first system state dump would only kick in when the problem was already occurring and, given the 5-15 second bursts, effectively over, so we could never give Support the information they claimed they needed to diagnose the problem.
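
A burst like this shows up in ASH as a spike of sessions either waiting on the two events or burning CPU. A quick check along these lines makes the CPU-versus-wait split visible (only a sketch of the kind of query that can be used, not the exact one from our SR uploads; it assumes ASH/Diagnostics Pack is available, and the 5-minute window is arbitrary):

-- Rough ASH breakdown for the last 5 minutes: how many samples were
-- on CPU versus waiting, and on which events (adjust the time window)
select session_state, event, count(*) samples
from   v$active_session_history
where  sample_time > sysdate - 5/1440
group  by session_state, event
order  by samples desc;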

We were in the fortunate position of being able to ask Tanel Poder to have a look. He has also posted some information about the problem and how he systematically solved it. With the same information I always give to Oracle Support, he correctly identified the problem in a couple of minutes! Most amazing is that the solution is available in the Oracle documentation: 10g Release 2 (10.2) for UNIX-Based Operating Systems (B15658-02), Appendix B, Administering Oracle Database on HP-UX.

What the documentation says:
On HP-UX, most processes use a time-sharing scheduling policy. Time sharing can have detrimental effects on Oracle performance by descheduling an Oracle process during critical operations, for example, when it is holding a latch. HP-UX has a modified scheduling policy, referred to as SCHED_NOAGE, that specifically addresses this issue. Unlike the normal time-sharing policy, a process scheduled using SCHED_NOAGE does not increase or decrease in priority, nor is it preempted.
This feature is suited to online transaction processing environments because online transaction processing environments can cause competition for critical resources.


So the OS scheduler may deschedule an Oracle process while it is holding a mutex/latch, because the OS is not aware of what is going on inside Oracle. Other processes that happen to use that cursor will then fail to get the mutex and start spinning. If the mutex holder has a lower priority than all the other processes competing for the CPU, it may not get scheduled back onto a CPU soon enough, so it keeps holding the mutex while the other processes spin and burn CPU trying to get it.

How Tanel proved it:
Below is the output from v$event_histogram for the top two problem waits. Notice the histogram buckets where the wait time is > 512 ms.
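
The report can be produced with something like this (a sketch, not necessarily the exact script used; the WAIT% column is assumed to be each bucket's share of the total waits for that event):

select event event_name,
       wait_time_milli,
       wait_count,
       round(100 * ratio_to_report(wait_count) over (partition by event), 2) "WAIT%"
from   v$event_histogram
where  event in ('latch: library cache', 'cursor: pin S')
order  by event, wait_time_milli;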

EVENT_NAME WAIT_TIME_MILLI WAIT_COUNT WAIT%
------------------------------ --------------- -------------- ------
latch: library cache 1 815,980 52.76
latch: library cache 2 187,232 12.11
latch: library cache 4 168,402 10.89
latch: library cache 8 138,053 8.93
latch: library cache 16 102,906 6.65
latch: library cache 32 69,149 4.47
latch: library cache 64 43,797 2.83
latch: library cache 128 11,916 0.77
latch: library cache 256 3,867 0.25
latch: library cache 512 1,005 0.06
latch: library cache 1,024 1,108 0.07
latch: library cache 2,048 1,516 0.10
latch: library cache 4,096 1,293 0.08
latch: library cache 8,192 351 0.02

EVENT_NAME WAIT_TIME_MILLI WAIT_COUNT WAIT%
------------------------------ --------------- -------------- ------
cursor: pin S 1 59,309,179 86.63
cursor: pin S 2 5,988,534 8.75
cursor: pin S 4 2,313,442 3.38
cursor: pin S 8 675,114 0.99
cursor: pin S 16 113,084 0.17
cursor: pin S 32 23,804 0.03
cursor: pin S 64 15,434 0.02
cursor: pin S 128 9,643 0.01
cursor: pin S 256 4,247 0.01
cursor: pin S 512 2,912 0.00
cursor: pin S 1,024 2,871 0.00
cursor: pin S 2,048 2,626 0.00
cursor: pin S 4,096 1,980 0.00
cursor: pin S 8,192 967 0.00
cursor: pin S 16,384 278 0.00
cursor: pin S 32,768 32 0.00

ps -efl > ps.txt
# PRI is field 7 of ps -efl; the command name is field 15 when STIME is a
# time (h:mm:ss, started today) and field 16 when STIME is a date
cat ps.txt | grep ".:..:.." | awk '{ printf "%6d %50s\n", $7, $15 }' > ps1.txt
cat ps.txt | grep -v ".:..:.." | awk '{ printf "%6d %50s\n", $7, $16 }' > ps2.txt
cat ps1.txt ps2.txt | grep "ora.*abcprd" | sort -n | uniq -c | sort -nbr | head -20

NUM
PROCS PRIO PROCESS_NAME
----- ------ -------------------
1936 154 oracleabcprd
25 148 oracleabcprd
4 241 oracleabcprd
1 240 oracleabcprd
1 235 oracleabcprd
1 229 oracleabcprd
1 223 oracleabcprd
1 207 oracleabcprd
1 195 oracleabcprd
1 187 oracleabcprd
1 181 oracleabcprd
1 179 oracleabcprd
1 154 ora_smon_abcprd
1 154 ora_rvwr_abcprd
1 154 ora_rsm0_abcprd
1 154 ora_reco_abcprd
1 154 ora_rbal_abcprd
1 154 ora_qmnc_abcprd
1 154 ora_q004_abcprd
1 154 ora_q003_abcprd

Note the different priorities of the Oracle client processes. We then enabled SCHED_NOAGE for the Oracle database as described in the documentation.
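
Enabling it involves an OS-level privilege for the dba group plus the HPUX_SCHED_NOAGE initialization parameter, roughly like this (a sketch based on my reading of that documentation appendix, so verify the exact steps and priority value there; 178 is consistent with the priorities shown in the ps output below):

# As root: give the dba group the RTSCHED and RTPRIO capabilities that
# the SCHED_NOAGE policy requires
echo "dba RTSCHED RTPRIO" >> /etc/privgroup
/usr/sbin/setprivgrp -f /etc/privgroup
getprivgrp dba                                   # verify
# As the DBA: set HPUX_SCHED_NOAGE and restart the instance
echo "alter system set hpux_sched_noage=178 scope=spfile;" | sqlplus -s "/ as sysdba"

Afterwards the output looked like this: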

EVENT_NAME WAIT_TIME_MILLI WAIT_COUNT WAIT%
------------------------------ --------------- -------------- ------
latch: library cache 1 53,212 57.39
latch: library cache 2 10,517 11.34
latch: library cache 4 11,069 11.94
latch: library cache 8 5,308 5.72
latch: library cache 16 3,715 4.01
latch: library cache 32 3,506 3.78
latch: library cache 64 5,254 5.67
latch: library cache 128 131 0.14
latch: library cache 256 5 0.01
latch: library cache 512 7 0.01
latch: library cache 1,024 3 0.00

EVENT_NAME WAIT_TIME_MILLI WAIT_COUNT WAIT%
------------------------------ --------------- -------------- ------
cursor: pin S 1 2,267 15.46
cursor: pin S 2 0 0.00
cursor: pin S 4 0 0.00
cursor: pin S 8 0 0.00
cursor: pin S 16 0 0.00
cursor: pin S 32 0 0.00
cursor: pin S 64 0 0.00
cursor: pin S 128 12,380 84.45
cursor: pin S 256 12 0.08

NUM
PROCS PRIO PROCESS_NAME
----- ------ -------------------
1992 178 oracleabcprd
1 178 ora_smon_abcprd
1 178 ora_rvwr_abcprd
1 178 ora_rsm0_abcprd
1 178 ora_reco_abcprd
1 178 ora_rbal_abcprd
1 178 ora_qmnc_abcprd
1 178 ora_q001_abcprd
1 178 ora_q000_abcprd
1 178 ora_psp0_abcprd
1 178 ora_pmon_abcprd
1 178 ora_p009_abcprd
1 178 ora_p008_abcprd
1 178 ora_p007_abcprd
1 178 ora_p006_abcprd
1 178 ora_p005_abcprd
1 178 ora_p004_abcprd
1 178 ora_p003_abcprd
1 178 ora_p002_abcprd

Note that there are far fewer wait buckets, the wait times are short as they should be, and the process priority report shows that all Oracle client processes now run at the same priority.

Thanks Tanel!