Tuesday, April 20, 2010

Long wait times for cursor: pin S and latch: library cache

On our OLTP database we sometimes experienced short bursts (5-15 seconds) of CPU starvation and waits for cursor: pin S or latch: library cache, especially during busy times. CPU utilization is normally 65%, so we were confident that the waits were causing the CPU starvation and not vice versa. In the past few months we logged multiple SRs with Oracle for the issue, but only made progress with one, where Bug 6904068 - High CPU usage when there are "cursor: pin S" waits [ID 6904068.8] was identified. The problem with the other SRs was a case of wrong scope definition. Even though I would upload all the relevant information (ASH, AWR and OS Watcher stats), Oracle Support would look at the top wait in AWR (“db file sequential read”) and suggest that we solve that with some query tuning, in the hope that the mutex/latch problem would also disappear. Even when I convinced them that the problem was not “db file sequential read”, they would ask me to take three system state dumps when the problem appeared. That was an impossible task, as the first system state dump would only kick in when the problem was already occurring, so we could never give Support the information they claimed they needed to diagnose the problem.

We were in the fortunate position of being able to ask Tanel Poder to have a look. He has also posted some information about the problem and how he systematically solved it. With the same information I always give to Oracle Support, he correctly identified the problem in a couple of minutes! Most amazing is that the solution is available in the Oracle documentation: 10g Release 2 (10.2) for UNIX-Based Operating Systems, B15658-02, Appendix B, Administering Oracle Database on HP-UX.

What the documentation says:
On HP-UX, most processes use a time-sharing scheduling policy. Time sharing can have detrimental effects on Oracle performance by descheduling an Oracle process during critical operations, for example, when it is holding a latch. HP-UX has a modified scheduling policy, referred to as SCHED_NOAGE, that specifically addresses this issue. Unlike the normal time-sharing policy, a process scheduled using SCHED_NOAGE does not increase or decrease in priority, nor is it preempted.
This feature is suited to online transaction processing environments because online transaction processing environments can cause competition for critical resources.


So the OS scheduler may deschedule an Oracle process while it is holding a mutex/latch, because the OS is not aware of what is going on inside Oracle. Other processes that happen to use that cursor will then fail to get the mutex and start spinning. If the mutex holder has a lower priority than all the other processes that yielded the CPU, it may not get back onto the CPU soon enough, so it keeps holding the mutex while the other processes spin and burn CPU trying to acquire it.

How Tanel proved it:
Below is v$event_histogram output for the top two problem waits; notice the number of histogram buckets where the wait time is > 512 ms.
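A query along these lines produces it (my own sketch, not Tanel's exact script; it assumes a SYSDBA connection and computes WAIT% as each bucket's share of that event's total wait count):

sqlplus -s "/ as sysdba" <<'SQL'
set pagesize 50 linesize 120
column event_name format a30
-- one row per histogram bucket; WAIT% = this bucket's share of the event's waits
select event as event_name,
       wait_time_milli,
       wait_count,
       round(100 * ratio_to_report(wait_count)
                   over (partition by event), 2) as "WAIT%"
  from v$event_histogram
 where event in ('latch: library cache', 'cursor: pin S')
 order by event, wait_time_milli;
SQL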

EVENT_NAME WAIT_TIME_MILLI WAIT_COUNT WAIT%
------------------------------ --------------- -------------- ------
latch: library cache 1 815,980 52.76
latch: library cache 2 187,232 12.11
latch: library cache 4 168,402 10.89
latch: library cache 8 138,053 8.93
latch: library cache 16 102,906 6.65
latch: library cache 32 69,149 4.47
latch: library cache 64 43,797 2.83
latch: library cache 128 11,916 0.77
latch: library cache 256 3,867 0.25
latch: library cache 512 1,005 0.06
latch: library cache 1,024 1,108 0.07
latch: library cache 2,048 1,516 0.10
latch: library cache 4,096 1,293 0.08
latch: library cache 8,192 351 0.02

EVENT_NAME WAIT_TIME_MILLI WAIT_COUNT WAIT%
------------------------------ --------------- -------------- ------
cursor: pin S 1 59,309,179 86.63
cursor: pin S 2 5,988,534 8.75
cursor: pin S 4 2,313,442 3.38
cursor: pin S 8 675,114 0.99
cursor: pin S 16 113,084 0.17
cursor: pin S 32 23,804 0.03
cursor: pin S 64 15,434 0.02
cursor: pin S 128 9,643 0.01
cursor: pin S 256 4,247 0.01
cursor: pin S 512 2,912 0.00
cursor: pin S 1,024 2,871 0.00
cursor: pin S 2,048 2,626 0.00
cursor: pin S 4,096 1,980 0.00
cursor: pin S 8,192 967 0.00
cursor: pin S 16,384 278 0.00
cursor: pin S 32,768 32 0.00

ps -efl > ps.txt
cat ps.txt | grep ".:..:.." | awk '{ printf "%6d %50s\n", $7, $15 }' > ps1.txt
cat ps.txt | grep -v ".:..:.." | awk '{ printf "%6d %50s\n", $7, $16 }' > ps2.txt
cat ps1.txt ps2.txt | grep ora.*abcprd | sort -n | uniq -c | sort -nbr | head -20
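(Roughly: the two grep/awk passes pull the priority column, $7 in ps -efl output, and the command name out of ps.txt; the command name lands in a different field depending on whether the STIME column shows a clock time or a date, which is what the ".:..:.." pattern distinguishes. The last pipeline then counts processes per priority and command name and lists the 20 most common combinations.)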

NUM
PROCS PRIO PROCESS_NAME
----- ------ -------------------
1936 154 oracleabcprd
25 148 oracleabcprd
4 241 oracleabcprd
1 240 oracleabcprd
1 235 oracleabcprd
1 229 oracleabcprd
1 223 oracleabcprd
1 207 oracleabcprd
1 195 oracleabcprd
1 187 oracleabcprd
1 181 oracleabcprd
1 179 oracleabcprd
1 154 ora_smon_abcprd
1 154 ora_rvwr_abcprd
1 154 ora_rsm0_abcprd
1 154 ora_reco_abcprd
1 154 ora_rbal_abcprd
1 154 ora_qmnc_abcprd
1 154 ora_q004_abcprd
1 154 ora_q003_abcprd

Note the different priorities of the Oracle client processes. We then enabled SCHED_NOAGE for Oracle Database as described in the documentation.
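For reference, here is a rough sketch of what that change involves (my reconstruction, not a verbatim copy of what we ran; check the HP-UX appendix for your release). As noted in the comments below, HPUX_SCHED_NOAGE already defaults to 178 on 10.2.0.4, so in our case only the OS privilege change was needed:

# as root: allow the OSDBA (dba) group to use the SCHED_NOAGE policy
# (or edit /etc/privgroup by hand if a dba line already exists)
echo "dba RTSCHED RTPRIO" >> /etc/privgroup
/usr/sbin/setprivgrp -f /etc/privgroup
getprivgrp dba                  # verify the group now has RTSCHED and RTPRIO

# as the oracle software owner: set the static init parameter if it is not
# already at its default (assumes an spfile) and restart the instance;
# 178 matches the priority the processes show in the report below
echo "alter system set hpux_sched_noage=178 scope=spfile;" | sqlplus -s "/ as sysdba"

With SCHED_NOAGE enabled and the instance restarted, the output looked like this: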

EVENT_NAME WAIT_TIME_MILLI WAIT_COUNT WAIT%
------------------------------ --------------- -------------- ------
latch: library cache 1 53,212 57.39
latch: library cache 2 10,517 11.34
latch: library cache 4 11,069 11.94
latch: library cache 8 5,308 5.72
latch: library cache 16 3,715 4.01
latch: library cache 32 3,506 3.78
latch: library cache 64 5,254 5.67
latch: library cache 128 131 0.14
latch: library cache 256 5 0.01
latch: library cache 512 7 0.01
latch: library cache 1,024 3 0.00

EVENT_NAME WAIT_TIME_MILLI WAIT_COUNT WAIT%
------------------------------ --------------- -------------- ------
cursor: pin S 1 2,267 15.46
cursor: pin S 2 0 0.00
cursor: pin S 4 0 0.00
cursor: pin S 8 0 0.00
cursor: pin S 16 0 0.00
cursor: pin S 32 0 0.00
cursor: pin S 64 0 0.00
cursor: pin S 128 12,380 84.45
cursor: pin S 256 12 0.08

NUM
PROCS PRIO PROCESS_NAME
----- ------ -------------------
1992 178 oracleabcprd
1 178 ora_smon_abcprd
1 178 ora_rvwr_abcprd
1 178 ora_rsm0_abcprd
1 178 ora_reco_abcprd
1 178 ora_rbal_abcprd
1 178 ora_qmnc_abcprd
1 178 ora_q001_abcprd
1 178 ora_q000_abcprd
1 178 ora_psp0_abcprd
1 178 ora_pmon_abcprd
1 178 ora_p009_abcprd
1 178 ora_p008_abcprd
1 178 ora_p007_abcprd
1 178 ora_p006_abcprd
1 178 ora_p005_abcprd
1 178 ora_p004_abcprd
1 178 ora_p003_abcprd
1 178 ora_p002_abcprd

Note that there are fewer wait buckets and the wait times are shorter, as they should be. The process priority report also shows that all Oracle client processes now run at the same priority.

Thanks Tanel!

7 comments:

The Human Fly said...

Martin,

Congratulations on resolving the mysterious problem and kudos to Mr. Tanel for his excellent problem solving skills. By the way, on 10.2.0.4 the parameter is set to 178 by default.

http://forums11.itrc.hp.com/service/forums/questionanswer.do?admit=109447626+1272093505647+28353475&threadId=37528

Unknown said...

Thanks. We noticed that it is a default value, so only the HP-UX change had to be made.

Anonymous said...

We are seeing a similar problem - high CPU utilization on AIX 5.3. The AWR report shows latch waits causing most of the wait time. Is this a problem on AIX 5.3 as well? Thanks so much.
Christian.

Sam said...

Hello,
we are seeing the same kind of waits on our 11g R1 DB running on RHEL4_64bit.
Is there any parameter like SCHED_NOAGE for Linux too?

It's a little urgent.

Anonymous said...

Oh, it seems to me a kind of classical OS scheduling problem of priority inversion: the OS doesn't know that Oracle is in a critical section because ... Oracle doesn't tell the OS anything.

Ehm ... I know of a problem that happened to me in the call of readOCR.

If the RPC call to the cluster hangs/spins (a problem, of course), the CALLER spins on CPU waiting for the answer ... forever. This second problem (not returning an ORA-0fkkff on timeout) makes the event undiagnosable.

As far as I know, it is present in the current releases (2013).

Best regards.

Anonymous said...

I am facing the same error on Oracle Database 10.2 and HP-UX 11.11. Where do I set the parameter?

Anonymous said...

We had the problem with readOCR, called by a LOBWrite in a RAC environment.
This RPC hung and the caller spun on CPU for ages.

I've tried to explain to MOS that there were TWO problems and not just ONE (the caller not timing out with an ORA- error, and the bug in the CRS daemon): FAILED!

This problem, present for SURE in current releases, IS NOT related to the "pin S" mutex problem in this article.

Best Regards