Tuesday, July 8, 2008

Using various tools to solve a problem

I was recently called in to take over a performance problem. I quickly determined that only a portion of the database was affected, but it was the most visible part, affecting the business ability to offer a certain feature to millions of our customers. Some queries that should take less than 5 seconds suddenly took minutes to complete.

Of course the App Vendor claimed that nothing changed around that particular query in years, so it had to be a database problem.

Looking at OEM’s Average Active Sessions graph on the Performance tab, I could clearly see a huge red stripe (red means the wait is Application related). Drilling down I found waits for an event called “enq: UL – contention” to be the problem. A metalink search retuned no hits and a google search retuned hundreds of forum websites with DBA’s asking “What is enq: UL – contention?”, most of them did not have any replies. The only reference to this event in the manual was found in Oracle® Database Reference 10g Release 2 (10.2) (B14237-01) Appendix E: Oracle Enqueue Names . It simply stated “UL, User-defined Locks”. To me that meant that the app could be using DBMS_LOCK, but the Vendor said they don’t and app support told me that these queries do not write to the database.

I queried V$SESSION for sessions waiting for EVENT=”enq: UL – contention” to find the SID of the BLOCKING_SESSION, but the BLOCKING_SESSION kept on changing. One blocking session would have confirmed an App issue and the short term solution would have been to kill that process.

Then I logged a priority 1 SR and was asked to provide system state dumps. Eventually BUG 4367986 was identified as a possibility. The patch was applied, but it made no difference!

OK, I clearly had to try something else, so I decided to trace 1 session and go through the TKPROF output. I found that the app related SQL statements all executed in less than 5 second, but one statement stood out at 60 seconds elapsed time. That SQL statement was;

begin ctxsys.drvdml.com_sync_index(:idxname, :idxmem, :partname); end;

Schema ctxsys means Oracle Text is in use and the Vendor then confirmed that Oracle Text indexes where build on an app related audit table almost 2 months ago. The TKPROF also showed INSERTs into the app audit table, so we were finally on the right track.

This information was uploaded in the SR and I had to check the SYNC intervals via a select from dr$index and confirmed that the 2 audit indexes had SYNC ON COMMIT. We changed the COMMIT interval to MANUAL , just to proof if this problem is related to the SYNC ON COMMIT. Once we changed the commit interval to MANUAL, the wait event disappeared.


The SR engineer then supplied the following script to change the SYNC to every 5 minutes.


So the “moral of the story” is that as a DBA you have more than one tool to solve a problem and you should use as many of them as possible until the root cause of the problem is clear.


Anonymous said...

Great suggestion...

to identify the current settings.. run following.


Anonymous said...

Thank you so much!
Very useful!