Bugs

LMON: terminating instance due to error 481 after 10.2.0.4 Upgrade

I recently came across a nasty bug that appeared after upgrading a RAC cluster to 10.2.0.4. Out of nowhere, the LMON process terminated one cluster instance. After some research I found a matching bug from September 2008:

Bug 6500033 / MetaLink Note 6500033.8: LMON crash the instance with ORA-481 due to DRM sync timeout

alert_MDDB1.log:

Sun Oct 19 17:47:12 2008
LMON: terminating instance due to error 481

MDDB1_lmon_12345.trc:

-----------------------
sent synca inc 4 lvl 1308 (4,0/38/0)
kjfcdrmrfg: SYNC TIMEOUT (105441, 104540, 900), step 31
Submitting asynchronized dump request [28]
KJC Communication Dump:
state 0x5 flags 0x0 mode 0x0 inst 0 inc 4
nrcv 3 nsp 3 nrcvbuf 1000

It is fixed in RAC Recommended Bundle #2 (7573282), available for Linux x86-32/64, Solaris and HP-UX.
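
To check whether an LMON trace shows the same signature, the timeout line can be picked out with a regular expression. This is a minimal sketch that assumes the line format matches the excerpt above exactly. The function name is hypothetical, and the sketch only extracts the three numbers and the step without interpreting them.

```python
import re

# Pattern for the DRM sync timeout line, modelled on the trace excerpt above.
SYNC_TIMEOUT = re.compile(
    r"kjfcdrmrfg: SYNC TIMEOUT \((\d+), (\d+), (\d+)\), step (\d+)"
)


def find_drm_sync_timeouts(trace_text):
    """Return ((n1, n2, n3), step) for every DRM sync timeout line found."""
    hits = []
    for line in trace_text.splitlines():
        match = SYNC_TIMEOUT.search(line)
        if match:
            n1, n2, n3, step = (int(group) for group in match.groups())
            hits.append(((n1, n2, n3), step))
    return hits


# Sample input copied from the trace excerpt above.
sample = """sent synca inc 4 lvl 1308 (4,0/38/0)
kjfcdrmrfg: SYNC TIMEOUT (105441, 104540, 900), step 31
Submitting asynchronized dump request [28]"""

print(find_drm_sync_timeouts(sample))
```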



ORA-600 – be careful when dynamically resizing SGA structures

If you have Automatic Shared Memory Management turned on (sga_target > 0) and you dynamically resize db_cache_size, you might, like me, encounter an instance crash with ORA-600 [Kmgs_pre_process_request_2]. Small change, big impact! Oracle tracks this problem as Bug 6597948 and Bug 5942310. There is a patch for 10.2.0.3 on HP-UX Itanium. It will be fixed in 10.2.0.5 and 11g. MetaLink Note 737458.1 gives more details.

An even more obscure problem occurs when sga_target is set to an exact multiple of 4 GB: Ora-600 [Kmgs_Pre_Process_Request_6] Terminates Instance When Resizing Caches. See MetaLink 373802.1.
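
Whether a given sga_target falls into this category is simple arithmetic. The following sketch assumes the value is already available in bytes and that "4 GB" means 4 * 1024^3 bytes. The function name is hypothetical.

```python
FOUR_GB = 4 * 1024**3  # 4 GB in bytes (assumption: binary gigabytes)


def is_exact_multiple_of_4gb(sga_target_bytes):
    """True if a non-zero sga_target is an exact multiple of 4 GB."""
    return sga_target_bytes > 0 and sga_target_bytes % FOUR_GB == 0


# Illustrative values: 8 GB, 8 GB minus 16 MB, and 0 (sga_target not set).
for size in (8 * 1024**3, 8 * 1024**3 - 16 * 1024**2, 0):
    print(size, is_exact_multiple_of_4gb(size))
```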



Wait Event “cursor: pin S wait on X” suspected to be related to Automatic Shared Memory Management

Quite a while ago, I experienced severe performance problems during peak workload, with multiple sessions all waiting on "cursor: pin S wait on X". I suspected that frequent Automatic Shared Memory Management (ASMM) resize operations might be the cause, so we disabled ASMM by setting sga_target=0. After that, the problem did not recur.

As a responsible DBA, I opened a service request with Oracle and asked:

When looking for the wait event CURSOR: PIN S WAIT ON X on metalink, there are quite a few relations to the automatic shared memory management. Could you investigate, whether disabling ASMM would benefit regarding to these mutex waits?

Oracle replied:

Based on the uploaded systemstate and stacks it is no connection between the mutex waits and the ASMM.

Today I read MetaLink Note 742599.1 – FREQUENT RESIZE OF SGA, which confirms my suspicion. The bug is tracked as Bug 6528336 – LARGE NUMBER OF SESSIONS WAITING ON CURSOR: PIN S WAIT ON X. There is a patch available for 10.2.0.3 on HP-UX Itanium, but I would rather keep Automatic Shared Memory Management disabled and wait for the fix to arrive in a later, better-tested patchset.

Support, however, recommended taking systemstate level 266 dumps as soon as the problem occurs again and then looking for the holder of the mutex. An example is given in MetaLink Note 423153.1.

The correlation is done via the IDN, as shown below:

To find more details, use idn=XXXXXX to search down in the systemstate (e.g. idn=535d1a6c)

KGX Atomic Operation Log 7000002e5b9d160
Mutex 7000002b8e92268(3094, 0) idn 535d1a6c oper GET_SHRD SID 3094 holds it
Cursor Pin uid 2489 efd 0 whr 5 slp 58733 SID 2489 wants it in Shared (GET_SHRD)
opr=2 pso=70000028c47def0 flg=0
pcs=7000002b8e92268 nxt=0 flg=34 cld=3 hd=70000030d6c6eb0 par=7000002eefe64d0
ct=31 hsh=0 unp=0 unn=0 hvl=b825a4d0 nhv=1 ses=700000309b42600
hep=7000002b8e922e8 flg=80 ld=1 ob=7000002de49f8a0 ptr=70000022cf39db8 fex=70000022cf390c8

To find the HOLDER, search for "idn XXXXXXX oper" until you find one which is held (i.e. not GET_XXX), e.g. "idn 535d1a6c oper":

KGX Atomic Operation Log 7000002cd934270
Mutex 7000002b8e92268(3094, 0) idn 535d1a6c oper EXCL SID 3094 holds in Exclusive (EXCL)
Cursor Pin uid 3094 efd 0 whr 7 slp 0
opr=3 pso=7000002a71c4180 flg=0
pcs=7000002b8e92268 nxt=0 flg=34 cld=3 hd=70000030d6c6eb0 par=7000002eefe64d0
ct=31 hsh=0 unp=0 unn=0 hvl=b825a4d0 nhv=1 ses=700000309b42600
hep=7000002b8e922e8 flg=80 ld=1 ob=7000002de49f8a0 ptr=70000022cf39db8 fex=70000022cf390c8
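
The search described above can be sketched in a few lines of Python. The sketch assumes that every KGX entry has a "Mutex ... idn ... oper ..." line directly followed by a "Cursor Pin uid ..." line, as in the two excerpts. It treats any oper starting with GET_ as a waiter and anything else as a holder, following the rule from the note. The function name and the dictionary keys are hypothetical. Reading the first number in parentheses after the mutex address as the holding SID is an assumption based on the excerpts.

```python
import re

MUTEX_LINE = re.compile(
    r"Mutex\s+(?P<addr>[0-9a-fA-F]+)\((?P<sid>\d+),\s*\d+\)\s+"
    r"idn\s+(?P<idn>[0-9a-fA-F]+)\s+oper\s+(?P<oper>\w+)"
)
UID_LINE = re.compile(r"Cursor Pin uid\s+(?P<uid>\d+)")


def classify_mutex_entries(dump_text, idn):
    """Split the KGX entries for one idn into (holders, waiters)."""
    holders, waiters = [], []
    lines = dump_text.splitlines()
    for index, line in enumerate(lines):
        mutex = MUTEX_LINE.search(line)
        if not mutex or mutex.group("idn") != idn:
            continue
        # The session that logged the operation is on the next line.
        uid = None
        if index + 1 < len(lines):
            pin = UID_LINE.search(lines[index + 1])
            if pin:
                uid = int(pin.group("uid"))
        entry = {
            "uid": uid,
            "oper": mutex.group("oper"),
            "mutex_sid": int(mutex.group("sid")),  # assumed: holding SID
        }
        # Rule from the note: GET_XXX is a request, anything else is held.
        if mutex.group("oper").startswith("GET_"):
            waiters.append(entry)
        else:
            holders.append(entry)
    return holders, waiters


# Sample input shortened from the two excerpts above.
sample = """KGX Atomic Operation Log 7000002e5b9d160
Mutex 7000002b8e92268(3094, 0) idn 535d1a6c oper GET_SHRD
Cursor Pin uid 2489 efd 0 whr 5 slp 58733
KGX Atomic Operation Log 7000002cd934270
Mutex 7000002b8e92268(3094, 0) idn 535d1a6c oper EXCL
Cursor Pin uid 3094 efd 0 whr 7 slp 0"""

holders, waiters = classify_mutex_entries(sample, "535d1a6c")
print("holders:", holders)
print("waiters:", waiters)
```

On a real systemstate dump the output would list every session waiting for the mutex next to the session holding it, which is the correlation Support asked for.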

The wait event "cursor: pin S wait on X" is also related to two different bugs:

  • Bug 5485914 – MUTEX REPORTED SELF DEADLOCK AFTER
    DBMS_MONITOR.SESSION_TRACE_ENABLE
  • Bug 5907779 – Self deadlock hang on “cursor: pin S wait on X” (typically from DBMS_STATS). An excellent way to diagnose if you suffer from this bug is given here.


ORA-00322, ORA-00312 at DataGuard Standby

From time to time I have encountered the following errors on a physical Data Guard standby database in recovery mode:

Errors in file /oracle/STBDB1/oratrace/bdump/stbdb1_mrp0_26719.trc:
ORA-00322: log 5 of thread 1 is not current copy
ORA-00312: online log 5 thread 1: '/oracle/STBDB1/origlogB/standby_g5_m1.log'
Sun Oct 16 13:04:08 2008
Errors in file /oracle/STBDB1/oratrace/bdump/stbdb1_mrp0_26719.trc:
ORA-00322: log 5 of thread 1 is not current copy
ORA-00312: online log 5 thread 1: '/oracle/STBDB1/origlogB/standby_g5_m1.log'

The error clears by itself after a couple of minutes, so it does no real harm, but if you have alert log monitoring active, it will trigger a call for investigation. Oracle tracks this error as Bug 5238386 – ORA-322 possible reading standby redo log header. A one-off patch is available for 10.2.0.3, and the fix is included in patchset 10.2.0.4.

There is a workaround, clearing the referenced redo log, but I don't see any point in doing that, because the error can occur again as long as the patch is not installed.
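
Until the patch is installed, alert log monitoring could at least recognise the error pair shown above. The following sketch assumes that ORA-00322 is always directly followed by the matching ORA-00312 line, as in the excerpt. The function name is hypothetical, and whether such hits should be downgraded or still escalated is a decision for the monitoring setup, not something the sketch answers.

```python
import re

ORA_322 = re.compile(r"ORA-00322: log (\d+) of thread (\d+) is not current copy")
ORA_312 = re.compile(r"ORA-00312: online log (\d+) thread (\d+): '([^']+)'")


def find_stale_log_pairs(alert_text):
    """Return (group, thread, path) for each ORA-00322/ORA-00312 pair."""
    pairs = []
    lines = alert_text.splitlines()
    # Walk over adjacent line pairs and keep those that match both errors.
    for first, second in zip(lines, lines[1:]):
        m322 = ORA_322.search(first)
        m312 = ORA_312.search(second)
        # Both lines must refer to the same log group and thread.
        if m322 and m312 and m322.groups() == m312.groups()[:2]:
            pairs.append((int(m312.group(1)), int(m312.group(2)), m312.group(3)))
    return pairs


# Sample input copied from the alert log excerpt above.
sample = """Errors in file /oracle/STBDB1/oratrace/bdump/stbdb1_mrp0_26719.trc:
ORA-00322: log 5 of thread 1 is not current copy
ORA-00312: online log 5 thread 1: '/oracle/STBDB1/origlogB/standby_g5_m1.log'
Sun Oct 16 13:04:08 2008
Errors in file /oracle/STBDB1/oratrace/bdump/stbdb1_mrp0_26719.trc:
ORA-00322: log 5 of thread 1 is not current copy
ORA-00312: online log 5 thread 1: '/oracle/STBDB1/origlogB/standby_g5_m1.log'"""

for group, thread, path in find_stale_log_pairs(sample):
    print(group, thread, path)
```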



Session terminates when querying v$sql_plan

If your Oracle session terminates when you query v$sql_plan, you may be hitting Bug 5933643, which is fixed in 10.2.0.4. A workaround is to disable “_cursor_plan_unparse_enabled”. Further information is available in MetaLink Note 361342.1 – Dump In msqsub() When Querying V$SQL_PLAN