11g

Oracle Enterprise Manager – Grid Control 10.2.0.5 finally released

Finally, Oracle Enterprise Manager – Grid Control 10.2.0.5 is available for download for Win32 and Linux x86. Moreover, this is the first release to officially support 11.1.0.7 as repository database as well as the first release to officially support RHEL5 for OMS.

Relevant documents for upgrade are available on MetaLink:

  • 763307.1: How to Install the 10.2.0.5.0 Grid Control Patchset on a Well Maintained OMS or Agent
  • List of Bugs Fixed: http://download.oracle.com/docs/cd/B16240_01/doc/doc.102/e14226/toc.htm
  • 763351.1: Documentation Reference for Grid Control 10.2.0.5.0 Installation and Upgrade
  • 464674.1: Checklist for EM 10g Grid Control 10.2.x to 10.2.0.4/10.2.0.5 OMS and Repository Upgrades

To be continued….



Huge Space Consumption by $ORACLE_HOME/.patch_storage

If you are keeping your system up to date with Patchsets, Patch Bundles, Merge Label Requests (MLR) or Critical Patch Updates (CPU), you will most likely suffer from a huge .patch_storage Subdirectory in your $ORACLE_HOME.

On one of my databases it looked like this:

Space used by $ORACLE_HOME: 7 GB
Space used by $ORACLE_HOME/.patch_storage: 4.3 GB

Can data in this directory be removed?

MetaLink Note 550522.1 (Subject: How To Avoid Disk Full Issues Because OPatch Backups Take Big Amount Of Disk Space.) has the answer and tells you: it depends. Normally, this data is used in order to be able to rollback a patch. However, if you have installed a patchset (eg. 10.2.0.4), then the patches for the previous patchset (10.2.0.3) which are located in the .patch_storage directory are not needed anymore and can be removed. However, I would not recommend that you delete the directories manually yourself, as this would not be supported. Instead let Oracle do it for you:

Recent versions of opatch (current is 10.2.0.4.5 as of January 2009) have a utility included, which removes patches not needed anymore from the .patch_storage directory. Moreover, the opatch utility creates these .patch_storage backup directories more intelligently which should result in less space wasted.

[oracle@vmhost1 ora10]$./OPatch/opatch util Cleanup
Invoking OPatch 10.2.0.4.5

Oracle Interim Patch Installer version 10.2.0.4.5
Copyright (c) 2008, Oracle Corporation. All rights reserved.

UTIL session

Oracle Home : /oracle/ora10
Central Inventory : /oracle/oraInventory
from : /var/opt/oracle/oraInst.loc
OPatch version : 10.2.0.4.5
OUI version : 10.2.0.4.0
OUI location : /oracle/ora10/oui
Log file location : /oracle/ora10/cfgtoollogs/opatch/opatch2009-01-15_17-00-51PM.log

Patch history file: /oracle/ora10/cfgtoollogs/opatch/opatch_history.txt

Invoking utility “cleanup”
OPatch will clean up ‘restore.sh,make.txt’ files and ‘rac,scratch,backup’ directories.
You will be still able to rollback patches after this cleanup.
Do you want to proceed? [y|n]
y
User Responded with: Y
Size of directory “/oracle/ora10/.patch_storage” before cleanup is 4575330012 bytes.
Size of directory “/oracle/ora10/.patch_storage” after cleanup is 188326505 bytes.

UtilSession: Backup area for restore has been cleaned up. For a complete list of files/directories
deleted, Please refer log file.

OPatch succeeded.

180 MB instead of 4 GB. I like that.



Hugepages revisited II: Be aware of kernel bugs!

It is well known that hugepages can reduce the overhead of managing memory pages of Oracle SGA by the operating system thus leading to lower system cpu utilization. I have written two blog entries regarding this topic already: Listener Coredumps on heavy load system and Hugepages revisited.

However, there is a potential risk with it: Certain kernels / platforms have bugs regarding hugepages which can lead to problems:

  • Bug 131295 – Hugepages configured on kernel boot line causes x86_64 kernel boot to fail with OOM: Fixed in RHEL3: kernel-2.4.21-40.EL
  • Bug 248954 – Oracle ASM DBWR process goes into 100% CPU spin when using hugepages on ia64 (Fixed in kernel-2.6.9-78.EL.ia64.rpm available as update for RHEL4U7)
  • RHSA-2008:1017-14: on the Itanium® architecture, setting the “vm.nr_hugepages” sysctl parameter caused a kernel stack overflow resulting in a kernel panic, and possibly stack corruption. With this fix, setting vm.nr_hugepages works correctly. Fixed with RHEL5 kernel-2.6.18-92.1.22.el5.ia64.rpm
  • RHSA-2008:1017-14: hugepages allow the Linux kernel to utilize the multiple page size capabilities of modern hardware architectures. In certain configurations, systems with large amounts of memory could fail to allocate most of this memory for hugepages even if it was free. This could result, for example, in database restart failures. Fixed with RHEL5 kernel-2.6.18-92.1.22.el5.ia64.rpm

Therefore, before enabling hugepages, I recommend to check with your OS Vendor Bug Database, test on a test system and apply recent OS upgrades first.



Installation Prerequisites for Oracle Databases

More often than not, I see installations, where either shared memory parameters or ulimit settings are not set as required for an optimal Oracle database installation.

The tool RDA, commonly known as a support tool to gather diagnostic information for Oracle Support for service requests, can do these checks quite nicely.

Steps:

  • Download RDA from MetaLink Note 314422.1.
  • unzip in working directory
  • run Health Check Validation Engine of RDA
  • $ ./rda.pl -T hcve
    Processing HCVE tests ...
    Available Pre-Installation Rule Sets:
    1. Oracle Database 10g (10.1.0) PreInstall(HP-UX)
    2. Oracle Database 10g R1 (10.1.0) PreInstall (HP-UX Itanium)
    3. Oracle Database 10g R2 (10.2.0) PreInstall (HPUX)
    4. Oracle Database 11g R1 (11.1.0) PreInstall (HPUX)
    5. Oracle Application Server 10g (9.0.4) PreInstall (HP-UX)
    6. Oracle Application Server 10g R2 (10.1.2) PreInstall (HP-UX)
    7. Oracle Portal PreInstall (Generic)
    Available Post-Installation Rule Sets:
    8. Oracle Portal PostInstall (generic)
    9. Data Guard PostInstall (Generic)
    Enter the HCVE rule set number
    Hit 'Return' to accept the default (1)
    > 3

    Enter value for < Planned ORACLE_HOME location or if set >
    Hit 'Return' to accept the default ($ORACLE_HOME)
    >

    Test "Oracle Database 10g R2 (10.2.0) PreInstall (HPUX)" executed at Tue Dec 23 15:26:31 2008

    Test Results
    ~~~~~~~~~~~~

    ID NAME RESULT VALUE
    ===== ==================== ====== ========================================
    10 OS Certified? PASSED Certified with 10g R2 RDBMS
    20 User in /etc/passwd? PASSED userOK
    25 Got EXTJOB User? FAILED ExtjobNotFound
    30 Group in /etc/group? PASSED GroupOK
    40 Input ORACLE_HOME RECORD $ORACLE_HOME
    50 ORACLE_HOME Valid? PASSED OHexists
    60 O_H Permissions OK? PASSED CorrectPerms
    70 Umask Set to 022? PASSED UmaskOK
    80 LDLIBRARYPATH Unset? PASSED UnSet
    90 SHLIB_PATH Unset? PASSED UnSet
    100 Other O_Hs in PATH? PASSED NotFound
    110 oraInventory Permiss FAILED oraInventoryNotOK
    120 /tmp Adequate? PASSED TempSpaceOK
    130 Swap (in MB) RECORD 15144
    140 RAM (in MB) PASSED 14334
    150 SwapToRAM OK? PASSED SwapToRAMOK
    160 Disk Space OK? FAILED OnlySpaceForOne
    170 Kernel Parameters OK FAILED [EXECUTABLE_STACK=1] too large [MAXUP..>
    175 Links and Libs OK? PASSED AllExist
    180 Got ld,nm,ar,make? PASSED ld_nm_ar_make_found
    190 ulimits OK? PASSED ulimitOK
    200 Got OS Bundles? PASSED GOLDAPPS11iandGOLDBASE11iAdequate
    210 Got OS Patches? FAILED [PHNE_31097 or its successor PHNE_324..>
    220 Other OUI Up? PASSED NoOtherOUI

    The output file contains detailed information, about why a specific check failed.

    [PHNE_31097 or its successor PHNE_32477 or its successor PHNE_33498 or its successor PHNE_35418] not installed
    [PHSS_31221 or its successor PHSS_33263 or its successor PHSS_33944] not installed
    [PHSS_30970 or its successor PHSS_33033 or its successor PHSS_35379] not installed
    [PHSS_32508 or its successor PHSS_34411 or its successor PHSS_35099] not installed
    [PHSS_32509 or its successor PHSS_34412 or its successor PHSS_35098] not installed
    [PHSS_32510 or its successor PHSS_34413 or its successor PHSS_35100] not installed

    FAILED
    [EXECUTABLE_STACK=1] too large
    [MAXUPRC=1024] too small
    [MSGMNI=50] too small
    [MSGTQL=40] too small
    [NCSIZE=9964] too small
    [NFILE=5000] too small
    [NINODE=4844] too small
    [SHMMAX=1024000000] too small
    [MAXSWAPCHUNKS=2048] too small
    [MSGMAP=42] too small == KernelOK



Installing LTOM for RAC Hanganalyze

I have recently written a short installation instruction for a customer to use LTOM for RAC hanganalyze.

  • Download von MetaLink Note 352363.1:
    https://metalink2.oracle.com/cgi-bin/cr/getfile.cgi?p_attid=352363.1:ltom
  • unzip as User Oracle in directory with sufficient space. subdir ltom will be created

tar xfv ltom411.tar

  • Install statspack
  • Installation: db account ltom/ltom will be created

Java Binary in path
which java

if not, place java in path in ~/.bash_profile:
als User oracle: add $ORACLE_HOME/jre/1.4.2/bin to PATH Variable

e.g.:
export PATH=$PATH:$ORACLE_HOME/bin:$ORACLE_HOME/jre/1.4.2/bin

Put TOM_HOME in .~/.bash_profile:
export TOM_HOME=/home/oracle/ltom/tom_base/install/../tom

Sourcen of TOM_HOME

cd /home/oracle/ltom/tom_base/install
./autoinstall.sh

Sample Output:
[oracle@ora-vm1 install]$ ./autoinstall.sh
TOM_HOME=/home/oracle/ltom/tom_base/install/../tom
rm: cannot remove `autoinstall.log’: No such file or directory

Preparing to install LTOM v4.0.0…

Checking your Oracle Environment…

ORACLE_HOME=/u01/app/oracle/product/10.2.0
ORACLE_SID=ORA102
TNS_ADMIN=
TOM_HOME=/home/oracle/ltom/tom_base/install/../tom

Is this the correct Oracle environment you wish to use for installing LTOM?
Hit Return to Accept Default:
============================> \c

Continuing the install with this environment

Checking for Java…

Java found.

Checking for top utility…
Found top utility.

Checking for vmstat utility…
Found vmstat utility.

Need to run some sql now to create TOM user
it Return to allow setup to do this for you, enter other to do this manually:
============================> \c

Tom user created successfully.
Warning…Please change tom password immediately!

Grant user Tom successful.
Create package dbmssupp successful.
Grant execute on dbmssupp successful.

Checking for statspack…

Cannot verify statspack is installed on your system.
LTOM requires statspack to be installed to work properly.

Enter your email id to get notification of hangs via email…
Hit Return to ignore hang notification via email:
============================> \c

Configuring tom_deploy.properties file…

TOM/LTOM successfully installed.

Please review any messages you received above.

REMEMBER: You must have TOM_HOME defined and java in your PATH
before you can run TOM/LTOM.

  • Configuration

Change ltom/tom_base/tom/init/hangDetect.properties

from:
#RAC_AWARE=TRUE
HANGANALYZE=A
SYSTEMSTATE=266
DELAY=30
HANGANALYZE=A
SYSTEMSTATE=266

to:
RAW_AWARE=TRUE
HANGANALYZE=A
SYSTEMSTATE=266
DELAY=30
HANGANALYZE=A
SYSTEMSTATE=266

Change tom db passwort:

SQL> alter user tom identified by secret;

Add credentials to config file:

vi tom_deploy.properties

DB_USERID=tom
:q!

./startltom.sh -p
Enter same password as in sqlplus. Password will be saved encrypted in tom_deploy.properties

  • edit tom_deploy.properties to decide which of the 3 modules should be running during background operations: hangdetect, profiler, session recorder
  • Start/Stop Scripts

vi start_ltom.sh
#!/bin/bash
. ~/.bash_profile
nohup $TOM_HOME/startltom.sh -s &

vi stop_ltom.sh
#!/bin/bash
touch $TOM_HOME/tmp/hang.stop
touch $TOM_HOME/tmp/profile.stop
touch $TOM_HOME/tmp/session.stop

chmod u+x start_ltom.sh
chmod u+x stop_ltom.sh

  • If you want  to run ltom during a specific timeframe, use cron (as user oracle)

crontab -e
45 01 * * *     /home/oracle/ltom/start_ltom.sh >/dev/null 2>/dev/null </dev/null
00 05 * * *     /home/oracle/ltom/start_ltom.sh >/dev/null 2>/dev/null </dev/null



Troublshooting Trace Files of DIAG Background Process

In trace files of the background process DIAG, I saw an interesting section that I would like to share.


*** 2008-12-09 03:46:33.280
Dump requested by process [orapid=6]
REQUEST:custom dump [2] with parameters [6][27][4][2]
Dumping process info of pid[27.31384] requested by pid[6.31160]
ORA-00494: Message 494 not found; No message file for product=RDBMS, facility=ORA; arguments: [ [CF]] [ (more than 900 seconds)] [inst
2, osid 31384]
-------------------------------------------------------------------------------
ENQUEUE [CF] HELD FOR TOO LONG
enqueue holder: 'inst 2, osid 31384'
Process 'inst 2, osid 31384' is holding an enqueue for maximum allowed time.
The process will be terminated.
Oracle Support Services triaging information: to find the root-cause, look
at the call stack of process 'inst 2, osid 31384' located below. Ask the
developer that owns the first NON-service layer in the stack to investigate.
Common service layers are enqueues (ksq), latches (ksl), library cache
pins and locks (kgl), and row cache locks (kqr).

It is very nice of diag to tell me what to do with the output. 😉 Especially:

Ask the developer that owns the first NON-service layer in the stack to investigate.



System State Dump Evaluation with ass.awk

I have found out that there is an Oracle Support utility called ass.awk, which can be used to analyze and format a System State dump. It is contained in LTOM, which can be downloaded from MetaLink.

The AWK script is called ass109.awk and can be used in this manner:

[oracle@ora-vm1 ~]$ awk -f ass109.awk mddb1_diag_12345.trc
 
System State ASS:
Starting Systemstate 1
..............................................................................
.....
Ass.Awk Version 1.0.9 - Processing mddb1_diag_12345.trc
System State 1
~~~~~~~~~~~~~~~~
1:
2: waiting FOR 'pmon timer' wait
3: LAST wait FOR 'ksdxexeotherwait'
4: waiting FOR 'rdbms ipc message' wait
5: waiting FOR 'rdbms ipc message' wait
6: waiting FOR 'ges remote message' wait
7: waiting FOR 'gcs log flush sync' wait
8: waiting FOR 'gcs log flush sync' wait
9: waiting FOR 'rdbms ipc message' wait
10: waiting FOR 'rdbms ipc message' wait
11: waiting FOR 'enq: CF - contention' [Enqueue CF-00000000-00000000] wait
12: waiting FOR 'rdbms ipc message' wait
13: waiting FOR 'buffer busy waits' (2,a9,3b) wait
14: waiting FOR 'rdbms ipc message' wait
15: waiting FOR 'rdbms ipc message' wait
16: waiting FOR 'buffer busy waits' (2,a9,3b) wait
17: waiting FOR 'rdbms ipc message' wait
18:
19:
20: waiting FOR 'rdbms ipc message' wait
21: waiting FOR 'log file sync' wait
22: waiting FOR 'ASM background timer' wait
23: waiting FOR 'rdbms ipc message' wait
24: waiting FOR 'log file sync' wait
25: waiting FOR 'class slave wait' wait
26: waiting FOR 'SQL*Net message from client' wait
27: waiting FOR 'SQL*Net message from client' wait
28: waiting FOR 'enq: CF - contention' [Enqueue CF-00000000-00000000] wait
29: waiting FOR 'enq: CF - contention' [Enqueue CF-00000000-00000000] wait
30: waiting FOR 'Streams AQ: qmn coordinator idle wait' wait
31: waiting FOR 'SQL*Net message from client' wait
32: waiting FOR 'SQL*Net message from client' wait
Cmd: SELECT
33: waiting FOR 'log file sync' wait
34: waiting FOR 'Streams AQ: qmn slave idle wait' wait
35: FOR 'Streams AQ: waiting for time management or cleanup tasks' wait
36: waiting FOR 'log file sync' wait
37: waiting FOR 'SQL*Net message from client' wait
Cmd: SELECT
38: waiting FOR 'SQL*Net message from client' wait
39: waiting FOR 'SQL*Net message from client' wait
40: waiting FOR 'PX Deq: Execution Msg' wait
41: waiting FOR 'log file sync' wait
42: waiting FOR 'log file sync' wait
43: waiting FOR 'enq: CF - contention' [Enqueue CF-00000000-00000000] wait
Cmd: SELECT
44: waiting FOR 'SQL*Net message from client' wait
45: waiting FOR 'SQL*Net message from client' wait
46: waiting FOR 'SQL*Net message from client' wait
47: waiting FOR 'SQL*Net message from client' wait
48: waiting FOR 'SQL*Net message from client' wait
49: waiting FOR 'log file sync' wait
50: waiting FOR 'SQL*Net message from client' wait
51: waiting FOR 'SQL*Net message from client' wait
Cmd: SELECT
52: waiting FOR 'row cache lock' [Rcache object=0xa5414098,] wait
Cmd: SELECT
53: waiting FOR 'SQL*Net message from client' wait
54: waiting FOR 'SQL*Net message from client' wait
55: FOR 'Streams AQ: waiting for messages in the queue' wait
56: waiting FOR 'SQL*Net message from client' wait
57: waiting FOR 'SQL*Net message from client' wait
58: waiting FOR 'log file sync' wait
Cmd: INSERT
59: waiting FOR 'jobq slave wait' wait
60: waiting FOR 'jobq slave wait' wait
61: waiting FOR 'SQL*Net message from client' wait
62: waiting FOR 'SQL*Net message from client' wait
63: waiting FOR 'SQL*Net message from client' wait
65: waiting FOR 'SQL*Net message from client' wait
Cmd: SELECT
66: waiting FOR 'SQL*Net message from client' wait
67: LAST wait FOR 'SQL*Net message from client'
68: waiting FOR 'SQL*Net message from client' wait
69: waiting FOR 'SQL*Net message from client' wait
70: waiting FOR 'SQL*Net message from client' wait
71: waiting FOR 'SQL*Net message from client' wait
72: waiting FOR 'SQL*Net message from client' wait
73: waiting FOR 'SQL*Net message from client' wait
74: waiting FOR 'SQL*Net message from client' wait
Cmd: SELECT
75: waiting FOR 'SQL*Net message from client' wait
Cmd: SELECT
76: waiting FOR 'SQL*Net message from client' wait
77: waiting FOR 'log file sync' wait
78: waiting FOR 'SQL*Net message from client' wait
80: waiting FOR 'SQL*Net message from client' wait
Cmd: SELECT
82: waiting FOR 'SQL*Net message from client' wait
84: waiting FOR 'log file sync' wait
86: waiting FOR 'SQL*Net message from client' wait
93: waiting FOR 'SQL*Net message from client' wait
94: waiting FOR 'SQL*Net message from client' wait
Blockers
~~~~~~~~
Above IS a list OF ALL the processes. IF they are waiting FOR a resource
THEN it will be given IN square brackets. Below IS a summary OF the
waited upon resources, together WITH the holder OF that resource.
Notes:
~~~~~
o A process id OF '???' implies that the holder was NOT found IN the
systemstate.
Resource Holder State
Enqueue CF-00000000-00000000 12: waiting FOR 'rdbms ipc message'
Rcache object=0xa5414098, ??? Blocker
Object Names
~~~~~~~~~~~~
Enqueue CF-00000000-00000000
Rcache object=0xa5414098,

With this information, you can investigate further into the System state dump by looking at rcache object 0xa5414098. The “State Objects” are a hierarchical structure. Every state object has an adress “SO: 0xa6a5955c” and an owner, which is the parent entry. In this case, you can construct a hierarchical tree of state objects and find dependencies.

 SO: 0xa6a5995c, TYPE: 2, owner: (nil), flag: INIT/-/-/0x00
 (process) Oracle pid=52, calls cur/top: 0x9d62eb1c/0x9d5ccfcc, flag: (0) -
		   INT error: 0, CALL error: 0, sess error: 0, txn error 0
 (post info) LAST post received: 0 0 163
			 LAST post received-location: kqrbtm
			 LAST process TO post me: a6a4e1dc 1 6
			 LAST post sent: 0 0 24
			 LAST post sent-location: ksasnd
			 LAST process posted BY me: a6a4e1dc 1 6
   (latch info) wait_event=0 bits=0
	Process GROUP: DEFAULT, pseudo proc: 0xa6baf79c
	O/S info: USER: oracle, term: UNKNOWN, ospid: 29032
    OSD pid info: Unix process pid: 29032, image: oracle@ora-vm1
    Short stack dump: ksdxfstk()+19<-ksdxcb()+1321<-sspuser()+102<-__kernel_vsyscall()+16
 
    SO: 0xa4123d20, TYPE: 4, owner: 0xa6a5995c, flag: INIT/-/-/0x00
    (SESSION) sid: 955 trans: (nil), creator: 0xa6a5995c, flag: (41) USR/- BSY/-/-/-/-/-
              DID: 0001-0034-00002227, short-term DID: 0001-0034-00002228
              txn branch: (nil)
              oct: 3, prv: 0, SQL: 0x8fee9d44, psql: 0x8fee9f00, USER: 23/DBSNMP
     service name: SYS$USERS
     O/S info: USER: oracle, term: , ospid: 28849, machine: ora-vm1
               program: emagent@ora-vm1(TNS V1-V3)
     application name: emagent@ora-vm1(TNS V1-V3), hash VALUE=2420928497
     waiting FOR 'row cache lock' blocking sess=0x(nil) seq=127 wait_time=0 seconds since wait started=0
                 cache id=d, mode=0, request=5
     Dumping SESSION Wait History
      FOR 'row cache lock' COUNT=1 wait_time=1337230
		         cache id=d, mode=0, request=5
      FOR 'row cache lock' COUNT=1 wait_time=2930561
                 cache id=d, mode=0, request=5
      FOR 'row cache lock' COUNT=1 wait_time=2929106
                 cache id=d, mode=0, request=5
      FOR 'row cache lock' COUNT=1 wait_time=2933326
 
      SO: 0x9d5ccfcc, TYPE: 3, owner: 0xa6a5995c, flag: INIT/-/-/0x00
     (CALL) sess: cur a4123d20, rec a4123d20, usr a4123d20; depth: 0
       ----------------------------------------
       SO: 0x9d51b1a0, TYPE: 3, owner: 0x9d5ccfcc, flag: INIT/-/-/0x00
       (CALL) sess: cur a4123d20, rec a417f990, usr a4123d20; depth: 1
         ----------------------------------------
         SO: 0x9d69d7f0, TYPE: 3, owner: 0x9d51b1a0, flag: INIT/-/-/0x00
         (CALL) sess: cur a417f990, rec a417f990, usr a4123d20; depth: 2
         ----------------------------------------
           SO: 0x9d5e547c, TYPE: 3, owner: 0x9d69d7f0, flag: INIT/-/-/0x00
           (CALL) sess: cur a417f990, rec a417f990, usr a4123d20; depth: 3
           ----------------------------------------
             SO: 0x9d62eb1c, TYPE: 3, owner: 0x9d5e547c, flag: INIT/-/-/0x00
             (CALL) sess: cur a417f990, rec a417f990, usr a4123d20; depth: 4
             ----------------------------------------
               SO: 0x9fedf748, TYPE: 40, owner: 0x9d62eb1c, flag: INIT/-/-/0x00
               (trans) flg = 0x21, flg2 = 0x80000, prx = 0x0, ros = 2147483647 bsn = 0x395 bndsn = 0x395 spn = 0x395
               efd = 34
               parent xid: 0x0000.000.00000000
               env: (scn: 0x0000.00000000 xid: 0x0000.000.00000000 uba: 0x00000000.0000.00 statement num=0 parent xid: xid:
               0x0000.000.00000000 scn: 0x0000.00000000 0sch: scn: 0x0000.00000000)
               cev: (spc = 0 arsp = 0 ubk tsn: 0 rdba: 0x00000000 useg tsn: 0 rdba: 0x00000000
                     hwm uba: 0x00000000.0000.00 col uba: 0x00000000.0000.00
                     num bl: 0 bk list: 0x0)
                     cr opc: 0x0 spc: 0 uba: 0x00000000.0000.00
               (enqueue) TX-00010011-00D3B4CF DID: 0001-0051-00001323
               lv: 7c ac ad 4d 03 00 00 00 00 00 00 00 00 00 00 00 res_flag: 0x45
               res: 0xa447bb2c, lock_flag: 0x0
               own: 0xa4157e90, sess: 0xa4157e90, prv: 0x9fedf774
               slk: (nil)
                xga: 0x0, heap: UGA
               Trans IMU st: 0 Pool INDEX 65535, Redo pool 0x9fedfb74, Undo pool 0x9fedfbf0
               Redo pool range [0x40ed57e8 0x40ed57e8 0x40ed7fe8]
               Undo pool range [0x40ed2fe8 0x40ed2fe8 0x40ed57e8]
               ----------------------------------------
                  SO: 0x9d42237c, TYPE: 50, owner: 0x9fedf748, flag: INIT/-/-/0x00
                  ROW cache enqueue: COUNT=1 SESSION=0xa4123d20 object=0xa5414098, request=X
                  SAVEPOINT=0x395
                  ROW cache parent object: address=0xa5414098 cid=13(dc_sequences)
                  hash=9649e7e typ=9 TRANSACTION=(nil) flags=00000000
                  own=0xa5414104[0xa5414104,0xa5414104] wat=0xa541410c[0x9d422398,0x9d422398] mode=N
                  STATUS=-/-/-/-/-/-/-/-/-
                  request=X release=FALSE flags=2
                  instance LOCK id=QN 010cdbb3 b3b24847

Tanel Poder showed in his seminar to use “grep -A 3 SO:” to find all lines containing the State Object Adresses and the following 3 lines to get the tree.



SQL Profiler TVD$XTAT update available

I just saw that Christian Antognini has released an update of TVD$XTAT. I am looking forward to experimenting with it. If you used tkprof in the past, you should definitely take a look at this tool.



Wait Event “cursor: pin S wait on X” suspected to be related with Automatic Memory Management

Quite a while ago, I experienced severe performance problems while processing peak workload with multiple sessions, all waiting on "pin S wait on X". Somehow I suspected that it might have to do with frequent Automatic Memory Management Resize operations. Therefore, we have disabled it with sga_target=0. After that, we did not experience these issues anymore.

As a responsible DBA I have opened a service request with Oracle and asked:

When looking for the wait event CURSOR: PIN S WAIT ON X on metalink, there are quite a few relations to the automatic shared memory management. Could you investigate, whether disabling ASMM would benefit regarding to these mutex waits?

Oracle replied:

Based on the uploaded systemstate and stacks it is no connection between the mutex waits and the ASMM.

Today, I have read a metalink note: 742599.1 – FREQUENT RESIZE OF SGA, which confirms my suspicion. The bug is tracked as Bug 6528336 – LARGE NUMBER OF SESSIONS WAITING ON CURSOR: PIN S WAIT ON X. There is a patch available for HP-UX Itanium on 10.2.0.3 but I would rather disable Automatic Memory Management and wait for it to be fixed with a later – and better tested – patchset.

Support however recommended to get systemstate level 266 dumps as soon as the problem occurs again and then look for the holder of the mutex. An example is given on MetaLink Note: 423153.1.

The correlation is done after th IDN as bellow:

To find more details use the idn=XXXXXX to search down in the systemstate (ie idn=535d1a6c)

KGX Atomic Operation Log 7000002e5b9d160
Mutex 7000002b8e92268(3094, 0) idn 535d1a6c oper GET_SHRD SID 3094 holds it
Cursor Pin uid 2489 efd 0 whr 5 slp 58733 SID 2489 wants it in Shared (GET_SHRD)
opr=2 pso=70000028c47def0 flg=0
pcs=7000002b8e92268 nxt=0 flg=34 cld=3 hd=70000030d6c6eb0 par=7000002eefe64d0
ct=31 hsh=0 unp=0 unn=0 hvl=b825a4d0 nhv=1 ses=700000309b42600
hep=7000002b8e922e8 flg=80 ld=1 ob=7000002de49f8a0 ptr=70000022cf39db8 fex=70000022cf390c8

To find the HOLDER, search for idn XXXXXXX oper until you find one which is held (ie not GET_XXX)(
ie idn 535d1a6c oper):-

KGX Atomic Operation Log 7000002cd934270
Mutex 7000002b8e92268(3094, 0) idn 535d1a6c oper EXCL SID 3094 holds in Exclusive (EXCL)
Cursor Pin uid 3094 efd 0 whr 7 slp 0
opr=3 pso=7000002a71c4180 flg=0
pcs=7000002b8e92268 nxt=0 flg=34 cld=3 hd=70000030d6c6eb0 par=7000002eefe64d0
ct=31 hsh=0 unp=0 unn=0 hvl=b825a4d0 nhv=1 ses=700000309b42600
hep=7000002b8e922e8 flg=80 ld=1 ob=7000002de49f8a0 ptr=70000022cf39db8 fex=70000022cf390c8

The wait event "cursor: pin S wait on X" is also related to two different bugs:

  • Bug 5485914 – MUTEX REPORTED SELF DEADLOCK AFTER
    DBMS_MONITOR.SESSION_TRACE_ENABLE
  • Bug 5907779 – Self deadlock hang on “cursor: pin S wait on X” (typically from DBMS_STATS). An excellent way to diagnose if you suffer from this bug is given here.


ORA-00322, ORA-00312 at DataGuard Standby

From time to time I have encountered the following errors on the physical dataguard standby database while in recovery mode:

Errors in file /oracle/STBDB1/oratrace/bdump/stbdb1_mrp0_26719.trc:
ORA-00322: log 5 of thread 1 is not current copy
ORA-00312: online log 5 thread 1: '/oracle/STBDB1/origlogB/standby_g5_m1.log'
Sun Oct 16 13:04:08 2008
Errors in file /oracle/STBDB1/oratrace/bdump/stbdb1_mrp0_26719.trc:
ORA-00322: log 5 of thread 1 is not current copy
ORA-00312: online log 5 thread 1: '/oracle/STBDB1/origlogB/standby_g5_m1.log'

The problem corrects itself after a couple of minutes, so there is no real problem, but if you have alert log monitoring active, this will trigger a call for investigation. Oracle is tracking this error with Bug 5238386 – ORA-322 possible reading standby redo log header. There is a one-off patch available for 10.2.0.3 and the patch is included in patchset 10.2.0.4.

There is a workaround to clear the referenced redo log but I don´t see any point in doing that, because the error can occurr again as long as the patch is not installed.