Archive for December 2008

OEM Agents number of filedescriptors in steadily increasing

If you are running 10.2.0.4 databases and OEM Grid Control Agent 10.2.0.4 and you see that the number of filedescriptors of emagent is constantly increasing, you are hitting Bug 7031906.

The errors in the logfile look like this:

2008-06-27 14:32:28,973 Thread-301 ERROR engine: [oracle_database,MDDB2.world,health_check] :
nmeegd_GetMetricData failed : Instance Health Check initialization failed due to one of the following causes:
the owner of the EM agent process is not same as the owner of the Oracle instance processes; the owner of the
EM agent process is not part of the dba group; or the database version is not 10g (10.1.0.2) and above.
2008-06-27 14:32:28,973 Thread-301 WARN collector: Error exit. Error message: Instance Health
Check initialization failed due to one of the following causes: the owner of the EM agent process is not same as
the owner of the Oracle instance processes; the owner of the EM agent process is not part of the dba group; or
the database version is not 10g (10.1.0.2) and above.

And if you do pmap -x on the pid of emagent, then you will see dozens of lines containing hc_.dat.

Oracle tracks this issue with Bug 7031906 and there are patches available. In case you are running an earlier version of agent or database, check MetaLink Note 602633.1.

I have realized that I have included it in the document “Oracle Enterprise Manager 10gR2 – Grid Control Installation” in the secion “papers”, but unfortunately, in german only.



JDBC Pool with Implicit Connection Cache

I have finally found the chance to make my first steps in Java. Instead of “Hello World”, I wanted to test the performance effect of using or NOT using a connection pool for a Java application.

I have prepared 2 Java Testcases:

  • ConnWithIcc.java: 999 Select statements with JDBC Pooling via Implicit Connection Cache
  • Conn WithougIcc.java: 999 Select statements withough JDBC Pooling

ConnWithIcc.java:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
import java.sql.*;
import java.util.*;
import oracle.jdbc.pool.*;
import oracle.jdbc.*;
 
class ConnWithIcc
{
  public static void main(String args[]) throws InterruptedException
  {
    ResultSet rset = null;
    Connection conn = null;
    Statement stmt = null;
 
    try
    {
      // create a DataSource
      OracleDataSource ods = new OracleDataSource();
      ods.setURL("jdbc:oracle:thin:@//ora-vm1:1521/ICC");
      ods.setUser("scott");
      ods.setPassword("tiger");
      // set cache properties
      java.util.Properties prop = new java.util.Properties();
      prop.setProperty("MinLimit", "10");
      prop.setProperty("MaxLimit", "10");
 
      // set DataSource properties
      ods.setConnectionCachingEnabled(true); // be sure set to true
      ods.setConnectionCacheProperties (prop);
      ods.setConnectionCacheName("ImplicitCache01"); // this cache's name
 
      for (int i=1; i < 1000; i++)
      {
         //System.out.println("Establishing connection " +i);
 
         // get the connection
         conn = ods.getConnection();
         //System.out.println("Connected.\nPrinting query results ...\n");
         // Create a stmt
         stmt = conn.createStatement();
         // execute the query
         // rset = stmt.executeQuery( "select 1 from dual where " + i + " = " + i );
         rset = stmt.executeQuery( "select 1 from dual");
         while (rset.next())
         {
            int dualnr = rset.getInt ( 1 );
         }
         conn.close();
         rset.close();
         stmt.close();
 
 
/*
         try
         {
            Thread.currentThread().sleep(1000);
         }
         catch(InterruptedException ie)
         {
               System.err.println ("error message: " + ie.getMessage() );
               ie.printStackTrace();
               Runtime.getRuntime().exit(1);
 
         }
         //System.out.print("\f");
*/
 
      }
   }
   catch (SQLException e)
   {
      // handle the exception properly - in this case, we just
      // print a message and stack trace and exit the application
      System.err.println ("error message: " + e.getMessage() );
      e.printStackTrace();
      Runtime.getRuntime().exit(1);
   }
   finally
   {
      // close the result set, the stmt and connection.
      // ignore any exceptions since we are in the
      // finally clause.
      try
      {
         if( rset != null )
           rset.close();
         if( stmt != null )
           stmt.close();
         if( conn != null )
           conn.close();
      }
      catch ( SQLException ignored ) {ignored.printStackTrace(); }
   }
  }
}

ConnWithoutIcc.java:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
import java.sql.*;
import java.util.*;
import oracle.jdbc.pool.*;
import oracle.jdbc.*;
 
class ConnWithoutIcc
{
  public static void main(String args[]) throws InterruptedException
  {
    ResultSet rset = null;
    Connection conn = null;
    Statement stmt = null;
 
    try
    {
      // create a DataSource
      OracleDataSource ods = new OracleDataSource();
      ods.setURL("jdbc:oracle:thin:@//ora-vm1:1521/NOICC");
      ods.setUser("scott");
      ods.setPassword("tiger");
 
      // set cache properties
      //java.util.Properties prop = new java.util.Properties();
      //prop.setProperty("MinLimit", "10");
      //prop.setProperty("MaxLimit", "10");
 
      // set DataSource properties
      //ods.setConnectionCachingEnabled(true); // be sure set to true
      //ods.setConnectionCacheProperties (prop);
     // ods.setConnectionCacheName("ImplicitCache01"); // this cache's name
 
 
 
      for (int i=1; i < 1000; i++)
      {
         //System.out.println("Establishing connection " +i);
 
         // get the connection
         conn = ods.getConnection();
         //System.out.println("Connected.\nPrinting query results ...\n");
         // Create a stmt
         stmt = conn.createStatement();
         // execute the query
         // rset = stmt.executeQuery( "select 1 from dual where " + i + " = " + i );
         rset = stmt.executeQuery( "select 1 from dual");
 
         while (rset.next())
         {
            int dualnr = rset.getInt ( 1 );
         }
 
 
/*
         try
         {
            Thread.currentThread().sleep(1000);
         }
         catch(InterruptedException ie)
         {
               System.err.println ("error message: " + ie.getMessage() );
               ie.printStackTrace();
               Runtime.getRuntime().exit(1);
 
         }
*/
         //System.out.print("\f");
         conn.close();
         rset.close();
         stmt.close();
 
      }
   }
   catch (SQLException e)
   {
      // handle the exception properly - in this case, we just
      // print a message and stack trace and exit the application
      System.err.println ("error message: " + e.getMessage() );
      e.printStackTrace();
      Runtime.getRuntime().exit(1);
   }
   finally
   {
      // close the result set, the stmt and connection.
      // ignore any exceptions since we are in the
      // finally clause.
      try
      {
         if( rset != null )
           rset.close();
         if( stmt != null )
           stmt.close();
         if( conn != null )
           conn.close();
      }
      catch ( SQLException ignored ) {ignored.printStackTrace(); }
   }
  }
}

The first step is of course to put the java executable in the path and set the classpath environment variable:

1
2
3
4
5
[oracle@ora-vm1 ICC]$ echo $PATH
/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/home/oracle/bin:/u01/app/oracle/product/10.2.0/bin:/u01/app/oracle/product/10.2.0/jre/1.4.2/bin:/u01/app/oracle/product/10.2.0/jdk/bin
 
[oracle@ora-vm1 ICC]$ echo $CLASSPATH
.:/u01/app/oracle/product/10.2.0/jdbc/lib/ojdbc14.jar:/u01/app/oracle/product/10.2.0/jdbc/lib/orai18n.jar

Next, you have to compile the java files into class files:

1
2
[oracle@ora-vm1 ICC]$ javac ConnWithoutIcc.java
[oracle@ora-vm1 ICC]$ javac ConnWithIcc.java

Finally, we can execute both classes and measure the performance difference:

1
2
3
4
5
6
7
8
9
10
11
[oracle@ora-vm1 ICC]$ time java ConnWithIcc
 
real    0m1.501s
user    0m0.446s
sys     0m0.484s
 
[oracle@ora-vm1 ICC]$ time java ConnWithoutIcc
 
real    0m53.653s
user    0m1.528s
sys     0m2.561s

So, the 999 “select 1 from dual” statements take about 1.5 seconds with a jdbc connection pool and take almost 1 minute without a connection pool! Moreover, if you divide 999 statements by 53.6 seconds, you get an average of 18.6 connections per second. You can easily see that one of my 2 core´s is completely utilized by 999 establishing and tear down database connections.

[oracle@ora-vm1 ICC]$ sar -u 5 100
Linux 2.6.18-92.1.18.el5 (ora-vm1.intra)        12/23/2008

12:06:48 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
12:06:53 PM       all      0.30      0.00      1.90      2.30      0.00     95.51
12:06:58 PM       all      0.10      0.00      1.80      1.60      0.00     96.50
12:07:03 PM       all      7.51      0.00     15.92      0.70      0.00     75.88
12:07:08 PM       all     12.90      0.00     42.20      0.10      0.00     44.80
12:07:13 PM       all     13.60      0.00     40.80      0.50      0.00     45.10
12:07:18 PM       all     13.00      0.00     41.80      0.60      0.00     44.60
12:07:23 PM       all     14.09      0.00     40.16      0.40      0.00     45.35
12:07:28 PM       all     13.09      0.00     41.66      0.50      0.00     44.76
12:07:33 PM       all     13.00      0.00     41.40      0.50      0.00     45.10
12:07:38 PM       all     14.79      0.00     40.66      0.20      0.00     44.36
12:07:43 PM       all     15.02      0.00     39.84      0.30      0.00     44.84
12:07:48 PM       all     13.10      0.00     41.90      0.60      0.00     44.40
12:07:53 PM       all     14.99      0.00     40.06      0.70      0.00     44.26
12:07:58 PM       all      4.20      0.00     12.91      0.60      0.00     82.28
12:08:03 PM       all      0.30      0.00      1.50      1.00      0.00     97.20


Installation Prerequisites for Oracle Databases

More often than not, I see installations, where either shared memory parameters or ulimit settings are not set as required for an optimal Oracle database installation.

The tool RDA, commonly known as a support tool to gather diagnostic information for Oracle Support for service requests, can do these checks quite nicely.

Steps:

  • Download RDA from MetaLink Note 314422.1.
  • unzip in working directory
  • run Health Check Validation Engine of RDA
  • $ ./rda.pl -T hcve
    Processing HCVE tests ...
    Available Pre-Installation Rule Sets:
    1. Oracle Database 10g (10.1.0) PreInstall(HP-UX)
    2. Oracle Database 10g R1 (10.1.0) PreInstall (HP-UX Itanium)
    3. Oracle Database 10g R2 (10.2.0) PreInstall (HPUX)
    4. Oracle Database 11g R1 (11.1.0) PreInstall (HPUX)
    5. Oracle Application Server 10g (9.0.4) PreInstall (HP-UX)
    6. Oracle Application Server 10g R2 (10.1.2) PreInstall (HP-UX)
    7. Oracle Portal PreInstall (Generic)
    Available Post-Installation Rule Sets:
    8. Oracle Portal PostInstall (generic)
    9. Data Guard PostInstall (Generic)
    Enter the HCVE rule set number
    Hit 'Return' to accept the default (1)
    > 3

    Enter value for < Planned ORACLE_HOME location or if set >
    Hit 'Return' to accept the default ($ORACLE_HOME)
    >

    Test "Oracle Database 10g R2 (10.2.0) PreInstall (HPUX)" executed at Tue Dec 23 15:26:31 2008

    Test Results
    ~~~~~~~~~~~~

    ID NAME RESULT VALUE
    ===== ==================== ====== ========================================
    10 OS Certified? PASSED Certified with 10g R2 RDBMS
    20 User in /etc/passwd? PASSED userOK
    25 Got EXTJOB User? FAILED ExtjobNotFound
    30 Group in /etc/group? PASSED GroupOK
    40 Input ORACLE_HOME RECORD $ORACLE_HOME
    50 ORACLE_HOME Valid? PASSED OHexists
    60 O_H Permissions OK? PASSED CorrectPerms
    70 Umask Set to 022? PASSED UmaskOK
    80 LDLIBRARYPATH Unset? PASSED UnSet
    90 SHLIB_PATH Unset? PASSED UnSet
    100 Other O_Hs in PATH? PASSED NotFound
    110 oraInventory Permiss FAILED oraInventoryNotOK
    120 /tmp Adequate? PASSED TempSpaceOK
    130 Swap (in MB) RECORD 15144
    140 RAM (in MB) PASSED 14334
    150 SwapToRAM OK? PASSED SwapToRAMOK
    160 Disk Space OK? FAILED OnlySpaceForOne
    170 Kernel Parameters OK FAILED [EXECUTABLE_STACK=1] too large [MAXUP..>
    175 Links and Libs OK? PASSED AllExist
    180 Got ld,nm,ar,make? PASSED ld_nm_ar_make_found
    190 ulimits OK? PASSED ulimitOK
    200 Got OS Bundles? PASSED GOLDAPPS11iandGOLDBASE11iAdequate
    210 Got OS Patches? FAILED [PHNE_31097 or its successor PHNE_324..>
    220 Other OUI Up? PASSED NoOtherOUI

    The output file contains detailed information, about why a specific check failed.

    [PHNE_31097 or its successor PHNE_32477 or its successor PHNE_33498 or its successor PHNE_35418] not installed
    [PHSS_31221 or its successor PHSS_33263 or its successor PHSS_33944] not installed
    [PHSS_30970 or its successor PHSS_33033 or its successor PHSS_35379] not installed
    [PHSS_32508 or its successor PHSS_34411 or its successor PHSS_35099] not installed
    [PHSS_32509 or its successor PHSS_34412 or its successor PHSS_35098] not installed
    [PHSS_32510 or its successor PHSS_34413 or its successor PHSS_35100] not installed

    FAILED
    [EXECUTABLE_STACK=1] too large
    [MAXUPRC=1024] too small
    [MSGMNI=50] too small
    [MSGTQL=40] too small
    [NCSIZE=9964] too small
    [NFILE=5000] too small
    [NINODE=4844] too small
    [SHMMAX=1024000000] too small
    [MAXSWAPCHUNKS=2048] too small
    [MSGMAP=42] too small == KernelOK



Performance Monitoring of HP EVA Storage

I have been made to believe that the only way to monitor EVA performance is by doing it from the hosts attached, e.g. iostat, sar, etc.

Now, I have seen that there is in fact an excellent tool, called evaperf, which can be run on a windows host, which has to be attached to the storage. This tool can report just about every metric you can imagine. It can be saved as CSV File, then imported into a database for evluation or most of the metrics can be integrated in Windows perfmon. Moreover, there is a tool called tlviz, which can be used to display the metrics in graphical manner.

  • Host port statistic: Read Req/s, Read MB/s, Read Latency (ms), Write Req/s, Write MB/s, Write Latency (ms), Av Queue Depth
  • Virtual Disk (LUN) statistics: Read Hit Req/s, Read Hit MB/s, Read Hit Latency, Read Miss Req/s, Read Miss Data Rate, Read Miss Latency, Flush Data Rate, Mirror Data Rate, Prefetch Data Rate
  • Physical Disk statistics: Drive Queue Depth, Read Req/s, Read MB/s, Read Latency (ms), Write Req/s, Write MB/s, Write Latency (ms)
  • Host Connection statistics: Queue Depth, Busies
  • Histogram statistics: Read / write latency histogram, Transfer size histogram
  • Array Status statistics: Total Host Req/s, Total Host MB/s
  • Controller Status statistics: Controller CPU Utilization, Percent Data Transfer time

There is an very good documentation available from HP.



NUMA enabled in 10.2.0.4

When upgrading from pre 10.2.0.4 to 10.2.0.4, Oracle enables NUMA support. This has the effect that there can be multiple shared memory segments (MetaLink Note: 429872.1) although shmmax/shmall are set to high values.

I have read MetaLink Notes (7171446.8, 6730567.8, 6689903.8) and this blog entry, where a customer had problems on HP-UX with the default NUMA settings.

Better than that, it can also lead to instance crashes in 10.2.0.4 as reported in MetaLink Note 743191.1. Good news is that there is a patch available for Linux x86_64/10.2.0.4.

I have asked Oracle Support whether it is safe to leave NUMA enabled for Linux Itanium, but they would not comment on it. Instead they asked me to check with the OS vendor. Great. ;-(



Installing LTOM for RAC Hanganalyze

I have recently written a short installation instruction for a customer to use LTOM for RAC hanganalyze.

  • Download von MetaLink Note 352363.1:
    https://metalink2.oracle.com/cgi-bin/cr/getfile.cgi?p_attid=352363.1:ltom
  • unzip as User Oracle in directory with sufficient space. subdir ltom will be created

tar xfv ltom411.tar

  • Install statspack
  • Installation: db account ltom/ltom will be created

Java Binary in path
which java

if not, place java in path in ~/.bash_profile:
als User oracle: add $ORACLE_HOME/jre/1.4.2/bin to PATH Variable

e.g.:
export PATH=$PATH:$ORACLE_HOME/bin:$ORACLE_HOME/jre/1.4.2/bin

Put TOM_HOME in .~/.bash_profile:
export TOM_HOME=/home/oracle/ltom/tom_base/install/../tom

Sourcen of TOM_HOME

cd /home/oracle/ltom/tom_base/install
./autoinstall.sh

Sample Output:
[oracle@ora-vm1 install]$ ./autoinstall.sh
TOM_HOME=/home/oracle/ltom/tom_base/install/../tom
rm: cannot remove `autoinstall.log’: No such file or directory

Preparing to install LTOM v4.0.0…

Checking your Oracle Environment…

ORACLE_HOME=/u01/app/oracle/product/10.2.0
ORACLE_SID=ORA102
TNS_ADMIN=
TOM_HOME=/home/oracle/ltom/tom_base/install/../tom

Is this the correct Oracle environment you wish to use for installing LTOM?
Hit Return to Accept Default:
============================> \c

Continuing the install with this environment

Checking for Java…

Java found.

Checking for top utility…
Found top utility.

Checking for vmstat utility…
Found vmstat utility.

Need to run some sql now to create TOM user
it Return to allow setup to do this for you, enter other to do this manually:
============================> \c

Tom user created successfully.
Warning…Please change tom password immediately!

Grant user Tom successful.
Create package dbmssupp successful.
Grant execute on dbmssupp successful.

Checking for statspack…

Cannot verify statspack is installed on your system.
LTOM requires statspack to be installed to work properly.

Enter your email id to get notification of hangs via email…
Hit Return to ignore hang notification via email:
============================> \c

Configuring tom_deploy.properties file…

TOM/LTOM successfully installed.

Please review any messages you received above.

REMEMBER: You must have TOM_HOME defined and java in your PATH
before you can run TOM/LTOM.

  • Configuration

Change ltom/tom_base/tom/init/hangDetect.properties

from:
#RAC_AWARE=TRUE
HANGANALYZE=A
SYSTEMSTATE=266
DELAY=30
HANGANALYZE=A
SYSTEMSTATE=266

to:
RAW_AWARE=TRUE
HANGANALYZE=A
SYSTEMSTATE=266
DELAY=30
HANGANALYZE=A
SYSTEMSTATE=266

Change tom db passwort:

SQL> alter user tom identified by secret;

Add credentials to config file:

vi tom_deploy.properties

DB_USERID=tom
:q!

./startltom.sh -p
Enter same password as in sqlplus. Password will be saved encrypted in tom_deploy.properties

  • edit tom_deploy.properties to decide which of the 3 modules should be running during background operations: hangdetect, profiler, session recorder
  • Start/Stop Scripts

vi start_ltom.sh
#!/bin/bash
. ~/.bash_profile
nohup $TOM_HOME/startltom.sh -s &

vi stop_ltom.sh
#!/bin/bash
touch $TOM_HOME/tmp/hang.stop
touch $TOM_HOME/tmp/profile.stop
touch $TOM_HOME/tmp/session.stop

chmod u+x start_ltom.sh
chmod u+x stop_ltom.sh

  • If you want  to run ltom during a specific timeframe, use cron (as user oracle)

crontab -e
45 01 * * *     /home/oracle/ltom/start_ltom.sh >/dev/null 2>/dev/null </dev/null
00 05 * * *     /home/oracle/ltom/start_ltom.sh >/dev/null 2>/dev/null </dev/null



Installing 10gR2 RAC on Linux Itanium (Montecito)

Recently, I had to install 10gR2 on Linux Itanium (Montecito CPUs) and found out that the Java version that ships with the binaries does not work on this platform. Therefore you have to download Patch 5390722 and perform the following steps for RAC installation:

  1. Install Patch 5390722: Install JDK into new 10.2 CRS Home, then install JRE into new 10.2 CRS Home.
  2. Take a tar backup of the CRS Home containing these two components. You will need it.
  3. Install 10.2.0.1 Clusterware by running from 10.2.0.1 binaries: ./runInstaller -jreLoc $CRS_HOME/jre/1.4.2
  4. Install Patch 5390722 with the option CLUSTER_NODES={"node1", "node2", ...}: Install JDK into new 10.2 RDBMS Home, then install JRE into new 10.2 RDBMS
  5. Install 10.2.0.1 RDBMS Binaries into the new 10.2 RDBMS: ./runInstaller -jreLoc $ORACLE_HOME/jre/1.4.2
  6. If you want to install the 10.2.0.4 patchset, you will have to follow these steps:
    for CRS: ./runInstaller -jreLoc $ORA_CRS_HOME/jdk/jre
    for RDBMS: ./runInstaller -jreLoc $ORACLE_HOME/jdk/jre
  7. After that, you have to repair the JRE because the 10.2.0.4 patchset has overwritten the patched JRE with the defective versions. (7448301)
    % cd $ORACLE_HOME/jre
    % rm -rf 1.4.2
    % tar –xvf $ORACLE_HOME/jre/1.4.2-5390722.tar

Sources:

  • Note: 404248.1 – How To Install Oracle CRS And RAC Software On Itanium Servers With Montecito Processors
  • Note: 400227.1 – How To Install Oracle RDBMS Software On Itanium Servers With Montecito Processors
  • Bug 7448301 – Linux Itanium: 10.2.0.4 Patchset for Linux Itanium (Montecito) has wrong Java runtime


Troublshooting Trace Files of DIAG Background Process

In trace files of the background process DIAG, I saw an interesting section that I would like to share.


*** 2008-12-09 03:46:33.280
Dump requested by process [orapid=6]
REQUEST:custom dump [2] with parameters [6][27][4][2]
Dumping process info of pid[27.31384] requested by pid[6.31160]
ORA-00494: Message 494 not found; No message file for product=RDBMS, facility=ORA; arguments: [ [CF]] [ (more than 900 seconds)] [inst
2, osid 31384]
-------------------------------------------------------------------------------
ENQUEUE [CF] HELD FOR TOO LONG
enqueue holder: 'inst 2, osid 31384'
Process 'inst 2, osid 31384' is holding an enqueue for maximum allowed time.
The process will be terminated.
Oracle Support Services triaging information: to find the root-cause, look
at the call stack of process 'inst 2, osid 31384' located below. Ask the
developer that owns the first NON-service layer in the stack to investigate.
Common service layers are enqueues (ksq), latches (ksl), library cache
pins and locks (kgl), and row cache locks (kqr).

It is very nice of diag to tell me what to do with the output. 😉 Especially:

Ask the developer that owns the first NON-service layer in the stack to investigate.



System State Dump Evaluation with ass.awk

I have found out that there is an Oracle Support utility called ass.awk, which can be used to analyze and format a System State dump. It is contained in LTOM, which can be downloaded from MetaLink.

The AWK script is called ass109.awk and can be used in this manner:

[oracle@ora-vm1 ~]$ awk -f ass109.awk mddb1_diag_12345.trc
 
System State ASS:
Starting Systemstate 1
..............................................................................
.....
Ass.Awk Version 1.0.9 - Processing mddb1_diag_12345.trc
System State 1
~~~~~~~~~~~~~~~~
1:
2: waiting FOR 'pmon timer' wait
3: LAST wait FOR 'ksdxexeotherwait'
4: waiting FOR 'rdbms ipc message' wait
5: waiting FOR 'rdbms ipc message' wait
6: waiting FOR 'ges remote message' wait
7: waiting FOR 'gcs log flush sync' wait
8: waiting FOR 'gcs log flush sync' wait
9: waiting FOR 'rdbms ipc message' wait
10: waiting FOR 'rdbms ipc message' wait
11: waiting FOR 'enq: CF - contention' [Enqueue CF-00000000-00000000] wait
12: waiting FOR 'rdbms ipc message' wait
13: waiting FOR 'buffer busy waits' (2,a9,3b) wait
14: waiting FOR 'rdbms ipc message' wait
15: waiting FOR 'rdbms ipc message' wait
16: waiting FOR 'buffer busy waits' (2,a9,3b) wait
17: waiting FOR 'rdbms ipc message' wait
18:
19:
20: waiting FOR 'rdbms ipc message' wait
21: waiting FOR 'log file sync' wait
22: waiting FOR 'ASM background timer' wait
23: waiting FOR 'rdbms ipc message' wait
24: waiting FOR 'log file sync' wait
25: waiting FOR 'class slave wait' wait
26: waiting FOR 'SQL*Net message from client' wait
27: waiting FOR 'SQL*Net message from client' wait
28: waiting FOR 'enq: CF - contention' [Enqueue CF-00000000-00000000] wait
29: waiting FOR 'enq: CF - contention' [Enqueue CF-00000000-00000000] wait
30: waiting FOR 'Streams AQ: qmn coordinator idle wait' wait
31: waiting FOR 'SQL*Net message from client' wait
32: waiting FOR 'SQL*Net message from client' wait
Cmd: SELECT
33: waiting FOR 'log file sync' wait
34: waiting FOR 'Streams AQ: qmn slave idle wait' wait
35: FOR 'Streams AQ: waiting for time management or cleanup tasks' wait
36: waiting FOR 'log file sync' wait
37: waiting FOR 'SQL*Net message from client' wait
Cmd: SELECT
38: waiting FOR 'SQL*Net message from client' wait
39: waiting FOR 'SQL*Net message from client' wait
40: waiting FOR 'PX Deq: Execution Msg' wait
41: waiting FOR 'log file sync' wait
42: waiting FOR 'log file sync' wait
43: waiting FOR 'enq: CF - contention' [Enqueue CF-00000000-00000000] wait
Cmd: SELECT
44: waiting FOR 'SQL*Net message from client' wait
45: waiting FOR 'SQL*Net message from client' wait
46: waiting FOR 'SQL*Net message from client' wait
47: waiting FOR 'SQL*Net message from client' wait
48: waiting FOR 'SQL*Net message from client' wait
49: waiting FOR 'log file sync' wait
50: waiting FOR 'SQL*Net message from client' wait
51: waiting FOR 'SQL*Net message from client' wait
Cmd: SELECT
52: waiting FOR 'row cache lock' [Rcache object=0xa5414098,] wait
Cmd: SELECT
53: waiting FOR 'SQL*Net message from client' wait
54: waiting FOR 'SQL*Net message from client' wait
55: FOR 'Streams AQ: waiting for messages in the queue' wait
56: waiting FOR 'SQL*Net message from client' wait
57: waiting FOR 'SQL*Net message from client' wait
58: waiting FOR 'log file sync' wait
Cmd: INSERT
59: waiting FOR 'jobq slave wait' wait
60: waiting FOR 'jobq slave wait' wait
61: waiting FOR 'SQL*Net message from client' wait
62: waiting FOR 'SQL*Net message from client' wait
63: waiting FOR 'SQL*Net message from client' wait
65: waiting FOR 'SQL*Net message from client' wait
Cmd: SELECT
66: waiting FOR 'SQL*Net message from client' wait
67: LAST wait FOR 'SQL*Net message from client'
68: waiting FOR 'SQL*Net message from client' wait
69: waiting FOR 'SQL*Net message from client' wait
70: waiting FOR 'SQL*Net message from client' wait
71: waiting FOR 'SQL*Net message from client' wait
72: waiting FOR 'SQL*Net message from client' wait
73: waiting FOR 'SQL*Net message from client' wait
74: waiting FOR 'SQL*Net message from client' wait
Cmd: SELECT
75: waiting FOR 'SQL*Net message from client' wait
Cmd: SELECT
76: waiting FOR 'SQL*Net message from client' wait
77: waiting FOR 'log file sync' wait
78: waiting FOR 'SQL*Net message from client' wait
80: waiting FOR 'SQL*Net message from client' wait
Cmd: SELECT
82: waiting FOR 'SQL*Net message from client' wait
84: waiting FOR 'log file sync' wait
86: waiting FOR 'SQL*Net message from client' wait
93: waiting FOR 'SQL*Net message from client' wait
94: waiting FOR 'SQL*Net message from client' wait
Blockers
~~~~~~~~
Above IS a list OF ALL the processes. IF they are waiting FOR a resource
THEN it will be given IN square brackets. Below IS a summary OF the
waited upon resources, together WITH the holder OF that resource.
Notes:
~~~~~
o A process id OF '???' implies that the holder was NOT found IN the
systemstate.
Resource Holder State
Enqueue CF-00000000-00000000 12: waiting FOR 'rdbms ipc message'
Rcache object=0xa5414098, ??? Blocker
Object Names
~~~~~~~~~~~~
Enqueue CF-00000000-00000000
Rcache object=0xa5414098,

With this information, you can investigate further into the System state dump by looking at rcache object 0xa5414098. The “State Objects” are a hierarchical structure. Every state object has an adress “SO: 0xa6a5955c” and an owner, which is the parent entry. In this case, you can construct a hierarchical tree of state objects and find dependencies.

 SO: 0xa6a5995c, TYPE: 2, owner: (nil), flag: INIT/-/-/0x00
 (process) Oracle pid=52, calls cur/top: 0x9d62eb1c/0x9d5ccfcc, flag: (0) -
		   INT error: 0, CALL error: 0, sess error: 0, txn error 0
 (post info) LAST post received: 0 0 163
			 LAST post received-location: kqrbtm
			 LAST process TO post me: a6a4e1dc 1 6
			 LAST post sent: 0 0 24
			 LAST post sent-location: ksasnd
			 LAST process posted BY me: a6a4e1dc 1 6
   (latch info) wait_event=0 bits=0
	Process GROUP: DEFAULT, pseudo proc: 0xa6baf79c
	O/S info: USER: oracle, term: UNKNOWN, ospid: 29032
    OSD pid info: Unix process pid: 29032, image: oracle@ora-vm1
    Short stack dump: ksdxfstk()+19<-ksdxcb()+1321<-sspuser()+102<-__kernel_vsyscall()+16
 
    SO: 0xa4123d20, TYPE: 4, owner: 0xa6a5995c, flag: INIT/-/-/0x00
    (SESSION) sid: 955 trans: (nil), creator: 0xa6a5995c, flag: (41) USR/- BSY/-/-/-/-/-
              DID: 0001-0034-00002227, short-term DID: 0001-0034-00002228
              txn branch: (nil)
              oct: 3, prv: 0, SQL: 0x8fee9d44, psql: 0x8fee9f00, USER: 23/DBSNMP
     service name: SYS$USERS
     O/S info: USER: oracle, term: , ospid: 28849, machine: ora-vm1
               program: emagent@ora-vm1(TNS V1-V3)
     application name: emagent@ora-vm1(TNS V1-V3), hash VALUE=2420928497
     waiting FOR 'row cache lock' blocking sess=0x(nil) seq=127 wait_time=0 seconds since wait started=0
                 cache id=d, mode=0, request=5
     Dumping SESSION Wait History
      FOR 'row cache lock' COUNT=1 wait_time=1337230
		         cache id=d, mode=0, request=5
      FOR 'row cache lock' COUNT=1 wait_time=2930561
                 cache id=d, mode=0, request=5
      FOR 'row cache lock' COUNT=1 wait_time=2929106
                 cache id=d, mode=0, request=5
      FOR 'row cache lock' COUNT=1 wait_time=2933326
 
      SO: 0x9d5ccfcc, TYPE: 3, owner: 0xa6a5995c, flag: INIT/-/-/0x00
     (CALL) sess: cur a4123d20, rec a4123d20, usr a4123d20; depth: 0
       ----------------------------------------
       SO: 0x9d51b1a0, TYPE: 3, owner: 0x9d5ccfcc, flag: INIT/-/-/0x00
       (CALL) sess: cur a4123d20, rec a417f990, usr a4123d20; depth: 1
         ----------------------------------------
         SO: 0x9d69d7f0, TYPE: 3, owner: 0x9d51b1a0, flag: INIT/-/-/0x00
         (CALL) sess: cur a417f990, rec a417f990, usr a4123d20; depth: 2
         ----------------------------------------
           SO: 0x9d5e547c, TYPE: 3, owner: 0x9d69d7f0, flag: INIT/-/-/0x00
           (CALL) sess: cur a417f990, rec a417f990, usr a4123d20; depth: 3
           ----------------------------------------
             SO: 0x9d62eb1c, TYPE: 3, owner: 0x9d5e547c, flag: INIT/-/-/0x00
             (CALL) sess: cur a417f990, rec a417f990, usr a4123d20; depth: 4
             ----------------------------------------
               SO: 0x9fedf748, TYPE: 40, owner: 0x9d62eb1c, flag: INIT/-/-/0x00
               (trans) flg = 0x21, flg2 = 0x80000, prx = 0x0, ros = 2147483647 bsn = 0x395 bndsn = 0x395 spn = 0x395
               efd = 34
               parent xid: 0x0000.000.00000000
               env: (scn: 0x0000.00000000 xid: 0x0000.000.00000000 uba: 0x00000000.0000.00 statement num=0 parent xid: xid:
               0x0000.000.00000000 scn: 0x0000.00000000 0sch: scn: 0x0000.00000000)
               cev: (spc = 0 arsp = 0 ubk tsn: 0 rdba: 0x00000000 useg tsn: 0 rdba: 0x00000000
                     hwm uba: 0x00000000.0000.00 col uba: 0x00000000.0000.00
                     num bl: 0 bk list: 0x0)
                     cr opc: 0x0 spc: 0 uba: 0x00000000.0000.00
               (enqueue) TX-00010011-00D3B4CF DID: 0001-0051-00001323
               lv: 7c ac ad 4d 03 00 00 00 00 00 00 00 00 00 00 00 res_flag: 0x45
               res: 0xa447bb2c, lock_flag: 0x0
               own: 0xa4157e90, sess: 0xa4157e90, prv: 0x9fedf774
               slk: (nil)
                xga: 0x0, heap: UGA
               Trans IMU st: 0 Pool INDEX 65535, Redo pool 0x9fedfb74, Undo pool 0x9fedfbf0
               Redo pool range [0x40ed57e8 0x40ed57e8 0x40ed7fe8]
               Undo pool range [0x40ed2fe8 0x40ed2fe8 0x40ed57e8]
               ----------------------------------------
                  SO: 0x9d42237c, TYPE: 50, owner: 0x9fedf748, flag: INIT/-/-/0x00
                  ROW cache enqueue: COUNT=1 SESSION=0xa4123d20 object=0xa5414098, request=X
                  SAVEPOINT=0x395
                  ROW cache parent object: address=0xa5414098 cid=13(dc_sequences)
                  hash=9649e7e typ=9 TRANSACTION=(nil) flags=00000000
                  own=0xa5414104[0xa5414104,0xa5414104] wat=0xa541410c[0x9d422398,0x9d422398] mode=N
                  STATUS=-/-/-/-/-/-/-/-/-
                  request=X release=FALSE flags=2
                  instance LOCK id=QN 010cdbb3 b3b24847

Tanel Poder showed in his seminar to use “grep -A 3 SO:” to find all lines containing the State Object Adresses and the following 3 lines to get the tree.



Recommended Patches for Oracle Databases / Clusters

From time to time, the question arises about which patches should be installed on top of the regular patchsets, e.g. 10.2.0.4.

MetaLink Note 743554.1 finally gives a clear statement:

Patch 7312511 is recommended to apply whether you are hitting the above mentioned bugs or not. Hence this has been included in Metalink 10.2.0.4 recommended patches list.

In my opinion, especially if you have RAC systems, you should definitely review the CRS Patch Bundles and RAC Recommended Patches, test them in a test environment and bring the into production. Especially, when upgrading from a previous release, e.g. 10.2.0.3, I would recommend to upgrade to the latest possible patchset, CRS patch bundle, RDBMS patch bundle and RAC recommended patches in one step. This way, you can use one test cycle for everything instead of testing the latest patchset, e.g. 10.2.0.4 and testing seperately again every other patch.

On Linux x86_64, there are currently the following latest bundles available:

  • 7493592 – Patch 10.2.0.4 CRS Bundle #2 (24.11.2008)
  • 7573282 – Patch 10.2.0.4 RAC Recommended Bundle #2 (21.11.2008)
  • 7496435 – Patch 10.2.0.4 Generic Recommended Bundle #2 (24.10.2008)