Tested for Oracle Database 11g Performance Tuning Certified Expert

Last week, I decided to try the newest Oracle 11g certification –

1Z0-054 (1Z1-054): Oracle Database 11g Performance Tuning Certified Expert.

The exam is still in beta, which means that there are LOTS of questions (192, to be precise) in 3 hours, i.e. less than a minute (about 56 seconds) per question. Unfortunately, this was not enough time for me: I had to skip 5-10 text-intensive questions to make it all the way to the end.

I can only recommend practicing a lot with the new 11g performance features as well as DB Control.

Unfortunately, now the waiting begins, because results won't be available until 10 weeks after the beta closes, and the beta has been extended until 30 June 2009. 😉

I might have to try it again after getting some practical experience on the first 11g production database.

Oracle Enterprise Manager – Grid Control 10.2.0.5 finally released

Finally, Oracle Enterprise Manager – Grid Control 10.2.0.5 is available for download for Win32 and Linux x86. Moreover, it is the first release to officially support 11.1.0.7 as the repository database, as well as the first release to officially support RHEL5 for the OMS.

Relevant documents for the upgrade are available on MetaLink:

  • 763307.1: How to Install the 10.2.0.5.0 Grid Control Patchset on a Well Maintained OMS or Agent
  • List of Bugs Fixed: http://download.oracle.com/docs/cd/B16240_01/doc/doc.102/e14226/toc.htm
  • 763351.1: Documentation Reference for Grid Control 10.2.0.5.0 Installation and Upgrade
  • 464674.1: Checklist for EM 10g Grid Control 10.2.x to 10.2.0.4/10.2.0.5 OMS and Repository Upgrades

To be continued…

Session waiting for “enq: RO - fast object reuse” – DBWR Process spinning on CPU

Today I encountered the following problem on a 10.2.0.4 database on Linux x86_64: a user session had been waiting for “enq: RO - fast object reuse” for almost 60 minutes (LAST_CALL_ET = 3542 seconds) while executing a “truncate table” statement.

SQL> select username, event, sql_id, taddr, last_call_et from v$session where sid = 234;

USERNAME   EVENT                         SQL_ID        TADDR            LAST_CALL_ET
---------- ----------------------------- ------------- ---------------- ------------
MD         enq: RO - fast object reuse   ljk299jlkj003 0000000153264570         3542

SQL> select sql_text from v$sqlstats where sql_id = 'ljk299jlkj003';

SQL_TEXT
-------------------------------------
truncate table tab1

The session was blocked by the CKPT process:

SQL> select * from dba_waiters;

WAITING_SESSION HOLDING_SESSION LOCK_TYPE MODE_HELD  MODE_REQUESTED   LOCK_ID1   LOCK_ID2
--------------- --------------- --------- ---------- -------------- ---------- ----------
            234             423 RO        Row-S (SS) Exclusive           65573          1

SQL> select sid, serial#, sql_id, last_call_et, machine, program, username from v$session where sid = 423;

       SID    SERIAL# SQL_ID        LAST_CALL_ET MACHINE          PROGRAM
---------- ---------- ------------- ------------ ---------------- --------------------------------
       423          1                    4133636 ora-vm1.intra    oracle@ora-vm1.intra (CKPT)

The checkpoint process was waiting for the database writer (DBWR) process, which was spinning on one CPU:

top

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
10712 oracle 25 0 2201m 1.7g 1.7g R 99.5 21.7 108:18.03 oracle

PID 10712 maps to DBW0:

[oracle@ora-vm1 ]$ ps -ef|grep 10712
oracle 10712 1 0 2008 ? 03:23:05 ora_dbw0_MDDB01

mpstat

Linux 2.6.9-78.ELsmp (ora-vm1.intra) 01/20/2009

02:21:56 PM CPU %user %nice %system %iowait %irq %soft %idle intr/s
02:21:57 PM all 49.75 0.00 0.00 0.00 0.00 0.00 50.25 1055.00
02:21:57 PM 0 0.00 0.00 0.00 0.00 0.00 0.00 100.00 1006.00
02:21:57 PM 1 100.00 0.00 0.00 0.00 0.00 0.00 0.00 49.00

02:21:57 PM CPU %user %nice %system %iowait %irq %soft %idle intr/s
02:21:58 PM all 50.75 0.00 0.00 0.50 0.00 0.00 48.76 1161.00
02:21:58 PM 0 1.00 0.00 0.00 1.00 0.00 0.00 98.00 1087.00
02:21:58 PM 1 100.00 0.00 0.00 0.00 0.00 0.00 0.00 74.00

During that time, the stack of DBW0 showed these signatures:

[oracle@ora-vm1 oracle]$ pstack 10712
#0 0x000000000074b7fb in kslfre ()
#1 0x00000000010ccc3b in kcbo_exam_buf ()
#2 0x00000000010d0d62 in kcbo_service_ockpt ()
#3 0x0000000001080cd7 in kcbbdrv ()
#4 0x00000000007ddcc2 in ksbabs ()
#5 0x00000000007e4b32 in ksbrdp ()
#6 0x0000000002efcb50 in opirip ()
#7 0x00000000012da326 in opidrv ()
#8 0x0000000001e62456 in sou2o ()
#9 0x00000000006d2555 in opimai_real ()
#10 0x00000000006d240c in main ()
[oracle@ora-vm1 oracle]$ pstack 10712
#0 0x000000000074b36d in kslfre ()
#1 0x00000000010cc203 in kcbo_write_process ()
#2 0x00000000010ce608 in kcbo_write_q ()
#3 0x0000000001080a6d in kcbbdrv ()
#4 0x00000000007ddcc2 in ksbabs ()
#5 0x00000000007e4b32 in ksbrdp ()
#6 0x0000000002efcb50 in opirip ()
#7 0x00000000012da326 in opidrv ()
#8 0x0000000001e62456 in sou2o ()
#9 0x00000000006d2555 in opimai_real ()
#10 0x00000000006d240c in main ()
[oracle@ora-vm1 oracle]$ pstack 10712
#0 0x00000000010ccb60 in kcbo_exam_buf ()
#1 0x00000000010d0d62 in kcbo_service_ockpt ()
#2 0x0000000001080cd7 in kcbbdrv ()
#3 0x00000000007ddcc2 in ksbabs ()
#4 0x00000000007e4b32 in ksbrdp ()
#5 0x0000000002efcb50 in opirip ()
#6 0x00000000012da326 in opidrv ()
#7 0x0000000001e62456 in sou2o ()
#8 0x00000000006d2555 in opimai_real ()
#9 0x00000000006d240c in main ()
[oracle@ora-vm1 oracle]$ pstack 10712
#0 0x00000000010d0da5 in kcbo_service_ockpt ()
#1 0x0000000001080cd7 in kcbbdrv ()
#2 0x00000000007ddcc2 in ksbabs ()
#3 0x00000000007e4b32 in ksbrdp ()
#4 0x0000000002efcb50 in opirip ()
#5 0x00000000012da326 in opidrv ()
#6 0x0000000001e62456 in sou2o ()
#7 0x00000000006d2555 in opimai_real ()
#8 0x00000000006d240c in main ()
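Eyeballing four samples works, but the same reasoning can be made mechanical. The following Java sketch (the class name PstackSummary is my own, hypothetical) is a minimal "poor man's profiler": it counts how often each function appears across several pstack samples, both at any depth and as the innermost frame (#0). It assumes the Linux pstack frame format shown above ("#N 0x... in function ()"); it is not an Oracle tool.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Aggregates several pstack samples of one process (assumed Linux pstack format).
class PstackSummary {
    // Matches frame lines like "#2 0x00000000010d0d62 in kcbo_service_ockpt ()"
    private static final Pattern FRAME =
        Pattern.compile("^#(\\d+)\\s+0x[0-9a-fA-F]+\\s+in\\s+(\\w+)");

    // How many frames (at any stack depth) name each function.
    static Map<String, Integer> countFunctions(List<String> lines) {
        return count(lines, false);
    }

    // How often each function was the innermost frame (#0) when sampled.
    static Map<String, Integer> countTopFrames(List<String> lines) {
        return count(lines, true);
    }

    private static Map<String, Integer> count(List<String> lines, boolean topOnly) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by function name
        for (String line : lines) {
            Matcher m = FRAME.matcher(line.trim());
            if (!m.find()) {
                continue; // shell prompts, blank lines, etc.
            }
            if (topOnly && !m.group(1).equals("0")) {
                continue;
            }
            counts.merge(m.group(2), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        System.out.println("innermost frames: " + countTopFrames(lines));
        System.out.println("all frames:       " + countFunctions(lines));
    }
}
```

One possible way to feed it: for i in 1 2 3 4; do pstack 10712; sleep 1; done | java PstackSummary. For the four samples in this post, kslfre is the innermost frame twice, while kcbo_service_ockpt appears in three of the four stacks, which matches the search term used for MetaLink below.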

A MetaLink search for the term “kcbo_service_ockpt” leads to Bug 7376934, which is a duplicate of Bug 7385253: DBWR IS CONSUMING HIGH CPU.

Patch 7385253 is available for Linux x86_64, HP-UX, Solaris and AIX.
Reference:
MetaLink Note 762085.1 – Subject: ‘enq: RO - fast object reuse’ contention when gathering schema/table statistics in parallel

Huge Space Consumption by $ORACLE_HOME/.patch_storage

If you keep your system up to date with patchsets, patch bundles, Merge Label Requests (MLR) or Critical Patch Updates (CPU), you will most likely end up with a huge .patch_storage subdirectory in your $ORACLE_HOME.

On one of my databases it looked like this:

Space used by $ORACLE_HOME: 7 GB
Space used by $ORACLE_HOME/.patch_storage: 4.3 GB

Can data in this directory be removed?

MetaLink Note 550522.1 (Subject: How To Avoid Disk Full Issues Because OPatch Backups Take Big Amount Of Disk Space.) has the answer: it depends. Normally, this data is needed to be able to roll back a patch. However, if you have installed a patchset (e.g. 10.2.0.4), the patches for the previous patchset (10.2.0.3) stored in the .patch_storage directory are no longer needed and can be removed. Still, I would not recommend deleting the directories manually, as this is not supported. Instead, let Oracle do it for you:

Recent versions of OPatch (the current one is 10.2.0.4.5 as of January 2009) include a utility that removes data that is no longer needed from the .patch_storage directory. Moreover, OPatch creates these .patch_storage backup directories more intelligently, which should result in less wasted space.

[oracle@vmhost1 ora10]$ ./OPatch/opatch util Cleanup
Invoking OPatch 10.2.0.4.5

Oracle Interim Patch Installer version 10.2.0.4.5
Copyright (c) 2008, Oracle Corporation. All rights reserved.

UTIL session

Oracle Home : /oracle/ora10
Central Inventory : /oracle/oraInventory
from : /var/opt/oracle/oraInst.loc
OPatch version : 10.2.0.4.5
OUI version : 10.2.0.4.0
OUI location : /oracle/ora10/oui
Log file location : /oracle/ora10/cfgtoollogs/opatch/opatch2009-01-15_17-00-51PM.log

Patch history file: /oracle/ora10/cfgtoollogs/opatch/opatch_history.txt

Invoking utility “cleanup”
OPatch will clean up ‘restore.sh,make.txt’ files and ‘rac,scratch,backup’ directories.
You will be still able to rollback patches after this cleanup.
Do you want to proceed? [y|n]
y
User Responded with: Y
Size of directory “/oracle/ora10/.patch_storage” before cleanup is 4575330012 bytes.
Size of directory “/oracle/ora10/.patch_storage” after cleanup is 188326505 bytes.

UtilSession: Backup area for restore has been cleaned up. For a complete list of files/directories
deleted, Please refer log file.

OPatch succeeded.

180 MB instead of 4 GB. I like that.
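For the record, here is the arithmetic behind that statement, using the byte counts reported by OPatch. This small Java sketch (hypothetical class name PatchStorageSavings) converts bytes into binary units (GiB/MiB); the decimal GB/MB figures would be slightly larger.

```java
import java.util.Locale;

class PatchStorageSavings {
    // Bytes -> gibibytes (1024^3), formatted with a '.' decimal separator.
    static String gib(long bytes) {
        return String.format(Locale.ROOT, "%.2f GiB", bytes / (1024.0 * 1024 * 1024));
    }

    // Bytes -> mebibytes (1024^2).
    static String mib(long bytes) {
        return String.format(Locale.ROOT, "%.1f MiB", bytes / (1024.0 * 1024));
    }

    // Share of the original size that the cleanup freed, in percent.
    static double savedPercent(long before, long after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        long before = 4575330012L; // "before cleanup" from the OPatch output
        long after = 188326505L;   // "after cleanup" from the OPatch output
        System.out.println("before: " + gib(before)); // before: 4.26 GiB
        System.out.println("after:  " + mib(after));  // after:  179.6 MiB
        System.out.printf(Locale.ROOT, "saved:  %.1f%%%n", savedPercent(before, after)); // saved:  95.9%
    }
}
```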

Hugepages revisited II: Be aware of kernel bugs!

Hugepages are well known to reduce the operating system's overhead of managing the memory pages of the Oracle SGA, which leads to lower system CPU utilization. I have already written two blog entries on this topic: “Listener Coredumps on heavy load system” and “Hugepages revisited”.

However, there is a potential risk: certain kernels/platforms have hugepage-related bugs that can cause problems:

  • Bug 131295 – Hugepages configured on kernel boot line causes x86_64 kernel boot to fail with OOM: Fixed in RHEL3: kernel-2.4.21-40.EL
  • Bug 248954 – Oracle ASM DBWR process goes into 100% CPU spin when using hugepages on ia64 (Fixed in kernel-2.6.9-78.EL.ia64.rpm available as update for RHEL4U7)
  • RHSA-2008:1017-14: on the Itanium® architecture, setting the “vm.nr_hugepages” sysctl parameter caused a kernel stack overflow resulting in a kernel panic, and possibly stack corruption. With this fix, setting vm.nr_hugepages works correctly. Fixed with RHEL5 kernel-2.6.18-92.1.22.el5.ia64.rpm
  • RHSA-2008:1017-14: hugepages allow the Linux kernel to utilize the multiple page size capabilities of modern hardware architectures. In certain configurations, systems with large amounts of memory could fail to allocate most of this memory for hugepages even if it was free. This could result, for example, in database restart failures. Fixed with RHEL5 kernel-2.6.18-92.1.22.el5.ia64.rpm

Therefore, before enabling hugepages, I recommend checking your OS vendor's bug database, testing on a test system and applying recent OS updates first.
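If you do enable hugepages after testing, vm.nr_hugepages has to be large enough for the whole SGA, otherwise the SGA may end up in regular pages. The following Java sketch (hypothetical class HugepagesEstimate) shows only the basic sizing arithmetic: read Hugepagesize from /proc/meminfo and round the SGA size up to whole pages. It assumes a single instance whose SGA size you pass in bytes; summing the actual shared memory segments of all running instances (ipcs -m) is more accurate.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Rough vm.nr_hugepages estimate for one SGA (Linux only, single-instance assumption).
class HugepagesEstimate {
    private static final Pattern HUGEPAGESIZE =
        Pattern.compile("(?m)^Hugepagesize:\\s+(\\d+)\\s+kB");

    // Extracts the hugepage size in kB from the contents of /proc/meminfo.
    static long hugepageSizeKb(String meminfo) {
        Matcher m = HUGEPAGESIZE.matcher(meminfo);
        if (!m.find()) {
            throw new IllegalArgumentException("no Hugepagesize line (hugepages not supported?)");
        }
        return Long.parseLong(m.group(1));
    }

    // Number of hugepages needed to hold sgaBytes, rounded up to whole pages.
    static long pagesNeeded(long sgaBytes, long hugepageKb) {
        long pageBytes = hugepageKb * 1024L;
        return (sgaBytes + pageBytes - 1) / pageBytes;
    }

    public static void main(String[] args) throws IOException {
        long sgaBytes = Long.parseLong(args[0]); // SGA size in bytes (assumption: passed by hand)
        String meminfo = new String(
            Files.readAllBytes(Paths.get("/proc/meminfo")), StandardCharsets.US_ASCII);
        long kb = hugepageSizeKb(meminfo);
        System.out.println("Hugepagesize: " + kb + " kB");
        System.out.println("vm.nr_hugepages >= " + pagesNeeded(sgaBytes, kb));
    }
}
```

For example, a 4 GB SGA (java HugepagesEstimate 4294967296) on x86_64 with a 2048 kB hugepage size needs 2048 pages. The hugepage size differs between platforms (Itanium, for instance), which is why the sketch reads it from /proc/meminfo instead of hard-coding it.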

OEM Agent's number of file descriptors steadily increasing

If you are running 10.2.0.4 databases and OEM Grid Control Agent 10.2.0.4 and you see that the number of file descriptors of the emagent process is constantly increasing, you are hitting Bug 7031906.

The errors in the logfile look like this:

2008-06-27 14:32:28,973 Thread-301 ERROR engine: [oracle_database,MDDB2.world,health_check] :
nmeegd_GetMetricData failed : Instance Health Check initialization failed due to one of the following causes:
the owner of the EM agent process is not same as the owner of the Oracle instance processes; the owner of the
EM agent process is not part of the dba group; or the database version is not 10g (10.1.0.2) and above.
2008-06-27 14:32:28,973 Thread-301 WARN collector: Error exit. Error message: Instance Health
Check initialization failed due to one of the following causes: the owner of the EM agent process is not same as
the owner of the Oracle instance processes; the owner of the EM agent process is not part of the dba group; or
the database version is not 10g (10.1.0.2) and above.

If you run pmap -x on the PID of the emagent process, you will see dozens of lines containing hc_.dat.
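To watch for this leak, you could sample the emagent process regularly. The Java sketch below (hypothetical class EmagentLeakCheck) does two things: it counts the open file descriptors of a PID via /proc/<pid>/fd (Linux only), and it counts pmap -x lines whose mapping name starts with hc_ and ends with .dat. The exact file names are an assumption here, since the post only shows them as hc_.dat.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class EmagentLeakCheck {
    // Counts pmap lines whose mapping name (last column) looks like hc_*.dat.
    static long countHcMappings(List<String> pmapLines) {
        long n = 0;
        for (String line : pmapLines) {
            String[] fields = line.trim().split("\\s+");
            String mapping = fields[fields.length - 1];
            String base = mapping.substring(mapping.lastIndexOf('/') + 1); // strip any path
            if (base.startsWith("hc_") && base.endsWith(".dat")) {
                n++;
            }
        }
        return n;
    }

    // Number of open file descriptors of a process (Linux /proc only).
    static long openFds(long pid) throws IOException {
        Path fdDir = Paths.get("/proc", Long.toString(pid), "fd");
        try (Stream<Path> fds = Files.list(fdDir)) {
            return fds.count();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) { // java EmagentLeakCheck <pid>
            System.out.println("open fds: " + openFds(Long.parseLong(args[0])));
            return;
        }
        List<String> lines = new ArrayList<>(); // otherwise: pmap -x output on stdin
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        System.out.println("hc_*.dat mappings: " + countHcMappings(lines));
    }
}
```

Usage: pmap -x <emagent pid> | java EmagentLeakCheck, or java EmagentLeakCheck <emagent pid> for the descriptor count. A number that only ever grows between runs is the symptom described above.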

Oracle tracks this issue with Bug 7031906, and patches are available. If you are running an earlier version of the agent or the database, check MetaLink Note 602633.1.

I realized that I had already covered this in my document “Oracle Enterprise Manager 10gR2 – Grid Control Installation” in the “papers” section, but unfortunately only in German.

JDBC Pool with Implicit Connection Cache

I have finally found the chance to take my first steps in Java. Instead of “Hello World”, I wanted to test the performance effect of using or NOT using a connection pool in a Java application.

I have prepared two Java test cases:

  • ConnWithIcc.java: 999 select statements with JDBC pooling via the Implicit Connection Cache
  • ConnWithoutIcc.java: 999 select statements without JDBC pooling
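Conceptually, a connection cache pays the cost of creating a physical connection once and then hands the same connections out again and again. The following minimal, generic Java sketch illustrates that idea; TinyPool and PoolDemo are hypothetical names, and this is not how Oracle's Implicit Connection Cache is implemented. An Integer from a counter stands in for a real database connection, so it runs without a database.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Conceptual fixed-size cache: all resources are created up front and reused.
class TinyPool<T> {
    private final BlockingQueue<T> idle;

    TinyPool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // the expensive "physical connect" happens only here
        }
    }

    // Waits until a resource is free (all of them may be in use).
    T borrow() throws InterruptedException {
        return idle.take();
    }

    // Returns a borrowed resource so the next caller can reuse it.
    void giveBack(T resource) {
        idle.add(resource);
    }
}

class PoolDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicInteger physicalConnects = new AtomicInteger();
        TinyPool<Integer> pool = new TinyPool<Integer>(10, physicalConnects::incrementAndGet);
        for (int i = 1; i < 1000; i++) { // 999 "statements", as in the test programs
            Integer conn = pool.borrow();
            // ... run "select 1 from dual" on conn ...
            pool.giveBack(conn);
        }
        System.out.println("physical connects: " + physicalConnects.get()); // prints 10
    }
}
```

With MinLimit = MaxLimit = 10 in the real test, the idea is the same: 999 logical connections, but only 10 physical ones.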

ConnWithIcc.java:

import java.sql.*;
import java.util.Properties;
import oracle.jdbc.pool.OracleDataSource;

class ConnWithIcc
{
  public static void main(String args[])
  {
    ResultSet rset = null;
    Connection conn = null;
    Statement stmt = null;

    try
    {
      // create a DataSource
      OracleDataSource ods = new OracleDataSource();
      ods.setURL("jdbc:oracle:thin:@//ora-vm1:1521/ICC");
      ods.setUser("scott");
      ods.setPassword("tiger");

      // set cache properties: keep exactly 10 physical connections in the cache
      Properties prop = new Properties();
      prop.setProperty("MinLimit", "10");
      prop.setProperty("MaxLimit", "10");

      // set DataSource properties
      ods.setConnectionCachingEnabled(true); // be sure to set this to true
      ods.setConnectionCacheProperties(prop);
      ods.setConnectionCacheName("ImplicitCache01"); // this cache's name

      for (int i = 1; i < 1000; i++) // 999 iterations
      {
        // get a connection from the implicit connection cache
        conn = ods.getConnection();
        // create a statement and execute the query
        stmt = conn.createStatement();
        rset = stmt.executeQuery("select 1 from dual");
        while (rset.next())
        {
          int dualnr = rset.getInt(1);
        }
        // close in reverse order of creation; with caching enabled,
        // conn.close() hands the connection back to the cache
        rset.close();
        rset = null;
        stmt.close();
        stmt = null;
        conn.close();
        conn = null;
      }
    }
    catch (SQLException e)
    {
      // handle the exception properly - in this case, we just
      // print a message and stack trace and exit the application
      System.err.println("error message: " + e.getMessage());
      e.printStackTrace();
      Runtime.getRuntime().exit(1);
    }
    finally
    {
      // close whatever is still open after an exception;
      // ignore any exceptions since we are in the finally clause
      try
      {
        if (rset != null)
          rset.close();
        if (stmt != null)
          stmt.close();
        if (conn != null)
          conn.close();
      }
      catch (SQLException ignored) { ignored.printStackTrace(); }
    }
  }
}

ConnWithoutIcc.java:

import java.sql.*;
import oracle.jdbc.pool.OracleDataSource;

class ConnWithoutIcc
{
  public static void main(String args[])
  {
    ResultSet rset = null;
    Connection conn = null;
    Statement stmt = null;

    try
    {
      // create a DataSource (no connection caching configured)
      OracleDataSource ods = new OracleDataSource();
      ods.setURL("jdbc:oracle:thin:@//ora-vm1:1521/NOICC");
      ods.setUser("scott");
      ods.setPassword("tiger");

      for (int i = 1; i < 1000; i++) // 999 iterations
      {
        // without caching, every getConnection() opens a new physical connection
        conn = ods.getConnection();
        // create a statement and execute the query
        stmt = conn.createStatement();
        rset = stmt.executeQuery("select 1 from dual");
        while (rset.next())
        {
          int dualnr = rset.getInt(1);
        }
        // close in reverse order of creation;
        // conn.close() tears the physical connection down again
        rset.close();
        rset = null;
        stmt.close();
        stmt = null;
        conn.close();
        conn = null;
      }
    }
    catch (SQLException e)
    {
      // handle the exception properly - in this case, we just
      // print a message and stack trace and exit the application
      System.err.println("error message: " + e.getMessage());
      e.printStackTrace();
      Runtime.getRuntime().exit(1);
    }
    finally
    {
      // close whatever is still open after an exception;
      // ignore any exceptions since we are in the finally clause
      try
      {
        if (rset != null)
          rset.close();
        if (stmt != null)
          stmt.close();
        if (conn != null)
          conn.close();
      }
      catch (SQLException ignored) { ignored.printStackTrace(); }
    }
  }
}

The first step, of course, is to put the java executable in the PATH and set the CLASSPATH environment variable:

[oracle@ora-vm1 ICC]$ echo $PATH
/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/home/oracle/bin:/u01/app/oracle/product/10.2.0/bin:/u01/app/oracle/product/10.2.0/jre/1.4.2/bin:/u01/app/oracle/product/10.2.0/jdk/bin
 
[oracle@ora-vm1 ICC]$ echo $CLASSPATH
.:/u01/app/oracle/product/10.2.0/jdbc/lib/ojdbc14.jar:/u01/app/oracle/product/10.2.0/jdbc/lib/orai18n.jar

Next, you have to compile the java files into class files:

[oracle@ora-vm1 ICC]$ javac ConnWithoutIcc.java
[oracle@ora-vm1 ICC]$ javac ConnWithIcc.java

Finally, we can execute both classes and measure the performance difference:

[oracle@ora-vm1 ICC]$ time java ConnWithIcc
 
real    0m1.501s
user    0m0.446s
sys     0m0.484s
 
[oracle@ora-vm1 ICC]$ time java ConnWithoutIcc
 
real    0m53.653s
user    0m1.528s
sys     0m2.561s

So the 999 “select 1 from dual” statements take about 1.5 seconds with a JDBC connection pool and almost 1 minute without one! Dividing 999 connections by 53.6 seconds gives an average of only 18.6 connections per second. The sar output below shows that establishing and tearing down the 999 database connections keeps about one of my two cores busy (roughly 55% of total CPU, most of it %system):
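These figures can be reproduced from the “real” times of the two runs. The Java sketch below (hypothetical class ConnThroughput) is plain arithmetic. Note that the per-iteration time without a pool also includes the query itself and JVM startup, so it is an upper bound for the cost of one connect/disconnect cycle.

```java
import java.util.Locale;

class ConnThroughput {
    // Average rate: units of work per second of wall-clock time.
    static double perSecond(int count, double seconds) {
        return count / seconds;
    }

    public static void main(String[] args) {
        int statements = 999;
        double withPool = 1.501;     // "real" time of java ConnWithIcc
        double withoutPool = 53.653; // "real" time of java ConnWithoutIcc

        System.out.printf(Locale.ROOT, "with pool:    %.1f statements/s%n",
            perSecond(statements, withPool));                       // 665.6
        System.out.printf(Locale.ROOT, "without pool: %.1f connections/s%n",
            perSecond(statements, withoutPool));                    // 18.6
        System.out.printf(Locale.ROOT, "without pool: %.1f ms per iteration%n",
            1000 * withoutPool / statements);                       // 53.7
        System.out.printf(Locale.ROOT, "ratio:        %.1fx%n",
            withoutPool / withPool);                                // 35.7
    }
}
```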

[oracle@ora-vm1 ICC]$ sar -u 5 100
Linux 2.6.18-92.1.18.el5 (ora-vm1.intra)        12/23/2008

12:06:48 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
12:06:53 PM       all      0.30      0.00      1.90      2.30      0.00     95.51
12:06:58 PM       all      0.10      0.00      1.80      1.60      0.00     96.50
12:07:03 PM       all      7.51      0.00     15.92      0.70      0.00     75.88
12:07:08 PM       all     12.90      0.00     42.20      0.10      0.00     44.80
12:07:13 PM       all     13.60      0.00     40.80      0.50      0.00     45.10
12:07:18 PM       all     13.00      0.00     41.80      0.60      0.00     44.60
12:07:23 PM       all     14.09      0.00     40.16      0.40      0.00     45.35
12:07:28 PM       all     13.09      0.00     41.66      0.50      0.00     44.76
12:07:33 PM       all     13.00      0.00     41.40      0.50      0.00     45.10
12:07:38 PM       all     14.79      0.00     40.66      0.20      0.00     44.36
12:07:43 PM       all     15.02      0.00     39.84      0.30      0.00     44.84
12:07:48 PM       all     13.10      0.00     41.90      0.60      0.00     44.40
12:07:53 PM       all     14.99      0.00     40.06      0.70      0.00     44.26
12:07:58 PM       all      4.20      0.00     12.91      0.60      0.00     82.28
12:08:03 PM       all      0.30      0.00      1.50      1.00      0.00     97.20

Installation Prerequisites for Oracle Databases

More often than not, I see installations where either the shared memory parameters or the ulimit settings are not set as required for an optimal Oracle database installation.

RDA (Remote Diagnostic Agent), commonly known as the tool that gathers diagnostic information for Oracle Support service requests, can do these checks quite nicely.

Steps:

  • Download RDA from MetaLink Note 314422.1.
  • Unzip it into a working directory.
  • Run the Health Check Validation Engine (HCVE) of RDA:
  • $ ./rda.pl -T hcve
    Processing HCVE tests ...
    Available Pre-Installation Rule Sets:
    1. Oracle Database 10g (10.1.0) PreInstall(HP-UX)
    2. Oracle Database 10g R1 (10.1.0) PreInstall (HP-UX Itanium)
    3. Oracle Database 10g R2 (10.2.0) PreInstall (HPUX)
    4. Oracle Database 11g R1 (11.1.0) PreInstall (HPUX)
    5. Oracle Application Server 10g (9.0.4) PreInstall (HP-UX)
    6. Oracle Application Server 10g R2 (10.1.2) PreInstall (HP-UX)
    7. Oracle Portal PreInstall (Generic)
    Available Post-Installation Rule Sets:
    8. Oracle Portal PostInstall (generic)
    9. Data Guard PostInstall (Generic)
    Enter the HCVE rule set number
    Hit 'Return' to accept the default (1)
    > 3

    Enter value for < Planned ORACLE_HOME location or if set >
    Hit 'Return' to accept the default ($ORACLE_HOME)
    >

    Test "Oracle Database 10g R2 (10.2.0) PreInstall (HPUX)" executed at Tue Dec 23 15:26:31 2008

    Test Results
    ~~~~~~~~~~~~

    ID    NAME                 RESULT VALUE
    ===== ==================== ====== ========================================
    10    OS Certified?        PASSED Certified with 10g R2 RDBMS
    20    User in /etc/passwd? PASSED userOK
    25    Got EXTJOB User?     FAILED ExtjobNotFound
    30    Group in /etc/group? PASSED GroupOK
    40    Input ORACLE_HOME    RECORD $ORACLE_HOME
    50    ORACLE_HOME Valid?   PASSED OHexists
    60    O_H Permissions OK?  PASSED CorrectPerms
    70    Umask Set to 022?    PASSED UmaskOK
    80    LDLIBRARYPATH Unset? PASSED UnSet
    90    SHLIB_PATH Unset?    PASSED UnSet
    100   Other O_Hs in PATH?  PASSED NotFound
    110   oraInventory Permiss FAILED oraInventoryNotOK
    120   /tmp Adequate?       PASSED TempSpaceOK
    130   Swap (in MB)         RECORD 15144
    140   RAM (in MB)          PASSED 14334
    150   SwapToRAM OK?        PASSED SwapToRAMOK
    160   Disk Space OK?       FAILED OnlySpaceForOne
    170   Kernel Parameters OK FAILED [EXECUTABLE_STACK=1] too large [MAXUP..>
    175   Links and Libs OK?   PASSED AllExist
    180   Got ld,nm,ar,make?   PASSED ld_nm_ar_make_found
    190   ulimits OK?          PASSED ulimitOK
    200   Got OS Bundles?      PASSED GOLDAPPS11iandGOLDBASE11iAdequate
    210   Got OS Patches?      FAILED [PHNE_31097 or its successor PHNE_324..>
    220   Other OUI Up?        PASSED NoOtherOUI

    The output file contains detailed information about why a specific check failed.

    [PHNE_31097 or its successor PHNE_32477 or its successor PHNE_33498 or its successor PHNE_35418] not installed
    [PHSS_31221 or its successor PHSS_33263 or its successor PHSS_33944] not installed
    [PHSS_30970 or its successor PHSS_33033 or its successor PHSS_35379] not installed
    [PHSS_32508 or its successor PHSS_34411 or its successor PHSS_35099] not installed
    [PHSS_32509 or its successor PHSS_34412 or its successor PHSS_35098] not installed
    [PHSS_32510 or its successor PHSS_34413 or its successor PHSS_35100] not installed

    FAILED
    [EXECUTABLE_STACK=1] too large
    [MAXUPRC=1024] too small
    [MSGMNI=50] too small
    [MSGTQL=40] too small
    [NCSIZE=9964] too small
    [NFILE=5000] too small
    [NINODE=4844] too small
    [SHMMAX=1024000000] too small
    [MAXSWAPCHUNKS=2048] too small
    [MSGMAP=42] too small == KernelOK
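The kernel parameter check (rule 170) presumably boils down to comparing each parameter against a required range. The Java sketch below (hypothetical class KernelParamCheck, not part of RDA) reproduces that idea and RDA's output style. The current values are taken from the RDA output above; the required limits in main() are PLACEHOLDERS for illustration only, so take the real values from the Oracle installation guide for your platform and release.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class KernelParamCheck {
    // Allowed range for one parameter; null means "no bound on this side".
    static final class Rule {
        final Long min;
        final Long max;
        Rule(Long min, Long max) { this.min = min; this.max = max; }
    }

    // Returns RDA-style findings such as "[MAXUPRC=1024] too small".
    static List<String> check(Map<String, Long> current, Map<String, Rule> rules) {
        List<String> findings = new ArrayList<>();
        for (Map.Entry<String, Rule> e : rules.entrySet()) {
            String name = e.getKey();
            Rule rule = e.getValue();
            Long value = current.get(name);
            if (value == null) {
                findings.add("[" + name + "] not set");
            } else if (rule.min != null && value < rule.min) {
                findings.add("[" + name + "=" + value + "] too small");
            } else if (rule.max != null && value > rule.max) {
                findings.add("[" + name + "=" + value + "] too large");
            }
        }
        return findings;
    }

    public static void main(String[] args) {
        // Current values as reported by RDA on the HP-UX host above.
        Map<String, Long> current = new LinkedHashMap<>();
        current.put("EXECUTABLE_STACK", 1L);
        current.put("MAXUPRC", 1024L);
        current.put("SHMMAX", 1024000000L);

        // PLACEHOLDER limits, for illustration only (not Oracle's requirements).
        Map<String, Rule> rules = new LinkedHashMap<>();
        rules.put("EXECUTABLE_STACK", new Rule(null, 0L));
        rules.put("MAXUPRC", new Rule(2048L, null));
        rules.put("SHMMAX", new Rule(2147483648L, null));

        // prints: [EXECUTABLE_STACK=1] too large, [MAXUPRC=1024] too small,
        //         [SHMMAX=1024000000] too small (one per line)
        check(current, rules).forEach(System.out::println);
    }
}
```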

Performance Monitoring of HP EVA Storage

I had been led to believe that the only way to monitor EVA performance is from the attached hosts, e.g. with iostat, sar, etc.

Now I have learned that there is in fact an excellent tool called evaperf, which runs on a Windows host attached to the storage. This tool can report just about every metric you can imagine. Its output can be saved as a CSV file and then imported into a database for evaluation, or most of the metrics can be integrated into Windows perfmon. Moreover, there is a tool called tlviz, which can display the metrics graphically.

  • Host port statistics: Read Req/s, Read MB/s, Read Latency (ms), Write Req/s, Write MB/s, Write Latency (ms), Av Queue Depth
  • Virtual Disk (LUN) statistics: Read Hit Req/s, Read Hit MB/s, Read Hit Latency, Read Miss Req/s, Read Miss Data Rate, Read Miss Latency, Flush Data Rate, Mirror Data Rate, Prefetch Data Rate
  • Physical Disk statistics: Drive Queue Depth, Read Req/s, Read MB/s, Read Latency (ms), Write Req/s, Write MB/s, Write Latency (ms)
  • Host Connection statistics: Queue Depth, Busies
  • Histogram statistics: Read / write latency histogram, Transfer size histogram
  • Array Status statistics: Total Host Req/s, Total Host MB/s
  • Controller Status statistics: Controller CPU Utilization, Percent Data Transfer time

There is very good documentation available from HP.
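Once evaperf data has been saved as a CSV file, a quick evaluation does not necessarily need a database. This Java sketch (hypothetical class EvaperfCsvAverage) averages one numeric column of such a file. The column names and the simple CSV layout (one header line, comma-separated fields, optional surrounding double quotes, no commas inside fields) are assumptions; check them against your actual evaperf output.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

class EvaperfCsvAverage {
    // Trims a cell and removes one pair of surrounding double quotes, if present.
    private static String clean(String cell) {
        String s = cell.trim();
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            s = s.substring(1, s.length() - 1);
        }
        return s;
    }

    // Mean of the numeric values in the named column; the first line is the header.
    static double average(List<String> csvLines, String column) {
        String[] header = csvLines.get(0).split(",", -1);
        int idx = -1;
        for (int i = 0; i < header.length; i++) {
            if (clean(header[i]).equals(column)) {
                idx = i;
                break;
            }
        }
        if (idx < 0) {
            throw new IllegalArgumentException("column not found: " + column);
        }
        double sum = 0;
        int n = 0;
        for (String line : csvLines.subList(1, csvLines.size())) {
            String[] fields = line.split(",", -1);
            if (fields.length <= idx) {
                continue; // blank or short line
            }
            try {
                sum += Double.parseDouble(clean(fields[idx]));
                n++;
            } catch (NumberFormatException notANumber) {
                // skip non-numeric cells, e.g. repeated header rows
            }
        }
        return n == 0 ? Double.NaN : sum / n;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        System.out.println(args[1] + " (avg): " + average(lines, args[1]));
    }
}
```

Usage, with a hypothetical file name: java EvaperfCsvAverage vdisk.csv "Read Miss Latency".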

NUMA enabled in 10.2.0.4

When you upgrade from a pre-10.2.0.4 release to 10.2.0.4, Oracle enables NUMA support. As a result, there can be multiple shared memory segments (MetaLink Note 429872.1) even though shmmax/shmall are set to high values.
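To see whether an instance has been split into several shared memory segments, you can look at ipcs -m. The following Java sketch (hypothetical class ShmSegments) counts the segments and total bytes for one owner. It assumes the Linux ipcs -m column layout (key, shmid, owner, perms, bytes, nattch, status); HP-UX prints a different format. The count is per OS user, so several instances running as oracle also show up as several segments.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

class ShmSegments {
    // Returns {segment count, total bytes} for the given owner.
    static long[] summarize(List<String> ipcsLines, String owner) {
        long count = 0;
        long bytes = 0;
        for (String line : ipcsLines) {
            String[] f = line.trim().split("\\s+");
            // Linux "ipcs -m" data rows: key shmid owner perms bytes nattch [status]
            if (f.length >= 6 && f[0].startsWith("0x") && f[2].equals(owner)) {
                count++;
                bytes += Long.parseLong(f[4]);
            }
        }
        return new long[] {count, bytes};
    }

    public static void main(String[] args) throws IOException {
        String owner = args.length > 0 ? args[0] : "oracle";
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        long[] s = summarize(lines, owner);
        System.out.println(owner + ": " + s[0] + " segment(s), " + s[1] + " bytes");
    }
}
```

Usage: ipcs -m | java ShmSegments oracle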

I have read MetaLink Notes (7171446.8, 6730567.8, 6689903.8) and this blog entry, where a customer had problems on HP-UX with the default NUMA settings.

Even worse, it can also lead to instance crashes in 10.2.0.4, as reported in MetaLink Note 743191.1. The good news is that there is a patch available for Linux x86_64/10.2.0.4.

I have asked Oracle Support whether it is safe to leave NUMA enabled for Linux Itanium, but they would not comment on it. Instead they asked me to check with the OS vendor. Great. ;-(