HP-UX

Session waiting for “enq: RO – fast object reuse” – DBWR Process spinning on CPU

I have encountered the following problem on a 10.2.0.4 database on Linux x86_64 today:
A user session has been waiting for “enq: RO – fast object reuse” for almost 60 minutes while executing a “truncate table” SQL statement.

SQL> select username, event, sql_id, taddr, last_call_et from v$session where sid = 234;

USERNAME EVENT SQL_ID TADDR LAST_CALL_ET
———- —————————– ————- —————- ————
MD enq: RO – fast object reuse ljk299jlkj003 0000000153264570 3542

SQL> select sql_text from v$sqlstats where sql_id = ‘ljk299jlkj003′;

SQL_TEXT
————————————-
truncate table tab1

The Session was blocked by the CKPT process:

SQL> select * from dba_waiters;

WAITING_SESSION HOLDING_SESSION LOCK_TYPE MODE_HELD MODE_REQUESTED LOCK_ID1 LOCK_ID2
————— ————— ————————– —————————————- —————————————- ———- ———-
234 423 RO Row-S (SS) Exclusive 65573 1

SQL> select sid, serial#, sql_id, last_call_et, machine, program, username from v$session where sid = 423;

SID SERIAL# SQL_ID LAST_CALL_ET MACHINE PROGRAM
———- ———- ————- ———— —————- ——————————–
423 1 4133636 ora-vm1.intra oracle@ora-vm1.intra (CKPT)

The checkpoint process was waiting for database writer DBWR process, which was spinning on one cpu:

top

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
10712 oracle 25 0 2201m 1.7g 1.7g R 99.5 21.7 108:18.03 oracle

PID 10712 maps to DBW0:

[oracle@ora-vm1 ]$ ps -ef|grep 10712
oracle 10712 1 0 2008 ? 03:23:05 ora_dbw0_MDDB01

mpstat

Linux 2.6.9-78.ELsmp (ora-vm1.intra) 01/20/2009

02:21:56 PM CPU %user %nice %system %iowait %irq %soft %idle intr/s
02:21:57 PM all 49.75 0.00 0.00 0.00 0.00 0.00 50.25 1055.00
02:21:57 PM 0 0.00 0.00 0.00 0.00 0.00 0.00 100.00 1006.00
02:21:57 PM 1 100.00 0.00 0.00 0.00 0.00 0.00 0.00 49.00

02:21:57 PM CPU %user %nice %system %iowait %irq %soft %idle intr/s
02:21:58 PM all 50.75 0.00 0.00 0.50 0.00 0.00 48.76 1161.00
02:21:58 PM 0 1.00 0.00 0.00 1.00 0.00 0.00 98.00 1087.00
02:21:58 PM 1 100.00 0.00 0.00 0.00 0.00 0.00 0.00 74.00

The stack of dbw0 during the time showed these signatures:

[oracle@ora-vm1 oracle]$ pstack 10712
#0 0×000000000074b7fb in kslfre ()
#1 0×00000000010ccc3b in kcbo_exam_buf ()
#2 0×00000000010d0d62 in kcbo_service_ockpt ()
#3 0×0000000001080cd7 in kcbbdrv ()
#4 0×00000000007ddcc2 in ksbabs ()
#5 0×00000000007e4b32 in ksbrdp ()
#6 0×0000000002efcb50 in opirip ()
#7 0×00000000012da326 in opidrv ()
#8 0×0000000001e62456 in sou2o ()
#9 0×00000000006d2555 in opimai_real ()
#10 0×00000000006d240c in main ()
[oracle@ora-vm1 oracle]$ pstack 10712
#0 0×000000000074b36d in kslfre ()
#1 0×00000000010cc203 in kcbo_write_process ()
#2 0×00000000010ce608 in kcbo_write_q ()
#3 0×0000000001080a6d in kcbbdrv ()
#4 0×00000000007ddcc2 in ksbabs ()
#5 0×00000000007e4b32 in ksbrdp ()
#6 0×0000000002efcb50 in opirip ()
#7 0×00000000012da326 in opidrv ()
#8 0×0000000001e62456 in sou2o ()
#9 0×00000000006d2555 in opimai_real ()
#10 0×00000000006d240c in main ()
[oracle@ora-vm1 oracle]$ pstack 10712
#0 0×00000000010ccb60 in kcbo_exam_buf ()
#1 0×00000000010d0d62 in kcbo_service_ockpt ()
#2 0×0000000001080cd7 in kcbbdrv ()
#3 0×00000000007ddcc2 in ksbabs ()
#4 0×00000000007e4b32 in ksbrdp ()
#5 0×0000000002efcb50 in opirip ()
#6 0×00000000012da326 in opidrv ()
#7 0×0000000001e62456 in sou2o ()
#8 0×00000000006d2555 in opimai_real ()
#9 0×00000000006d240c in main ()
[oracle@ora-vm1 oracle]$ pstack 10712
#0 0×00000000010d0da5 in kcbo_service_ockpt ()
#1 0×0000000001080cd7 in kcbbdrv ()
#2 0×00000000007ddcc2 in ksbabs ()
#3 0×00000000007e4b32 in ksbrdp ()
#4 0×0000000002efcb50 in opirip ()
#5 0×00000000012da326 in opidrv ()
#6 0×0000000001e62456 in sou2o ()
#7 0×00000000006d2555 in opimai_real ()
#8 0×00000000006d240c in main ()

A MetaLink Research for the term “kcbo_service_ockpt” leads to Bug 7376934, which is a duplicate of Bug 7385253 – DBWR IS CONSUMING HIGH CPU.

Patch 7385253 is available for Linux x86_64, HP-UX, Solaris, AIX.
Reference:
MetaLink Note 762085.1 – Subject: ‘enq: RO – fast object reuse’ contention when gathering schema/table statistics in parallel



NUMA enabled in 10.2.0.4

When upgrading from pre 10.2.0.4 to 10.2.0.4, Oracle enables NUMA support. This has the effect that there can be multiple shared memory segments (MetaLink Note: 429872.1) although shmmax/shmall are set to high values.

I have read MetaLink Notes (7171446.8, 6730567.8, 6689903.8) and this blog entry, where a customer had problems on HP-UX with the default NUMA settings.

Better than that, it can also lead to instance crashes in 10.2.0.4 as reported in MetaLink Note 743191.1. Good news is that there is a patch available for Linux x86_64/10.2.0.4.

I have asked Oracle Support whether it is safe to leave NUMA enabled for Linux Itanium, but they would not comment on it. Instead they asked me to check with the OS vendor. Great. ;-(



Optimal VxFS Settings for Oracle Filesystems?

I found a funny presentation about Oracle filesystem stuff. It explains inode locking: Oracle Filesystems

A friend has pointed me to a very informative document from HP about the optimal settings of vxfs (HP OnlineJFS) filesystem-related parameters for Oracle: HP-UX JFS mount options for Oracle Database environments



Filesystem IO Monitoring with HP-UX Glance

On HP-UX, you can use glance to collect a huge range of OS Statistics and write them in a defined time interval into a file for later analysis. A collection of available metrics is available on the system in /opt/perf/paperdocs/gp/C/metrics.txt or /opt/perf/paperdocs/gp/C/metrics.pdf.

This is an example of IO Monitoring for Filesytems:

glance_filesystem.sh:

nohup glance -aos ./filesystem_advisor.conf -j 60 > glance_output_filesystem_$$.txt 2>/dev/null &

filesystem_advisor.conf:

headersprinted=headersprinted
if headersprinted != 1 then {
print "DATE       TIME     FILESYSTEM                   FIR     LIR     LRBR     LRR     LWBR      LWR     PIR     PRBR     PRR     PWBR     PWR"
headersprinted = 1
}
filesystem loop
{
print GBL_STATDATE|12," ",
GBL_STATTIME|9," ",
FS_DIRNAME|24,
FS_FILE_IO_RATE|8,
FS_LOGL_IO_RATE|8,
FS_LOGL_READ_BYTE_RATE|9,
FS_LOGL_READ_RATE|8,
FS_LOGL_WRITE_BYTE_RATE|9,
FS_LOGL_WRITE_RATE|8,
FS_PHYS_IO_RATE|8,
FS_PHYS_READ_BYTE_RATE|9,
FS_PHYS_READ_RATE|8,
FS_PHYS_WRITE_BYTE_RATE|9,
FS_PHYS_WRITE_RATE|8
}

Used Metrics:

FS_FILE_IO_RATE (FIR)

The number of file system related physical IOs per second directed to this file system during the interval. This value is similar to the values returned by the vmstat -d command except that vmstat reports all IOs and does not break them out by file system. Also, vmstat reports IOs from the kernel’s view, which may get broken down by the disk driver into multiple physical IOs. Since this metric reports values from the disk driver’s point of view, it is more accurate than vmstat.

FS_LOGL_IO_RATE (LIR)

The number of logical IOs per second directed to this file system during the interval. Logical IOs are generated by calling the read() or write() system calls.

FS_LOGL_READ_BYTE_RATE (LRBR)

The number of logical read KBs per second from this file system during the interval.

FS_LOGL_READ_RATE (LRR)

The number of logical reads per second directed to this file system during the interval. Logical reads are generated by calling the read() system call.

FS_LOGL_WRITE_BYTE_RATE (LWBR)

The number of logical writes KBs per second to this file system during the interval.

FS_LOGL_WRITE_RATE (LWR)

The number of logical writes per second directed to this file system during the interval. Logical writes are generated by calling the write() system call.

FS_PHYS_IO_RATE (PIR)

The number of physical IOs per second directed to this file system during the interval.

FS_PHYS_READ_BYTE_RATE (PRBR)

The number of physical KBs per second read from this file system during the interval.

FS_PHYS_READ_RATE (PRR)

The number of physical reads per second directed to this file system during the interval. On Unix systems, physical reads are generated by user file access, virtual memory access (paging), file system management, or raw device access.

FS_PHYS_WRITE_BYTE_RATE (PWBR)

The number of physical KBs per second written to this file system during the interval.

FS_PHYS_WRITE_RATE (PWR)

The number of physical writes per second directed to this file system during the interval.