Unix

Your experience with RAC Dynamic Remastering (DRM) in 10gR2?

One of my customers is having severe RAC performance issues, which appeared a dozen times so far. Each time, the performance impact lasted around 10 minutes and caused basically a hang of the application. ASH investigation revealed that the time frame of performance issues exactly matches a DRM operation of the biggest segment of the database. During the problematic time period, there are 20-50 instead of 5-10 active sessions and they are mostly waiting for gc related events: “gc buffer busy”,”gc cr block busy”, “gc cr block 2-way”, “gc current block 2-way”, “gc current request”, “gc current grant busy”, etc.

In addition, there is one single session which has wait event “kjbdrmcvtq lmon drm quiesce: ping completion” (on instance 1) and 1-3 sessions with wait event “gc remaster“. (on instance 2) The cur_obj# of the session waiting on “gc remaster” is pointing to the segment being remastered.

Does anybody have any experience with DRM problems with 10.2.0.4 on Linux Itanium?

I know that it is possible to deactive DRM, but usually it should be beneficial to have it enabled. I could not find any reports of performance impact during DRM operation on metalink. Support is involved but clueless so far.

Regards,
Martin

http://forums.oracle.com/forums/message.jspa?messageID=3447436#3447436



RAC Deadlock Detection not working on 10.2.0.4 / Linux Itanium

We recently experienced a big issue, when the production database was hung. It turned out that the database has deadlocked, but the GES did not detect the deadlock situation, so all the sessions were waiting on “enq: TX row lock contention”.

We could provide a reproducible testcase and it turned out to be bug 7014855. The bug is platform specific to Linux Itanium port and a patch is available.



Session waiting for “enq: RO – fast object reuse” – DBWR Process spinning on CPU

I have encountered the following problem on a 10.2.0.4 database on Linux x86_64 today:
A user session has been waiting for “enq: RO – fast object reuse” for almost 60 minutes while executing a “truncate table” SQL statement.

SQL> select username, event, sql_id, taddr, last_call_et from v$session where sid = 234;

USERNAME EVENT SQL_ID TADDR LAST_CALL_ET
———- —————————– ————- —————- ————
MD enq: RO – fast object reuse ljk299jlkj003 0000000153264570 3542

SQL> select sql_text from v$sqlstats where sql_id = ‘ljk299jlkj003′;

SQL_TEXT
————————————-
truncate table tab1

The Session was blocked by the CKPT process:

SQL> select * from dba_waiters;

WAITING_SESSION HOLDING_SESSION LOCK_TYPE MODE_HELD MODE_REQUESTED LOCK_ID1 LOCK_ID2
————— ————— ————————– —————————————- —————————————- ———- ———-
234 423 RO Row-S (SS) Exclusive 65573 1

SQL> select sid, serial#, sql_id, last_call_et, machine, program, username from v$session where sid = 423;

SID SERIAL# SQL_ID LAST_CALL_ET MACHINE PROGRAM
———- ———- ————- ———— —————- ——————————–
423 1 4133636 ora-vm1.intra oracle@ora-vm1.intra (CKPT)

The checkpoint process was waiting for database writer DBWR process, which was spinning on one cpu:

top

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
10712 oracle 25 0 2201m 1.7g 1.7g R 99.5 21.7 108:18.03 oracle

PID 10712 maps to DBW0:

[oracle@ora-vm1 ]$ ps -ef|grep 10712
oracle 10712 1 0 2008 ? 03:23:05 ora_dbw0_MDDB01

mpstat

Linux 2.6.9-78.ELsmp (ora-vm1.intra) 01/20/2009

02:21:56 PM CPU %user %nice %system %iowait %irq %soft %idle intr/s
02:21:57 PM all 49.75 0.00 0.00 0.00 0.00 0.00 50.25 1055.00
02:21:57 PM 0 0.00 0.00 0.00 0.00 0.00 0.00 100.00 1006.00
02:21:57 PM 1 100.00 0.00 0.00 0.00 0.00 0.00 0.00 49.00

02:21:57 PM CPU %user %nice %system %iowait %irq %soft %idle intr/s
02:21:58 PM all 50.75 0.00 0.00 0.50 0.00 0.00 48.76 1161.00
02:21:58 PM 0 1.00 0.00 0.00 1.00 0.00 0.00 98.00 1087.00
02:21:58 PM 1 100.00 0.00 0.00 0.00 0.00 0.00 0.00 74.00

The stack of dbw0 during the time showed these signatures:

[oracle@ora-vm1 oracle]$ pstack 10712
#0 0x000000000074b7fb in kslfre ()
#1 0x00000000010ccc3b in kcbo_exam_buf ()
#2 0x00000000010d0d62 in kcbo_service_ockpt ()
#3 0x0000000001080cd7 in kcbbdrv ()
#4 0x00000000007ddcc2 in ksbabs ()
#5 0x00000000007e4b32 in ksbrdp ()
#6 0x0000000002efcb50 in opirip ()
#7 0x00000000012da326 in opidrv ()
#8 0x0000000001e62456 in sou2o ()
#9 0x00000000006d2555 in opimai_real ()
#10 0x00000000006d240c in main ()
[oracle@ora-vm1 oracle]$ pstack 10712
#0 0x000000000074b36d in kslfre ()
#1 0x00000000010cc203 in kcbo_write_process ()
#2 0x00000000010ce608 in kcbo_write_q ()
#3 0x0000000001080a6d in kcbbdrv ()
#4 0x00000000007ddcc2 in ksbabs ()
#5 0x00000000007e4b32 in ksbrdp ()
#6 0x0000000002efcb50 in opirip ()
#7 0x00000000012da326 in opidrv ()
#8 0x0000000001e62456 in sou2o ()
#9 0x00000000006d2555 in opimai_real ()
#10 0x00000000006d240c in main ()
[oracle@ora-vm1 oracle]$ pstack 10712
#0 0x00000000010ccb60 in kcbo_exam_buf ()
#1 0x00000000010d0d62 in kcbo_service_ockpt ()
#2 0x0000000001080cd7 in kcbbdrv ()
#3 0x00000000007ddcc2 in ksbabs ()
#4 0x00000000007e4b32 in ksbrdp ()
#5 0x0000000002efcb50 in opirip ()
#6 0x00000000012da326 in opidrv ()
#7 0x0000000001e62456 in sou2o ()
#8 0x00000000006d2555 in opimai_real ()
#9 0x00000000006d240c in main ()
[oracle@ora-vm1 oracle]$ pstack 10712
#0 0x00000000010d0da5 in kcbo_service_ockpt ()
#1 0x0000000001080cd7 in kcbbdrv ()
#2 0x00000000007ddcc2 in ksbabs ()
#3 0x00000000007e4b32 in ksbrdp ()
#4 0x0000000002efcb50 in opirip ()
#5 0x00000000012da326 in opidrv ()
#6 0x0000000001e62456 in sou2o ()
#7 0x00000000006d2555 in opimai_real ()
#8 0x00000000006d240c in main ()

A MetaLink Research for the term “kcbo_service_ockpt” leads to Bug 7376934, which is a duplicate of Bug 7385253 – DBWR IS CONSUMING HIGH CPU.

Patch 7385253 is available for Linux x86_64, HP-UX, Solaris, AIX.
Reference:
MetaLink Note 762085.1 – Subject: ‘enq: RO – fast object reuse’ contention when gathering schema/table statistics in parallel



Hugepages revisited II: Be aware of kernel bugs!

It is well known that hugepages can reduce the overhead of managing memory pages of Oracle SGA by the operating system thus leading to lower system cpu utilization. I have written two blog entries regarding this topic already: Listener Coredumps on heavy load system and Hugepages revisited.

However, there is a potential risk with it: Certain kernels / platforms have bugs regarding hugepages which can lead to problems:

  • Bug 131295 – Hugepages configured on kernel boot line causes x86_64 kernel boot to fail with OOM: Fixed in RHEL3: kernel-2.4.21-40.EL
  • Bug 248954 – Oracle ASM DBWR process goes into 100% CPU spin when using hugepages on ia64 (Fixed in kernel-2.6.9-78.EL.ia64.rpm available as update for RHEL4U7)
  • RHSA-2008:1017-14: on the Itanium® architecture, setting the “vm.nr_hugepages” sysctl parameter caused a kernel stack overflow resulting in a kernel panic, and possibly stack corruption. With this fix, setting vm.nr_hugepages works correctly. Fixed with RHEL5 kernel-2.6.18-92.1.22.el5.ia64.rpm
  • RHSA-2008:1017-14: hugepages allow the Linux kernel to utilize the multiple page size capabilities of modern hardware architectures. In certain configurations, systems with large amounts of memory could fail to allocate most of this memory for hugepages even if it was free. This could result, for example, in database restart failures. Fixed with RHEL5 kernel-2.6.18-92.1.22.el5.ia64.rpm

Therefore, before enabling hugepages, I recommend to check with your OS Vendor Bug Database, test on a test system and apply recent OS upgrades first.



NUMA enabled in 10.2.0.4

When upgrading from pre 10.2.0.4 to 10.2.0.4, Oracle enables NUMA support. This has the effect that there can be multiple shared memory segments (MetaLink Note: 429872.1) although shmmax/shmall are set to high values.

I have read MetaLink Notes (7171446.8, 6730567.8, 6689903.8) and this blog entry, where a customer had problems on HP-UX with the default NUMA settings.

Better than that, it can also lead to instance crashes in 10.2.0.4 as reported in MetaLink Note 743191.1. Good news is that there is a patch available for Linux x86_64/10.2.0.4.

I have asked Oracle Support whether it is safe to leave NUMA enabled for Linux Itanium, but they would not comment on it. Instead they asked me to check with the OS vendor. Great. ;-(



Installing 10gR2 RAC on Linux Itanium (Montecito)

Recently, I had to install 10gR2 on Linux Itanium (Montecito CPUs) and found out that the Java version that ships with the binaries does not work on this platform. Therefore you have to download Patch 5390722 and perform the following steps for RAC installation:

  1. Install Patch 5390722: Install JDK into new 10.2 CRS Home, then install JRE into new 10.2 CRS Home.
  2. Take a tar backup of the CRS Home containing these two components. You will need it.
  3. Install 10.2.0.1 Clusterware by running from 10.2.0.1 binaries: ./runInstaller -jreLoc $CRS_HOME/jre/1.4.2
  4. Install Patch 5390722 with the option CLUSTER_NODES={"node1", "node2", ...}: Install JDK into new 10.2 RDBMS Home, then install JRE into new 10.2 RDBMS
  5. Install 10.2.0.1 RDBMS Binaries into the new 10.2 RDBMS: ./runInstaller -jreLoc $ORACLE_HOME/jre/1.4.2
  6. If you want to install the 10.2.0.4 patchset, you will have to follow these steps:
    for CRS: ./runInstaller -jreLoc $ORA_CRS_HOME/jdk/jre
    for RDBMS: ./runInstaller -jreLoc $ORACLE_HOME/jdk/jre
  7. After that, you have to repair the JRE because the 10.2.0.4 patchset has overwritten the patched JRE with the defective versions. (7448301)
    % cd $ORACLE_HOME/jre
    % rm -rf 1.4.2
    % tar –xvf $ORACLE_HOME/jre/1.4.2-5390722.tar

Sources:

  • Note: 404248.1 – How To Install Oracle CRS And RAC Software On Itanium Servers With Montecito Processors
  • Note: 400227.1 – How To Install Oracle RDBMS Software On Itanium Servers With Montecito Processors
  • Bug 7448301 – Linux Itanium: 10.2.0.4 Patchset for Linux Itanium (Montecito) has wrong Java runtime


Hugepages revisited

A while ago I wrote a post about a specific listener coredump issue which could be solved by 1) installing an oracle patch and 2) by implementing hugepages. I have also linked to an article from Pythian Group Oracle expert Riyaj Shamsudeen, who demonstrated problems with memory page management overheads with big SGAs without hugepages.

Today, I want to document the steps necessary to implement hugepages after doing some research:

  • Check your /etc/sysctl.conf shmall and shmmax values. I recommend that you set shmmax bigger or
  • Check your current total shared memory segment size. Depending on your “bc -l” skills by summing all byte lines from ipcs -m or by executing a script from metalink Note 401749.1 (which does exactly that). Calculate how many hugepages you need by “cat /proc/meminfo” and dividing it by the pagesize of your platform. (Linux IA64 has 256MB pages for example) I recommend to add 1 extra page for safety.
  • Say, you need 200 hugepages. Multiply it with the pagesize and enter this value in
    /etc/security/limits.conf: (values in kb)

oracle soft memlock 2097152
oracle hard memlock 2097152

  • Set the parameters in /etc/sysctl.conf:

vm.nr_hugepages=200
vm.hugetlb_shm_group=<group id of dba group from /etc/group>
e.g. vm.hugetlb_shm_group=201

I am going to implement hugepages on Linux Itanium for a Real Application Cluster system. I have read posts that there are different issues regarding startup with srvctl or sqlplus and startup by oracle or root. I will investigate and write more soon.



Listener Coredumps on heavy load system

Recently I have come across a system which experiences listener crashes (core dumps) every couple of days. The listener logfile shows errors shortly before core-dumping:

29-SEP-2008 03:49:07 * (CONNECT_DATA=(SID=MDDB1)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.0.1)(PORT
=1398)) * establish * MDDB1 * 12518
TNS-12518: TNS:listener could not hand off client connection
TNS-12571: TNS:packet writer failure
TNS-12560: TNS:protocol adapter error
TNS-00530: Protocol adapter error
Linux IA64 Error: 104: Connection reset by peer

After analysing the core dump with gdb, the stack points to malloc() calls, which mean that the listener requests memory from the OS.

gdb /oracle/ora10/bin/tnslsnr core.311
GNU gdb Red Hat Linux (6.3.0.0-1.153.el4rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "ia64-redhat-linux-gnu"...(no debugging symbols found)
Using host libthread_db library "/lib/tls/libthread_db.so.1".

Reading symbols from shared object read from target memory...(no debugging symbols found)...done.
Loaded system supplied DSO at 0xa000000000000000
Core was generated by `/oracle/ora10/bin/tnslsnr listenerv -inherit'.
Program terminated with signal 11, Segmentation fault.

#0 0x20000000027ee220 in malloc_consolidate ()
from /lib/tls/libc.so.6.1
(gdb) bt
#0 0x20000000027ee220 in malloc_consolidate () from /lib/tls/libc.so.6.1
#1 0x20000000027f0e30 in _int_malloc () from /lib/tls/libc.so.6.1
#2 0x20000000027f4b50 in malloc () from /lib/tls/libc.so.6.1
#3 0x40000000000079f0 in nsglconcrt ()
#4 0x4000000000011a00 in nsglhc ()
#5 0x4000000000019690 in nsglhe ()
#6 0x400000000001b980 in nsglma ()
#7 0x4000000000007770 in main ()
(gdb) quit

After contacting Oracle Support with this stack, they confirmed it to be Bug #6752308 which was closed as Duplicate of Bug 6139856. There is patch for 10.2.0.3 available and they also recommend to implement hugepages. By the way, there is an interesting article on the effect of utilizing – or not utilizing – hugepages here.

6139856 - TT11.1VALGRIND: FMR (FREE MEMORY READ/WRITE) IN NSEV.C



Optimal VxFS Settings for Oracle Filesystems?

I found a funny presentation about Oracle filesystem stuff. It explains inode locking: Oracle Filesystems

A friend has pointed me to a very informative document from HP about the optimal settings of vxfs (HP OnlineJFS) filesystem-related parameters for Oracle: HP-UX JFS mount options for Oracle Database environments



Filesystem IO Monitoring with HP-UX Glance

On HP-UX, you can use glance to collect a huge range of OS Statistics and write them in a defined time interval into a file for later analysis. A collection of available metrics is available on the system in /opt/perf/paperdocs/gp/C/metrics.txt or /opt/perf/paperdocs/gp/C/metrics.pdf.

This is an example of IO Monitoring for Filesytems:

glance_filesystem.sh:

nohup glance -aos ./filesystem_advisor.conf -j 60 > glance_output_filesystem_$$.txt 2>/dev/null &

filesystem_advisor.conf:

headersprinted=headersprinted
if headersprinted != 1 then {
print "DATE       TIME     FILESYSTEM                   FIR     LIR     LRBR     LRR     LWBR      LWR     PIR     PRBR     PRR     PWBR     PWR"
headersprinted = 1
}
filesystem loop
{
print GBL_STATDATE|12," ",
GBL_STATTIME|9," ",
FS_DIRNAME|24,
FS_FILE_IO_RATE|8,
FS_LOGL_IO_RATE|8,
FS_LOGL_READ_BYTE_RATE|9,
FS_LOGL_READ_RATE|8,
FS_LOGL_WRITE_BYTE_RATE|9,
FS_LOGL_WRITE_RATE|8,
FS_PHYS_IO_RATE|8,
FS_PHYS_READ_BYTE_RATE|9,
FS_PHYS_READ_RATE|8,
FS_PHYS_WRITE_BYTE_RATE|9,
FS_PHYS_WRITE_RATE|8
}

Used Metrics:

FS_FILE_IO_RATE (FIR)

The number of file system related physical IOs per second directed to this file system during the interval. This value is similar to the values returned by the vmstat -d command except that vmstat reports all IOs and does not break them out by file system. Also, vmstat reports IOs from the kernel’s view, which may get broken down by the disk driver into multiple physical IOs. Since this metric reports values from the disk driver’s point of view, it is more accurate than vmstat.

FS_LOGL_IO_RATE (LIR)

The number of logical IOs per second directed to this file system during the interval. Logical IOs are generated by calling the read() or write() system calls.

FS_LOGL_READ_BYTE_RATE (LRBR)

The number of logical read KBs per second from this file system during the interval.

FS_LOGL_READ_RATE (LRR)

The number of logical reads per second directed to this file system during the interval. Logical reads are generated by calling the read() system call.

FS_LOGL_WRITE_BYTE_RATE (LWBR)

The number of logical writes KBs per second to this file system during the interval.

FS_LOGL_WRITE_RATE (LWR)

The number of logical writes per second directed to this file system during the interval. Logical writes are generated by calling the write() system call.

FS_PHYS_IO_RATE (PIR)

The number of physical IOs per second directed to this file system during the interval.

FS_PHYS_READ_BYTE_RATE (PRBR)

The number of physical KBs per second read from this file system during the interval.

FS_PHYS_READ_RATE (PRR)

The number of physical reads per second directed to this file system during the interval. On Unix systems, physical reads are generated by user file access, virtual memory access (paging), file system management, or raw device access.

FS_PHYS_WRITE_BYTE_RATE (PWBR)

The number of physical KBs per second written to this file system during the interval.

FS_PHYS_WRITE_RATE (PWR)

The number of physical writes per second directed to this file system during the interval.