Unix

Oracle Database version 11g Release 2 for Linux finally released

Yesterday, on September 1st, Oracle released the much-awaited database version 11gR2 for Linux x86, for both 32-bit and 64-bit platforms. The software can be downloaded via OTN: http://www.oracle.com/technology/software/products/database/index.html.



Out-of-Memory killer on 32bit Linux with big RAM

It is not widely known that you can run into serious problems when running Linux x86 (32-bit) with a large amount of RAM, particularly on RHEL releases below 5. The official name for the issue is “Low Memory Starvation”. The best solution is to use x86-64, which can address the whole amount of RAM efficiently.

However, if that is not feasible, then make sure that you at least run the hugemem kernel on RHEL < 5. (In RHEL5 32-bit, the hugemem kernel is the default.) Here is a quick demonstration of what can happen if you don't use the hugemem kernel: We noticed that an RMAN backup was taking more than 24 hours. Querying v$session, we found that one session was still in ACTION “STARTED”, whereas the other sessions had FINISHED.

SQL> select program, module, action
       from v$session
       where username = 'SYS' and program like 'rman%'
/

PROGRAM                    MODULE                       ACTION             
-------------------------- ---------------------------  -------------------
rman@ora-vm1 (TNS V1-V3)    backup full datafile        0000078 FINISHED129
rman@ora-vm1 (TNS V1-V3)    backup full datafile        0000272 STARTED16  
rman@ora-vm1 (TNS V1-V3)    backup full datafile        0000084 FINISHED129
rman@ora-vm1 (TNS V1-V3)    rman@ora-vm1 (TNS V1-V3)                       
rman@ora-vm1 (TNS V1-V3)    rman@ora-vm1 (TNS V1-V3)    0000004 FINISHED131
rman@ora-vm1 (TNS V1-V3)    backup full datafile        0000092 FINISHED129

Then we check the SID and SERIAL# of this session in v$session:

SQL> select sid,serial# from v$session where event like 'RMAN%';

       SID    SERIAL#
---------- ----------
      4343       5837

We query the Unix PID from v$process.spid and activate SQL tracing for the session to determine its activity:

SQL> select spid from v$process where addr = 
   (select paddr from v$session where sid = 4343);

SPID
------------
1681

SQL> begin dbms_monitor.session_trace_enable(4343,5837,true,true);
  2  end;
  3  /

However, no trace file is created. So we start tracing its system calls with strace:

ora-vm1:# strace -fp 1681
attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted

“Not permitted”? Even though I am connected as root?

ps -ef|grep 1681
oracle    1681 1582  0 Aug24 ?        00:00:09 [oracle] <defunct>

The Linux command “ps” reports the server process as “defunct”. Checking its parent PID 1582 shows the rest of the RMAN process family:

ora-vm1:/usr/oracle/admin/labo/udump$ ps -ef|grep 1582
oracle   1582 21578  0 Aug24 ?        00:00:02 rman oracle/product/10.2.0/bin/rman nocatalog
oracle   21663 1582  0 Aug24 ?        00:00:01 oraclelabo (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle   21665 1582  0 Aug24 ?        00:00:03 oraclelabo (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle   1681 1582   0 Aug24 ?        00:00:09 [oracle] <defunct>
oracle   21691 1582  0 Aug24 ?        00:01:36 oraclelabo (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle   21695 1582  0 Aug24 ?        00:01:41 oraclelabo (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle   21793 1582  0 Aug24 ?        00:01:30 oraclelabo (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))

Next, I checked the logfile /var/log/messages.1 and realized that the kernel out-of-memory (OOM) killer had killed this PID because of low memory starvation.

/var/log/messages.1:
Aug  24 22:32:44 ora-vm1 kernel: Out of Memory: Killed process 1681 (oracle).
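
On a 32-bit kernel you can watch the low memory zone directly: when LowFree approaches zero, the OOM killer starts killing processes even if plenty of HighMem is still free. A quick check:

# LowTotal/LowFree only exist on 32-bit kernels with HighMem support;
# LowFree close to zero indicates low memory starvation
grep -i low /proc/meminfo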


Multipathing Configuration issue waiting to happen

Quite some time ago, during a consulting engagement, I came across a hard-to-find issue which I find worth mentioning. A two-node RAC cluster running on RHEL4 x86-64 was relocated to a different data center. Apart from making sure that the switch ports and Fibre Channel ports are available at the new location, there is not much to worry about.

After the relocation, the multipathing configuration, implemented with device-mapper-multipath, would not work on one node. The command “multipath -ll” would simply not return any output. After more than an hour, we pinned the issue down to this error message:

# multipath -v 3
#
# all paths in cache :
#

path sdh not found in pathvec

When we checked what device sdh was, we realized that it was a KVM device that the sysadmins had plugged in.

May 27 14:46:26 host1 kernel: Attached scsi removable disk sdh at scsi10, channel 0, id 0, lun 0
May 27 14:46:26 host1 kernel: Type: Direct-Access ANSI SCSI revision: 02
May 27 14:46:26 host1 kernel: Vendor: KVM Model: vmDisk Rev: 0.01
May 27 14:46:26 host1 kernel: scsi10 : SCSI emulation for USB Mass Storage devices
May 27 14:46:26 host1 kernel: sr1: scsi3-mmc drive: 0x/0x caddy
May 27 14:46:26 host1 kernel: Type: CD-ROM ANSI SCSI revision: 02
May 27 14:46:26 host1 kernel: Vendor: KVM Model: vmDisk-CD Rev: 0.01

BTW: What is a KVM device?
Wikipedia states: A KVM switch (with KVM being an abbreviation for Keyboard, Video or Visual Display Unit, Mouse) is a hardware device that allows a user to control multiple computers from a single keyboard, video monitor and mouse.

We then added the device sdh to the multipath blacklist section in /etc/multipath.conf, and the problem was solved:

devnode_blacklist {
    devnode "^sdh$"
}
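
Note that newer multipath-tools releases (RHEL5, for example) renamed the section from devnode_blacklist to blacklist; a sketch of the equivalent entry:

# /etc/multipath.conf on newer multipath-tools, where the section
# is named "blacklist" instead of "devnode_blacklist"
blacklist {
    devnode "^sdh$"
}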



Default 10gR2 RAC TNS Configuration can cause TCP Timeouts for applications

A default RAC installation normally does not set the “local_listener” init.ora parameter. If the listener runs on port 1521, the database does not need this parameter to find and register with the local TNS listener process. However, not setting local_listener means that the database registers with the listener under the physical IP address instead of the virtual (VIP) address.

You can determine whether this happens by taking a look at the “lsnrctl services” output on your RAC nodes:

Service "S_APP" has 1 instance(s).
  Instance "MDDB1", status READY, has 2 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0 state:ready
         REMOTE SERVER
         (ADDRESS=(PROTOCOL=TCP)(HOST=ora-vm1.intra)(PORT=1521))

Instead of ora-vm1.intra, this should be ora-vm1-vip.intra.

Why should I care?

If you use the default configuration, then you are using the “REMOTE_LISTENER” parameter and therefore server-side connect-time load balancing. This means that the listeners of all nodes receive load information from all instances on all nodes and can redirect connections to the least loaded instance, even if that instance runs on another node. But the connect string they then send back to the client contains the physical IP address instead of the virtual one.

In case of a node crash or kernel panic, the client then has to wait for the TCP timeout before the failure is detected. With the VIP address, a surviving node takes over the VIP and the client immediately receives a connection refusal instead of hanging in the timeout.

Solution

tnsnames.ora:

LISTENER_MDDB1 =
  (ADDRESS_LIST =
    (ADDRESS = (PROTOCOL = TCP)(HOST = ora-vm1-vip.intra)(PORT = 1521))
  )

LISTENER_MDDB2 =
  (ADDRESS_LIST =
    (ADDRESS = (PROTOCOL = TCP)(HOST = ora-vm2-vip.intra)(PORT = 1521))
  )


Set the local_listener parameter for each instance:

SQL> alter system set local_listener = 'LISTENER_MDDB1' sid='MDDB1';
SQL> alter system set local_listener = 'LISTENER_MDDB2' sid='MDDB2';
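
After setting the parameter, force a re-registration and check the handler again; it should now report the VIP address (illustrative output, based on the hosts above):

SQL> alter system register;

System altered.

$ lsnrctl services
...
      "DEDICATED" established:0 refused:0 state:ready
         REMOTE SERVER
         (ADDRESS=(PROTOCOL=TCP)(HOST=ora-vm1-vip.intra)(PORT=1521))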



Your experience with RAC Dynamic Remastering (DRM) in 10gR2?

One of my customers is having severe RAC performance issues, which have appeared about a dozen times so far. Each time, the performance impact lasted around 10 minutes and basically caused a hang of the application. ASH investigation revealed that the time frame of the performance issues exactly matches a DRM operation on the biggest segment of the database. During the problematic period, there are 20-50 instead of 5-10 active sessions, and they are mostly waiting for gc-related events: “gc buffer busy”, “gc cr block busy”, “gc cr block 2-way”, “gc current block 2-way”, “gc current request”, “gc current grant busy”, etc.

In addition, there is one single session with wait event “kjbdrmcvtq lmon drm quiesce: ping completion” (on instance 1) and 1-3 sessions with wait event “gc remaster” (on instance 2). The cur_obj# of the sessions waiting on “gc remaster” points to the segment being remastered.

Does anybody have any experience with DRM problems with 10.2.0.4 on Linux Itanium?

I know that it is possible to deactivate DRM, but usually it should be beneficial to have it enabled. I could not find any reports of performance impact during DRM operations on MetaLink. Support is involved but clueless so far.
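
For reference, the commonly cited way to deactivate DRM in 10gR2 uses the two hidden parameters below. This is only a sketch: hidden parameters should never be set without Oracle Support's blessing, and a restart of all instances is required.

SQL> -- Hidden parameters: set only after confirmation from Oracle Support.
SQL> -- Both require a restart of all instances to take effect.
SQL> alter system set "_gc_affinity_time" = 0 scope=spfile sid='*';
SQL> alter system set "_gc_undo_affinity" = false scope=spfile sid='*';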

Regards,
Martin

http://forums.oracle.com/forums/message.jspa?messageID=3447436#3447436



RAC Deadlock Detection not working on 10.2.0.4 / Linux Itanium

We recently experienced a big issue where the production database hung. It turned out that the database had deadlocked, but the GES (Global Enqueue Service) did not detect the deadlock situation, so all the sessions kept waiting on “enq: TX - row lock contention”.

We could provide a reproducible test case, and it turned out to be bug 7014855. The bug is specific to the Linux Itanium port, and a patch is available.
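
For illustration, a minimal deadlock scenario of the kind the GES should normally resolve within a few seconds (table t and its rows are hypothetical):

-- Session 1 (instance 1):
SQL> update t set c = 1 where id = 1;
-- Session 2 (instance 2):
SQL> update t set c = 2 where id = 2;
-- Session 1 now blocks behind session 2:
SQL> update t set c = 1 where id = 2;
-- Session 2 closes the cycle; the GES should signal ORA-00060 shortly after:
SQL> update t set c = 2 where id = 1;
-- With bug 7014855, no ORA-00060 is raised and both sessions wait
-- forever on "enq: TX - row lock contention".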



Session waiting for “enq: RO - fast object reuse”: DBWR process spinning on CPU

I encountered the following problem on a 10.2.0.4 database on Linux x86_64 today: a user session had been waiting on “enq: RO - fast object reuse” for almost 60 minutes while executing a “truncate table” SQL statement.

SQL> select username, event, sql_id, taddr, last_call_et
     from v$session where sid = 234;

USERNAME   EVENT                        SQL_ID        TADDR            LAST_CALL_ET
---------- ---------------------------- ------------- ---------------- ------------
MD         enq: RO - fast object reuse  ljk299jlkj003 0000000153264570         3542

SQL> select sql_text from v$sqlstats where sql_id = 'ljk299jlkj003';

SQL_TEXT
-------------------------------------
truncate table tab1

The session was blocked by the CKPT process:

SQL> select * from dba_waiters;

WAITING_SESSION HOLDING_SESSION LOCK_TYPE MODE_HELD  MODE_REQUESTED LOCK_ID1 LOCK_ID2
--------------- --------------- --------- ---------- -------------- -------- --------
            234             423 RO        Row-S (SS) Exclusive         65573        1

SQL> select sid, serial#, sql_id, last_call_et, machine, program
     from v$session where sid = 423;

SID SERIAL# SQL_ID        LAST_CALL_ET MACHINE       PROGRAM
--- ------- ------------- ------------ ------------- ---------------------------
423       1                    4133636 ora-vm1.intra oracle@ora-vm1.intra (CKPT)

The checkpoint process was waiting for the database writer (DBWR) process, which was spinning on one CPU:

top

  PID USER   PR NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
10712 oracle 25  0 2201m 1.7g 1.7g R 99.5 21.7 108:18.03 oracle

PID 10712 maps to DBW0:

[oracle@ora-vm1 ]$ ps -ef|grep 10712
oracle   10712     1  0  2008 ?        03:23:05 ora_dbw0_MDDB01

mpstat

Linux 2.6.9-78.ELsmp (ora-vm1.intra)    01/20/2009

02:21:56 PM  CPU   %user   %nice %system %iowait    %irq   %soft   %idle    intr/s
02:21:57 PM  all   49.75    0.00    0.00    0.00    0.00    0.00   50.25   1055.00
02:21:57 PM    0    0.00    0.00    0.00    0.00    0.00    0.00  100.00   1006.00
02:21:57 PM    1  100.00    0.00    0.00    0.00    0.00    0.00    0.00     49.00

02:21:57 PM  CPU   %user   %nice %system %iowait    %irq   %soft   %idle    intr/s
02:21:58 PM  all   50.75    0.00    0.00    0.50    0.00    0.00   48.76   1161.00
02:21:58 PM    0    1.00    0.00    0.00    1.00    0.00    0.00   98.00   1087.00
02:21:58 PM    1  100.00    0.00    0.00    0.00    0.00    0.00    0.00     74.00

The stack of dbw0 during that time showed these signatures:

[oracle@ora-vm1 oracle]$ pstack 10712
#0 0x000000000074b7fb in kslfre ()
#1 0x00000000010ccc3b in kcbo_exam_buf ()
#2 0x00000000010d0d62 in kcbo_service_ockpt ()
#3 0x0000000001080cd7 in kcbbdrv ()
#4 0x00000000007ddcc2 in ksbabs ()
#5 0x00000000007e4b32 in ksbrdp ()
#6 0x0000000002efcb50 in opirip ()
#7 0x00000000012da326 in opidrv ()
#8 0x0000000001e62456 in sou2o ()
#9 0x00000000006d2555 in opimai_real ()
#10 0x00000000006d240c in main ()
[oracle@ora-vm1 oracle]$ pstack 10712
#0 0x000000000074b36d in kslfre ()
#1 0x00000000010cc203 in kcbo_write_process ()
#2 0x00000000010ce608 in kcbo_write_q ()
#3 0x0000000001080a6d in kcbbdrv ()
#4 0x00000000007ddcc2 in ksbabs ()
#5 0x00000000007e4b32 in ksbrdp ()
#6 0x0000000002efcb50 in opirip ()
#7 0x00000000012da326 in opidrv ()
#8 0x0000000001e62456 in sou2o ()
#9 0x00000000006d2555 in opimai_real ()
#10 0x00000000006d240c in main ()
[oracle@ora-vm1 oracle]$ pstack 10712
#0 0x00000000010ccb60 in kcbo_exam_buf ()
#1 0x00000000010d0d62 in kcbo_service_ockpt ()
#2 0x0000000001080cd7 in kcbbdrv ()
#3 0x00000000007ddcc2 in ksbabs ()
#4 0x00000000007e4b32 in ksbrdp ()
#5 0x0000000002efcb50 in opirip ()
#6 0x00000000012da326 in opidrv ()
#7 0x0000000001e62456 in sou2o ()
#8 0x00000000006d2555 in opimai_real ()
#9 0x00000000006d240c in main ()
[oracle@ora-vm1 oracle]$ pstack 10712
#0 0x00000000010d0da5 in kcbo_service_ockpt ()
#1 0x0000000001080cd7 in kcbbdrv ()
#2 0x00000000007ddcc2 in ksbabs ()
#3 0x00000000007e4b32 in ksbrdp ()
#4 0x0000000002efcb50 in opirip ()
#5 0x00000000012da326 in opidrv ()
#6 0x0000000001e62456 in sou2o ()
#7 0x00000000006d2555 in opimai_real ()
#8 0x00000000006d240c in main ()
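
A single stack sample can be misleading; taking several samples in a row makes it obvious that the process keeps looping in the same kcbo_* functions instead of making progress. A simple sketch (PID from above):

# Sample the DBWR stack ten times, one second apart
for i in $(seq 1 10); do
    echo "--- sample $i ---"
    pstack 10712
    sleep 1
done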

A MetaLink search for the term “kcbo_service_ockpt” leads to Bug 7376934, which is a duplicate of Bug 7385253 (“DBWR IS CONSUMING HIGH CPU”).

Patch 7385253 is available for Linux x86_64, HP-UX, Solaris and AIX.

Reference:
MetaLink Note 762085.1: “enq: RO - fast object reuse” contention when gathering schema/table statistics in parallel



Hugepages revisited II: Be aware of kernel bugs!

It is well known that hugepages can reduce the overhead of managing the memory pages of the Oracle SGA by the operating system, thus leading to lower system CPU utilization. I have already written two blog entries on this topic: Listener Coredumps on heavy load system and Hugepages revisited.

However, there is a potential risk: certain kernels and platforms have hugepage-related bugs which can lead to problems:

  • Bug 131295 – Hugepages configured on the kernel boot line cause the x86_64 kernel boot to fail with OOM. Fixed in RHEL3 kernel-2.4.21-40.EL.
  • Bug 248954 – Oracle ASM DBWR process goes into a 100% CPU spin when using hugepages on ia64. Fixed in kernel-2.6.9-78.EL.ia64.rpm, available as an update for RHEL4U7.
  • RHSA-2008:1017-14 – On the Itanium® architecture, setting the “vm.nr_hugepages” sysctl parameter caused a kernel stack overflow resulting in a kernel panic, and possibly stack corruption. With this fix, setting vm.nr_hugepages works correctly. Fixed with RHEL5 kernel-2.6.18-92.1.22.el5.ia64.rpm.
  • RHSA-2008:1017-14 – Hugepages allow the Linux kernel to utilize the multiple page size capabilities of modern hardware architectures. In certain configurations, systems with large amounts of memory could fail to allocate most of this memory for hugepages even if it was free. This could result, for example, in database restart failures. Fixed with RHEL5 kernel-2.6.18-92.1.22.el5.ia64.rpm.

Therefore, before enabling hugepages, I recommend checking your OS vendor's bug database, testing on a test system, and applying recent OS updates first.
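
For reference, a minimal hugepages setup on Linux; the 2 GB SGA and the 2 MB hugepage size below are assumptions, so compute nr_hugepages from your actual SGA size:

# Check the hugepage size supported by the kernel (typically 2048 kB on x86_64)
grep Hugepagesize /proc/meminfo

# Reserve enough hugepages for a 2 GB SGA: 2048 MB / 2 MB = 1024 pages
# (persist the setting as vm.nr_hugepages = 1024 in /etc/sysctl.conf)
sysctl -w vm.nr_hugepages=1024

# Verify: HugePages_Total should show 1024, HugePages_Free the unused part
grep -i hugepages /proc/meminfo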



NUMA enabled in 10.2.0.4

When upgrading from a pre-10.2.0.4 release to 10.2.0.4, Oracle enables NUMA support. This has the effect that there can be multiple shared memory segments (MetaLink Note 429872.1), even though shmmax/shmall are set to high values.
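
You can see the effect on the OS level with ipcs:

# List shared memory segments; with NUMA enabled in 10.2.0.4, one Oracle SGA
# can appear as multiple segments even though shmmax/shmall would allow one
ipcs -m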

I have read MetaLink Notes (7171446.8, 6730567.8, 6689903.8) and this blog entry, where a customer had problems on HP-UX with the default NUMA settings.

Worse than that, it can also lead to instance crashes in 10.2.0.4, as reported in MetaLink Note 743191.1. The good news is that a patch is available for Linux x86_64 / 10.2.0.4.

I have asked Oracle Support whether it is safe to leave NUMA enabled on Linux Itanium, but they would not comment on it. Instead, they asked me to check with the OS vendor. Great. ;-(
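
For completeness, the commonly referenced way to switch NUMA support off again in 10.2.0.4 is shown below; a sketch only, since these are hidden parameters that should only be set after confirmation from Oracle Support, followed by an instance restart:

SQL> -- Disable Oracle NUMA optimization (10.2.0.4); requires a restart.
SQL> alter system set "_enable_NUMA_optimization" = false scope=spfile;
SQL> alter system set "_db_block_numa" = 1 scope=spfile;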



Installing 10gR2 RAC on Linux Itanium (Montecito)

Recently, I had to install 10gR2 on Linux Itanium (Montecito CPUs) and found out that the Java version that ships with the binaries does not work on this platform. Therefore, you have to download Patch 5390722 and perform the following steps for a RAC installation:

  1. Install Patch 5390722: Install the JDK into the new 10.2 CRS Home, then install the JRE into the new 10.2 CRS Home.
  2. Take a tar backup of the CRS Home containing these two components; you will need it later (see the sketch after this list).
  3. Install the 10.2.0.1 Clusterware by running from the 10.2.0.1 binaries: ./runInstaller -jreLoc $CRS_HOME/jre/1.4.2
  4. Install Patch 5390722 with the option CLUSTER_NODES={"node1", "node2", ...}: Install the JDK into the new 10.2 RDBMS Home, then install the JRE into the new 10.2 RDBMS Home.
  5. Install the 10.2.0.1 RDBMS binaries into the new 10.2 RDBMS Home: ./runInstaller -jreLoc $ORACLE_HOME/jre/1.4.2
  6. If you want to install the 10.2.0.4 patchset, you have to run:
    for CRS: ./runInstaller -jreLoc $ORA_CRS_HOME/jdk/jre
    for RDBMS: ./runInstaller -jreLoc $ORACLE_HOME/jdk/jre
  7. After that, you have to repair the JRE, because the 10.2.0.4 patchset has overwritten the patched JRE with the defective version (bug 7448301):
    % cd $ORACLE_HOME/jre
    % rm -rf 1.4.2
    % tar -xvf $ORACLE_HOME/jre/1.4.2-5390722.tar
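
A minimal sketch of the backup from step 2 (the paths are assumptions; the tar file name matches the one restored in step 7, and the same applies to the RDBMS Home):

# Preserve the patched 1.4.2 JRE before the 10.2.0.4 patchset
# overwrites it with the defective version (bug 7448301)
cd $CRS_HOME/jre
tar -cvf 1.4.2-5390722.tar 1.4.2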

Sources:

  • Note: 404248.1 – How To Install Oracle CRS And RAC Software On Itanium Servers With Montecito Processors
  • Note: 400227.1 – How To Install Oracle RDBMS Software On Itanium Servers With Montecito Processors
  • Bug 7448301 – Linux Itanium: 10.2.0.4 Patchset for Linux Itanium (Montecito) has wrong Java runtime