Linux

Out-of-Memory killer on 32bit Linux with big RAM

It is not widely known that you can run into serious problems when running 32-bit Linux (x86) with a large amount of RAM installed, especially on RHEL releases below 5. The official name for the issue is “Low Memory Starvation”. The best solution is to use x86-64, which can address the whole amount of RAM efficiently.
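
On a running 32-bit system, a quick way to check how much low memory is left, and which kernel flavour is booted, is a sketch like the following (the LowTotal/LowFree lines only exist on 32-bit kernels):

# Show low-memory statistics; a steadily shrinking LowFree value points
# towards low memory starvation on a 32-bit kernel with lots of RAM
grep -i '^Low' /proc/meminfo

# Check which kernel flavour is running (look for "hugemem" on RHEL < 5)
uname -r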

However, if that is not feasible, make sure that you at least run the hugemem kernel on RHEL < 5. On 32-bit RHEL 5, the hugemem kernel is the default. Here is a quick demonstration of what can happen if you don't use the hugemem kernel: we realized that an RMAN backup was taking more than 24 hours. Querying v$session, we found that one session was in ACTION "STARTED", whereas the other sessions were FINISHED.

SQL> select program, module, action
       from v$session
      where username = 'SYS' and program like 'rman%'
     /

PROGRAM                    MODULE                       ACTION             
-------------------------- ---------------------------  -------------------
rman@ora-vm1 (TNS V1-V3)    backup full datafile        0000078 FINISHED129
rman@ora-vm1 (TNS V1-V3)    backup full datafile        0000272 STARTED16  
rman@ora-vm1 (TNS V1-V3)    backup full datafile        0000084 FINISHED129
rman@ora-vm1 (TNS V1-V3)    rman@ora-vm1 (TNS V1-V3)                       
rman@ora-vm1 (TNS V1-V3)    rman@ora-vm1 (TNS V1-V3)    0000004 FINISHED131
rman@ora-vm1 (TNS V1-V3)    backup full datafile        0000092 FINISHED129

Then we determine the SID and serial# of this session from v$session:

SQL> select sid,serial# from v$session where event like 'RMAN%';

       SID    SERIAL#
---------- ----------
      4343       5837

To find out what it is doing, we look up the UNIX process ID from v$process.spid and activate SQL tracing for this session:

SQL> select spid from v$process where addr = 
   (select paddr from v$session where sid = 4343);

SPID
------------
1681

SQL> begin dbms_monitor.session_trace_enable(4343,5837,true,true);
  2  end;
  3  /

However, no trace file is created, so we start tracing the process's system calls with strace:

ora-vm1:# strace -fp 1681
attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted

“Not permitted”? Even though I am connected as root?

ps -ef|grep 1681
oracle    1681 1582  0 Aug24 ?        00:00:09 [oracle] <defunct>

The Linux command “ps” reports the server process as “defunct”.

ora-vm1:/usr/oracle/admin/labo/udump$ ps -ef|grep 1582
oracle   1582 21578  0 Aug24 ?        00:00:02 rman oracle/product/10.2.0/bin/rman nocatalog
oracle   21663 1582  0 Aug24 ?        00:00:01 oraclelabo (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle   21665 1582  0 Aug24 ?        00:00:03 oraclelabo (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle   1681 1582   0 Aug24 ?        00:00:09 [oracle] <defunct>
oracle   21691 1582  0 Aug24 ?        00:01:36 oraclelabo (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle   21695 1582  0 Aug24 ?        00:01:41 oraclelabo (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle   21793 1582  0 Aug24 ?        00:01:30 oraclelabo (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))

Next, I checked the log file /var/log/messages.1 and realized that the kernel Out-of-Memory (OOM) killer had killed this process because of low memory starvation.

/var/log/messages.1:
Aug  24 22:32:44 ora-vm1 kernel: Out of Memory: Killed process 1681 (oracle).
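
By the way, to search the current and all rotated message files at once for OOM kills, a one-liner like this can be used (sketch):

# Find all processes that the OOM killer has terminated
grep -i 'out of memory' /var/log/messages*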


Multipathing Configuration issue waiting to happen

Quite some time ago, during a consulting engagement, I came across an issue that was quite hard to track down, and which I find worth mentioning. A two-node RAC cluster running on RHEL4 x86-64 was relocated to a different data center. Apart from making sure that the switch ports and Fibre Channel ports are available at the new location, there is not much to worry about.

After the relocation, the multipathing configuration, implemented with device-mapper-multipath, would not work on one of the nodes. The command “multipath -ll” simply did not return any output. After more than an hour, we pinned the issue down to this error message:

# multipath -v 3
#
# all paths in cache :
#

path sdh not found in pathvec

When we checked what device sdh was, we realized that it was a KVM device that had been plugged in by the sysadmins:

May 27 14:46:26 host1 kernel: Attached scsi removable disk sdh at scsi10, channel 0, id 0, lun 0
May 27 14:46:26 host1 kernel: Type: Direct-Access ANSI SCSI revision: 02
May 27 14:46:26 host1 kernel: Vendor: KVM Model: vmDisk Rev: 0.01
May 27 14:46:26 host1 kernel: scsi10 : SCSI emulation for USB Mass Storage devices
May 27 14:46:26 host1 kernel: sr1: scsi3-mmc drive: 0x/0x caddy
May 27 14:46:26 host1 kernel: Type: CD-ROM ANSI SCSI revision: 02
May 27 14:46:26 host1 kernel: Vendor: KVM Model: vmDisk-CD Rev: 0.01

BTW: What is a KVM device?
Wikipedia states: A KVM switch (with KVM being an abbreviation for Keyboard, Video or Visual Display Unit, Mouse) is a hardware device that allows a user to control multiple computers from a single keyboard, video monitor and mouse.

We then added the device sdh to the multipath blacklist section in /etc/multipath.conf, and the problem was solved:

devnode_blacklist {
    devnode "^sdh$"
}
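
After editing /etc/multipath.conf, the configuration has to be re-read. A sketch of what we typically run on RHEL4 to flush the existing maps, rebuild them with the new blacklist in place and verify the result:

# Flush unused multipath maps, recreate them and verify that sdh is gone
multipath -F
multipath -v2
multipath -ll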



Your experience with RAC Dynamic Remastering (DRM) in 10gR2?

One of my customers is having severe RAC performance issues, which have appeared about a dozen times so far. Each time, the performance impact lasted around 10 minutes and essentially caused a hang of the application. ASH investigation revealed that the time frame of the performance issues exactly matches a DRM operation on the biggest segment of the database. During the problematic period, there are 20-50 instead of 5-10 active sessions, and they are mostly waiting on gc-related events: “gc buffer busy”, “gc cr block busy”, “gc cr block 2-way”, “gc current block 2-way”, “gc current request”, “gc current grant busy”, etc.

In addition, there is a single session with wait event “kjbdrmcvtq lmon drm quiesce: ping completion” (on instance 1) and 1-3 sessions with wait event “gc remaster” (on instance 2). The cur_obj# of the session waiting on “gc remaster” points to the segment being remastered.

Does anybody have any experience with DRM problems with 10.2.0.4 on Linux Itanium?

I know that it is possible to deactivate DRM, but usually it should be beneficial to have it enabled. I could not find any reports of a performance impact during DRM operations on MetaLink. Support is involved but clueless so far.

Regards,
Martin

http://forums.oracle.com/forums/message.jspa?messageID=3447436#3447436



Session waiting for “enq: RO - fast object reuse” – DBWR Process spinning on CPU

I encountered the following problem on a 10.2.0.4 database on Linux x86_64 today:
A user session had been waiting on “enq: RO - fast object reuse” for almost 60 minutes while executing a “truncate table” statement.

SQL> select username, event, sql_id, taddr, last_call_et from v$session where sid = 234;

USERNAME   EVENT                         SQL_ID        TADDR            LAST_CALL_ET
---------- ----------------------------- ------------- ---------------- ------------
MD         enq: RO - fast object reuse   ljk299jlkj003 0000000153264570         3542

SQL> select sql_text from v$sqlstats where sql_id = 'ljk299jlkj003';

SQL_TEXT
————————————-
truncate table tab1

The session was blocked by the CKPT process:

SQL> select * from dba_waiters;

WAITING_SESSION HOLDING_SESSION LOCK_TYPE                  MODE_HELD    MODE_REQUESTED   LOCK_ID1   LOCK_ID2
--------------- --------------- -------------------------- ------------ -------------- ---------- ----------
            234             423 RO                         Row-S (SS)   Exclusive           65573          1

SQL> select sid, serial#, sql_id, last_call_et, machine, program, username from v$session where sid = 423;

       SID    SERIAL# SQL_ID        LAST_CALL_ET MACHINE          PROGRAM
---------- ---------- ------------- ------------ ---------------- --------------------------------
       423          1                    4133636 ora-vm1.intra    oracle@ora-vm1.intra (CKPT)

The checkpoint process was in turn waiting for the database writer (DBWR) process, which was spinning on one CPU:

top

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
10712 oracle    25   0 2201m 1.7g 1.7g R 99.5 21.7 108:18.03 oracle

PID 10712 maps to DBW0:

[oracle@ora-vm1 ]$ ps -ef|grep 10712
oracle 10712 1 0 2008 ? 03:23:05 ora_dbw0_MDDB01

mpstat

Linux 2.6.9-78.ELsmp (ora-vm1.intra) 01/20/2009

02:21:56 PM  CPU   %user   %nice %system %iowait    %irq   %soft   %idle    intr/s
02:21:57 PM  all   49.75    0.00    0.00    0.00    0.00    0.00   50.25   1055.00
02:21:57 PM    0    0.00    0.00    0.00    0.00    0.00    0.00  100.00   1006.00
02:21:57 PM    1  100.00    0.00    0.00    0.00    0.00    0.00    0.00     49.00

02:21:57 PM  CPU   %user   %nice %system %iowait    %irq   %soft   %idle    intr/s
02:21:58 PM  all   50.75    0.00    0.00    0.50    0.00    0.00   48.76   1161.00
02:21:58 PM    0    1.00    0.00    0.00    1.00    0.00    0.00   98.00   1087.00
02:21:58 PM    1  100.00    0.00    0.00    0.00    0.00    0.00    0.00     74.00

The stack of dbw0 sampled during this time showed these signatures:

[oracle@ora-vm1 oracle]$ pstack 10712
#0 0x000000000074b7fb in kslfre ()
#1 0x00000000010ccc3b in kcbo_exam_buf ()
#2 0x00000000010d0d62 in kcbo_service_ockpt ()
#3 0x0000000001080cd7 in kcbbdrv ()
#4 0x00000000007ddcc2 in ksbabs ()
#5 0x00000000007e4b32 in ksbrdp ()
#6 0x0000000002efcb50 in opirip ()
#7 0x00000000012da326 in opidrv ()
#8 0x0000000001e62456 in sou2o ()
#9 0x00000000006d2555 in opimai_real ()
#10 0x00000000006d240c in main ()
[oracle@ora-vm1 oracle]$ pstack 10712
#0 0x000000000074b36d in kslfre ()
#1 0x00000000010cc203 in kcbo_write_process ()
#2 0x00000000010ce608 in kcbo_write_q ()
#3 0x0000000001080a6d in kcbbdrv ()
#4 0x00000000007ddcc2 in ksbabs ()
#5 0x00000000007e4b32 in ksbrdp ()
#6 0x0000000002efcb50 in opirip ()
#7 0x00000000012da326 in opidrv ()
#8 0x0000000001e62456 in sou2o ()
#9 0x00000000006d2555 in opimai_real ()
#10 0x00000000006d240c in main ()
[oracle@ora-vm1 oracle]$ pstack 10712
#0 0x00000000010ccb60 in kcbo_exam_buf ()
#1 0x00000000010d0d62 in kcbo_service_ockpt ()
#2 0x0000000001080cd7 in kcbbdrv ()
#3 0x00000000007ddcc2 in ksbabs ()
#4 0x00000000007e4b32 in ksbrdp ()
#5 0x0000000002efcb50 in opirip ()
#6 0x00000000012da326 in opidrv ()
#7 0x0000000001e62456 in sou2o ()
#8 0x00000000006d2555 in opimai_real ()
#9 0x00000000006d240c in main ()
[oracle@ora-vm1 oracle]$ pstack 10712
#0 0x00000000010d0da5 in kcbo_service_ockpt ()
#1 0x0000000001080cd7 in kcbbdrv ()
#2 0x00000000007ddcc2 in ksbabs ()
#3 0x00000000007e4b32 in ksbrdp ()
#4 0x0000000002efcb50 in opirip ()
#5 0x00000000012da326 in opidrv ()
#6 0x0000000001e62456 in sou2o ()
#7 0x00000000006d2555 in opimai_real ()
#8 0x00000000006d240c in main ()
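
For completeness: the stack samples above were taken by simply invoking pstack repeatedly. A small loop like the following (interval and sample count chosen arbitrarily) makes this less tedious:

# Take 10 stack snapshots of the spinning DBWR process (PID 10712),
# two seconds apart, and collect them in one file for comparison
for i in $(seq 1 10); do
    date
    pstack 10712
    sleep 2
done > /tmp/dbw0_stacks.txt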

A MetaLink search for the term “kcbo_service_ockpt” leads to Bug 7376934, which is a duplicate of Bug 7385253 - DBWR IS CONSUMING HIGH CPU.

Patch 7385253 is available for Linux x86_64, HP-UX, Solaris, AIX.
Reference:
MetaLink Note 762085.1 - Subject: 'enq: RO - fast object reuse' contention when gathering schema/table statistics in parallel



Hugepages revisited II: Be aware of kernel bugs!

It is well known that hugepages can reduce the operating system's overhead of managing the memory pages of the Oracle SGA, thus leading to lower system CPU utilization. I have already written two blog entries on this topic: Listener Coredumps on heavy load system and Hugepages revisited.

However, there is a potential risk: certain kernel/platform combinations have hugepage-related bugs which can lead to problems:

  • Bug 131295 – Hugepages configured on kernel boot line causes x86_64 kernel boot to fail with OOM: Fixed in RHEL3: kernel-2.4.21-40.EL
  • Bug 248954 – Oracle ASM DBWR process goes into 100% CPU spin when using hugepages on ia64 (Fixed in kernel-2.6.9-78.EL.ia64.rpm available as update for RHEL4U7)
  • RHSA-2008:1017-14: on the Itanium® architecture, setting the “vm.nr_hugepages” sysctl parameter caused a kernel stack overflow resulting in a kernel panic, and possibly stack corruption. With this fix, setting vm.nr_hugepages works correctly. Fixed with RHEL5 kernel-2.6.18-92.1.22.el5.ia64.rpm
  • RHSA-2008:1017-14: hugepages allow the Linux kernel to utilize the multiple page size capabilities of modern hardware architectures. In certain configurations, systems with large amounts of memory could fail to allocate most of this memory for hugepages even if it was free. This could result, for example, in database restart failures. Fixed with RHEL5 kernel-2.6.18-92.1.22.el5.ia64.rpm

Therefore, before enabling hugepages, I recommend checking your OS vendor's bug database, testing on a test system, and applying recent OS updates first.



NUMA enabled in 10.2.0.4

When upgrading from a pre-10.2.0.4 release to 10.2.0.4, Oracle enables NUMA support. As a result, there can be multiple shared memory segments (MetaLink Note 429872.1) even though shmmax/shmall are set to high values.
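
A quick way to verify this is to list the shared memory segments owned by oracle; with NUMA optimization enabled you will typically see several segments per instance instead of one large one (sketch):

# List shared memory segments; with NUMA enabled, one instance allocates
# several segments instead of a single large SGA segment
ipcs -m | grep oracle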

I have read MetaLink Notes (7171446.8, 6730567.8, 6689903.8) and this blog entry, where a customer had problems on HP-UX with the default NUMA settings.

Even worse, it can also lead to instance crashes in 10.2.0.4, as reported in MetaLink Note 743191.1. The good news is that a patch is available for Linux x86_64 / 10.2.0.4.

I have asked Oracle Support whether it is safe to leave NUMA enabled for Linux Itanium, but they would not comment on it. Instead they asked me to check with the OS vendor. Great. ;-(



Hugepages revisited

A while ago I wrote a post about a specific listener coredump issue which could be solved by 1) installing an Oracle patch and 2) implementing hugepages. I also linked to an article by Pythian Group Oracle expert Riyaj Shamsudeen, who demonstrated the memory page management overhead caused by big SGAs without hugepages.

Today, after doing some research, I want to document the steps necessary to implement hugepages:

  • Check the shmall and shmmax values in /etc/sysctl.conf. I recommend setting shmmax at least as big as your SGA size.
  • Check your current total shared memory segment size, either by summing up the byte values from ipcs -m (depending on your “bc -l” skills) or by executing the script from MetaLink Note 401749.1 (which does exactly that). Then calculate how many hugepages you need by dividing this total by the hugepage size of your platform, shown as Hugepagesize in /proc/meminfo (Linux IA64 uses 256 MB pages, for example). I recommend adding one extra page for safety. A short sketch of this calculation follows after the sysctl settings below.
  • Say you need 200 hugepages: multiply this number by the page size and enter the resulting value (in KB) in
    /etc/security/limits.conf:

oracle soft memlock 2097152
oracle hard memlock 2097152

  • Set the parameters in /etc/sysctl.conf:

vm.nr_hugepages=200
vm.hugetlb_shm_group=<group id of dba group from /etc/group>
e.g. vm.hugetlb_shm_group=201
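
As a minimal sketch of the calculation described above and of the verification afterwards (it assumes the instance whose SGA you want to cover is up, so that its segments show up in ipcs -m):

# Sum up the size of all currently allocated shared memory segments (bytes)
SGA_BYTES=$(ipcs -m | awk '/^0x/ {sum += $5} END {print sum}')

# Hugepage size of this platform in KB (e.g. 262144 KB = 256 MB on IA64)
HPG_KB=$(awk '/^Hugepagesize/ {print $2}' /proc/meminfo)

# Number of hugepages needed, plus one extra page for safety
echo $(( SGA_BYTES / 1024 / HPG_KB + 1 ))

# After the pages have been allocated, verify that they are really in use
grep Huge /proc/meminfo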

I am going to implement hugepages on Linux Itanium for a Real Application Clusters system. I have read posts reporting different issues depending on whether the instance is started with srvctl or sqlplus, and whether it is started by oracle or root. I will investigate and write more soon.