Out-of-Memory killer on 32bit Linux with big RAM

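It is not widely known that running 32-bit x86 Linux with a large amount of RAM can cause serious problems, particularly on RHEL releases below 5. The issue is officially called “Low Memory Starvation”: a 32-bit kernel has to keep its own data structures in the directly mapped low-memory zone (roughly the first 896 MB), and with a lot of RAM installed that zone can run out while plenty of high memory is still free. The best solution is to move to x86-64, which can address all of the RAM efficiently.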
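To see how much low memory is left on such a system, one can look at the LowTotal and LowFree lines in /proc/meminfo. The following is only a minimal sketch: the function name lowmem_report is made up for this post, and it assumes a 32-bit kernel built with highmem support, since other kernels do not report these fields at all.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): summarize low-memory usage.
# Reads a meminfo-style file; defaults to /proc/meminfo.
# LowTotal/LowFree only exist on 32-bit kernels with highmem support.
lowmem_report() {
    meminfo="${1:-/proc/meminfo}"
    awk '
        $1 == "LowTotal:" { total = $2 }
        $1 == "LowFree:"  { free = $2 }
        END {
            if (total == 0) {
                print "no LowTotal entry (not a 32-bit highmem kernel?)"
                exit 1
            }
            printf "LowTotal=%d kB LowFree=%d kB (%d%% free)\n",
                   total, free, free * 100 / total
        }
    ' "$meminfo"
}

# Usage: lowmem_report            # inspect the running system
#        lowmem_report saved.txt  # inspect a saved copy of /proc/meminfo
```

A LowFree value that keeps shrinking while overall free memory stays high is the typical warning sign.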

However, if that is not feasible, make sure that you at least run the hugemem kernel if you use RHEL < 5. In RHEL5-32bit, the hugemem kernel is default.

Here is a quick demonstration of what can happen if you don't use the hugemem kernel. We notice that an RMAN backup has been running for more than 24 hours. Querying v$session, we find that one session is in ACTION "STARTED", whereas the other sessions are FINISHED.

SQL> select program, module, action
       from v$session
      where username = 'SYS' and program like 'rman%'
/

PROGRAM                     MODULE                      ACTION
--------------------------  --------------------------  -------------------
rman@ora-vm1 (TNS V1-V3)    backup full datafile        0000078 FINISHED129
rman@ora-vm1 (TNS V1-V3)    backup full datafile        0000272 STARTED16
rman@ora-vm1 (TNS V1-V3)    backup full datafile        0000084 FINISHED129
rman@ora-vm1 (TNS V1-V3)    rman@ora-vm1 (TNS V1-V3)
rman@ora-vm1 (TNS V1-V3)    rman@ora-vm1 (TNS V1-V3)    0000004 FINISHED131
rman@ora-vm1 (TNS V1-V3)    backup full datafile        0000092 FINISHED129

Then we look up the SID and SERIAL# of this session in v$session, and also its UNIX PID via v$process.spid.

SQL> select sid,serial# from v$session where event like 'RMAN%';

       SID    SERIAL#
---------- ----------
      4343       5837

With the SID in hand, we fetch the UNIX PID of the server process and then activate SQL tracing for the session to determine its activity:

SQL> select spid from v$process where addr = 
   (select paddr from v$session where sid = 4343);

SPID
------------
1681

SQL> begin dbms_monitor.session_trace_enable(4343,5837,true,true);
  2  end;
  3  /

However, no trace file is created, so we try tracing system calls with strace:

ora-vm1:# strace -fp 1681
attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted

“Not permitted”? Even though I am connected as root?

ps -ef|grep 1681
oracle    1681 1582  0 Aug24 ?        00:00:09 [oracle] <defunct>

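The Linux command “ps” reports the server process as “defunct”: it has already terminated, but its parent (PID 1582) has not yet reaped it. That is why strace cannot attach and why no trace file is written. Listing the parent and its children: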

ora-vm1:/usr/oracle/admin/labo/udump$ ps -ef|grep 1582
oracle   1582 21578  0 Aug24 ?        00:00:02 rman oracle/product/10.2.0/bin/rman nocatalog
oracle   21663 1582  0 Aug24 ?        00:00:01 oraclelabo (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle   21665 1582  0 Aug24 ?        00:00:03 oraclelabo (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle   1681 1582   0 Aug24 ?        00:00:09 [oracle] <defunct>
oracle   21691 1582  0 Aug24 ?        00:01:36 oraclelabo (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle   21695 1582  0 Aug24 ?        00:01:41 oraclelabo (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle   21793 1582  0 Aug24 ?        00:01:30 oraclelabo (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
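Instead of grepping for one PID at a time, defunct processes can be picked out by their state flag. This is just a sketch under a few assumptions: the function name list_zombies is made up for this post, and it expects input in the column order produced by the procps command `ps -eo pid,ppid,stat,comm`, where a state starting with Z marks a zombie.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): list zombie processes.
# Reads `ps -eo pid,ppid,stat,comm` style lines from stdin and prints
# "PID PPID COMMAND" for every process whose state starts with Z.
list_zombies() {
    awk '$3 ~ /^Z/ { print $1, $2, $4 }'
}

# Usage: ps -eo pid,ppid,stat,comm | list_zombies
```

The parent PID in the second column is the process that still has to reap the zombie, which in the case above is the rman client.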

Next, I checked the log file /var/log/messages.1 and found that the kernel's out-of-memory (OOM) killer had killed this PID because of low memory starvation.

/var/log/messages.1:
Aug  24 22:32:44 ora-vm1 kernel: Out of Memory: Killed process 1681 (oracle).
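To scan several rotated log files for OOM victims at once, something along these lines may help. Treat it as a sketch: the function name oom_victims is made up for this post, and the pattern assumes messages worded like the one above ("Out of Memory: Killed process PID (name)"); other kernel versions may phrase the message differently.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): print "PID NAME" for each
# OOM-killer victim found in the given log files.
# Assumes lines of the form "... Out of Memory: Killed process 1681 (oracle)."
oom_victims() {
    grep -hi 'out of memory: kill' "$@" |
        sed -n 's/.*[Kk]ill[a-z]* process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p'
}

# Usage: oom_victims /var/log/messages /var/log/messages.1
```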
