Out-of-Memory killer on 32bit Linux with big RAM
By Martin | August 25th, 2009 | Category: 10g, 11g, 9iR2, Linux, Oracle Database

It is not widely known that you can run into serious problems when you run 32-bit x86 Linux with a large amount of RAM installed on RHEL releases below 5. The official name for the issue is “Low Memory Starvation”: a 32-bit kernel can keep its own data structures only in the low-memory zone (roughly the first 896 MB with the standard 3G/1G address split), and that zone can be exhausted while plenty of high memory is still free. The best solution is to use x86-64 so that the whole amount of RAM can be addressed efficiently.

However, if that is not feasible, make sure that you at least run the hugemem kernel if you use RHEL < 5. In RHEL5-32bit, the hugemem kernel is default.

Here is a quick demonstration of what can happen if you don't use the hugemem kernel. We notice that an RMAN backup has been running for more than 24 hours. Querying v$session, we find that one session is in ACTION "STARTED", whereas the other sessions are FINISHED.
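On a 32-bit kernel built with high-memory support, /proc/meminfo reports LowTotal and LowFree alongside the usual totals, so pressure on the low-memory zone can be watched directly. The following is a minimal sketch and was not part of the original investigation; the function names and the 10 % threshold are assumptions chosen for illustration, and the LowTotal/LowFree fields are simply absent on 64-bit kernels.

```python
import re


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value}.

    Values are the integers as printed (kilobytes for most fields).
    Lines that do not look like 'Name:   12345 kB' are skipped.
    """
    values = {}
    for line in text.splitlines():
        match = re.match(r"^(\w+):\s+(\d+)(?:\s+kB)?\s*$", line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def low_memory_free_ratio(meminfo):
    """Return LowFree / LowTotal, or None if the kernel does not report them."""
    total = meminfo.get("LowTotal")
    free = meminfo.get("LowFree")
    if not total or free is None:
        return None
    return free / total


def is_low_memory_starved(meminfo, threshold=0.10):
    """True if less than `threshold` (an arbitrary, assumed limit) of low memory is free."""
    ratio = low_memory_free_ratio(meminfo)
    return ratio is not None and ratio < threshold


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file; only meaningful on Linux."""
    with open(path) as handle:
        return parse_meminfo(handle.read())
```

Calling is_low_memory_starved(read_meminfo()) from a cron job or monitoring script would give an early warning before the OOM killer steps in; a suitable threshold depends on the workload.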
SQL> select program, module,action from v$session where username = 'SYS' and program like 'rman%'
/

PROGRAM                    MODULE                      ACTION
-------------------------- --------------------------- -------------------
rman@ora-vm1 (TNS V1-V3)   backup full datafile        0000078 FINISHED129
rman@ora-vm1 (TNS V1-V3)   backup full datafile        0000272 STARTED16
rman@ora-vm1 (TNS V1-V3)   backup full datafile        0000084 FINISHED129
rman@ora-vm1 (TNS V1-V3)
rman@ora-vm1 (TNS V1-V3)
rman@ora-vm1 (TNS V1-V3)
rman@ora-vm1 (TNS V1-V3)                               0000004 FINISHED131
rman@ora-vm1 (TNS V1-V3)   backup full datafile        0000092 FINISHED129
Next, we look up the SID and SERIAL# of this session in v$session; the UNIX PID comes from v$process.spid.

SQL> select sid,serial# from v$session where event like 'RMAN%';

       SID    SERIAL#
---------- ----------
      4343       5837
We then query the OS process ID and activate SQL tracing for this session to determine its activity:

SQL> select spid from v$process where addr = (select paddr from v$session where sid = 4343);

SPID
------------
1681

SQL> begin dbms_monitor.session_trace_enable(4343,5837,true,true);
  2  end;
  3  /
However, no trace file is created. So we try to trace the system calls of the process with strace:

ora-vm1:# strace -fp 1681
attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted
“Not permitted”? – Although I am connected as root?
ps -ef|grep 1681
oracle 1681 1582 0 Aug24 ? 00:00:09 [oracle] <defunct>
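The Linux command “ps” reports the server process as “defunct”: it has already terminated and survives only as a zombie entry in the process table until its parent (PID 1582) collects its exit status. That is why strace cannot attach to it and why no trace file appears. Listing the parent shows the RMAN client and its other server processes: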
ora-vm1:/usr/oracle/admin/labo/udump$ ps -ef|grep 1582
oracle  1582 21578 0 Aug24 ? 00:00:02 rman oracle/product/10.2.0/bin/rman nocatalog
oracle 21663  1582 0 Aug24 ? 00:00:01 oraclelabo (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 21665  1582 0 Aug24 ? 00:00:03 oraclelabo (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle  1681  1582 0 Aug24 ? 00:00:09 [oracle] <defunct>
oracle 21691  1582 0 Aug24 ? 00:01:36 oraclelabo (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 21695  1582 0 Aug24 ? 00:01:41 oraclelabo (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 21793  1582 0 Aug24 ? 00:01:30 oraclelabo (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
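Spotting a defunct process in the ps listing can be automated. The sketch below is only an illustration, not something used in the original investigation: it parses text in the `ps -ef` layout shown here and returns every entry marked `<defunct>`. The function names are hypothetical, and the parser assumes the eight-column `ps -ef` format with a start time that contains no spaces.

```python
def parse_ps_line(line):
    """Split one `ps -ef` line into its columns; return None for other lines.

    Expected columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces).
    """
    fields = line.split(None, 7)
    if len(fields) < 8:
        return None
    uid, pid, ppid, _cpu, _stime, _tty, _time, cmd = fields
    if not (pid.isdigit() and ppid.isdigit()):
        return None  # header line or unrelated text
    return {"uid": uid, "pid": int(pid), "ppid": int(ppid), "cmd": cmd.strip()}


def find_defunct(ps_output):
    """Return the parsed entries whose command is flagged <defunct> by ps."""
    zombies = []
    for line in ps_output.splitlines():
        entry = parse_ps_line(line)
        if entry and entry["cmd"].endswith("<defunct>"):
            zombies.append(entry)
    return zombies
```

Feeding it the output of `ps -ef` (for example via subprocess.run) yields the PID and parent PID of each zombie, which tells you which parent process has not yet collected the exit status.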
Next, I checked the log file /var/log/messages.1 and found that the kernel's out-of-memory (OOM) killer had killed this PID because of low memory starvation.
/var/log/messages.1: Aug 24 22:32:44 ora-vm1 kernel: Out of Memory: Killed process 1681 (oracle).
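Searching the syslog for such entries is easy to script. The sketch below is an illustration under stated assumptions: the regular expression is modelled on the single message shown above, other kernel versions word the message differently, and the function name is hypothetical.

```python
import re

# Pattern modelled on the log line in this post; other kernels may phrase it differently.
OOM_PATTERN = re.compile(
    r"kernel: Out of Memory: Killed process (\d+) \(([^)]+)\)",
    re.IGNORECASE,
)


def find_oom_kills(log_text):
    """Return a list of (pid, process_name, line) for each OOM kill message found."""
    kills = []
    for line in log_text.splitlines():
        match = OOM_PATTERN.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2), line.strip()))
    return kills
```

Running find_oom_kills over the contents of /var/log/messages and its rotated copies, then comparing the PIDs with v$process.spid, would have pointed to the killed server process without the detour through strace.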