Archive for August 2009

Out-of-Memory killer on 32bit Linux with big RAM

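It is not widely known that you can run into serious problems when running 32-bit x86 Linux with a large amount of RAM installed on RHEL versions below 5. The issue is officially called “Low Memory Starvation”. With the standard kernel, a 32-bit kernel can directly map only about the first 896 MB of physical memory (“low memory”), and much of the kernel's own memory must be allocated there. As a result, this zone can run out even while plenty of high memory is still free. The best solution is to use x86-64, which can address the whole amount of RAM efficiently.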
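On 32-bit kernels with highmem support, /proc/meminfo reports the low-memory zone separately as LowTotal and LowFree; kernels built without highmem support may not print these fields at all. The following is a minimal Python sketch, not a tool from the original investigation, that computes the share of low memory that is still free. The function names are hypothetical.

```python
from typing import Dict, Optional


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most values are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    values: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only lines of the form "Name:   12345 kB" (or "Name:   0").
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def low_memory_free_ratio(text: str) -> Optional[float]:
    """Return LowFree / LowTotal, or None if the kernel does not report them."""
    info = parse_meminfo(text)
    low_total = info.get("LowTotal")
    low_free = info.get("LowFree")
    if not low_total or low_free is None:
        # Fields missing (or LowTotal == 0): nothing meaningful to report.
        return None
    return low_free / low_total
```

On a live system you could call `low_memory_free_ratio(open("/proc/meminfo").read())`; on a 32-bit box with lots of RAM, a ratio that keeps shrinking under load points in the direction described above.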

If moving to x86-64 is not feasible, make sure that you at least run the hugemem kernel on RHEL < 5. In RHEL5-32bit, the hugemem kernel is default.

Here is a quick demonstration of what can happen if you don't use the hugemem kernel. We noticed that an RMAN backup had been running for more than 24 hours. Querying v$session, we found that one session's ACTION was still "STARTED", whereas the other sessions were FINISHED.

SQL> select program, module,action 
      from v$session 
      where username = 'SYS' and program like 'rman%'
/      

PROGRAM                    MODULE                       ACTION             
-------------------------- ---------------------------  -------------------
rman@ora-vm1 (TNS V1-V3)    backup full datafile        0000078 FINISHED129
rman@ora-vm1 (TNS V1-V3)    backup full datafile        0000272 STARTED16  
rman@ora-vm1 (TNS V1-V3)    backup full datafile        0000084 FINISHED129
rman@ora-vm1 (TNS V1-V3)    rman@ora-vm1 (TNS V1-V3)                       
rman@ora-vm1 (TNS V1-V3)    rman@ora-vm1 (TNS V1-V3)    0000004 FINISHED131
rman@ora-vm1 (TNS V1-V3)    backup full datafile        0000092 FINISHED129

Next, we look up the SID and SERIAL# of this session in v$session, as well as the UNIX PID of its server process (v$process.spid):

SQL> select sid,serial# from v$session where event like 'RMAN%';

       SID    SERIAL#
---------- ----------
      4343       5837

SQL> select spid from v$process where addr = 
   (select paddr from v$session where sid = 4343);

SPID
------------
1681

We then activate SQL tracing for this session to determine its activity:

SQL> begin dbms_monitor.session_trace_enable(4343,5837,true,true);
  2  end;
  3  /

However, no trace file is created. So we start tracing the process's system calls with strace:

ora-vm1:# strace -fp 1681
attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted

“Not permitted”, even though I am connected as root?

ps -ef|grep 1681
oracle    1681 1582  0 Aug24 ?        00:00:09 [oracle] <defunct>

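The Linux command “ps” reports the server process as “defunct”, i.e. a zombie: the process has already exited, but its parent (PID 1582) has not yet collected its exit status. A zombie can no longer be attached with ptrace, which explains the strace error, and it cannot write a trace file either.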

ora-vm1:/usr/oracle/admin/labo/udump$ ps -ef|grep 1582
oracle   1582 21578  0 Aug24 ?        00:00:02 rman oracle/product/10.2.0/bin/rman nocatalog
oracle   21663 1582  0 Aug24 ?        00:00:01 oraclelabo (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle   21665 1582  0 Aug24 ?        00:00:03 oraclelabo (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle   1681 1582   0 Aug24 ?        00:00:09 [oracle] <defunct>
oracle   21691 1582  0 Aug24 ?        00:01:36 oraclelabo (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle   21695 1582  0 Aug24 ?        00:01:41 oraclelabo (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle   21793 1582  0 Aug24 ?        00:01:30 oraclelabo (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
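A <defunct> entry stays in the process table until its parent reaps it, so the PPID column of such a line names the process responsible for cleaning it up, here the rman client 1582. As a minimal sketch (a hypothetical helper, not part of the original troubleshooting), such entries can be picked out of saved `ps -ef` output like this, assuming the default `ps -ef` column order:

```python
from typing import List, Tuple


def find_defunct(ps_output: str) -> List[Tuple[int, int]]:
    """Return (pid, ppid) pairs for `ps -ef` lines marked <defunct>.

    Assumes the default `ps -ef` columns: UID PID PPID C STIME TTY TIME CMD.
    """
    zombies: List[Tuple[int, int]] = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 8 or not line.rstrip().endswith("<defunct>"):
            continue
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue  # skip header or malformed lines
        zombies.append((int(fields[1]), int(fields[2])))
    return zombies
```

For live data, the input could come from `subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout`. Applied to the listing above, it reports PID 1681 with parent 1582.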

Next, I checked /var/log/messages.1 and found that the kernel's out-of-memory (OOM) killer had killed this PID because of low memory starvation.

/var/log/messages.1:
Aug  24 22:32:44 ora-vm1 kernel: Out of Memory: Killed process 1681 (oracle).
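To spot OOM-killer activity without paging through the whole log, the files can be searched for this message. The following is a minimal Python sketch, assuming the exact message format shown above; other kernel versions word the message differently, so the pattern may need adjusting. The function name is hypothetical.

```python
import re
from typing import List, Tuple

# Matches the message format shown above, e.g.
# "kernel: Out of Memory: Killed process 1681 (oracle)."
# Assumption: other kernel versions may use a different wording.
OOM_PATTERN = re.compile(r"Out of [Mm]emory: Killed process (\d+) \(([^)]*)\)")


def oom_kills(log_text: str) -> List[Tuple[int, str]]:
    """Return (pid, command) for every OOM-killer line in a syslog excerpt."""
    return [(int(m.group(1)), m.group(2)) for m in OOM_PATTERN.finditer(log_text)]
```

For example, `oom_kills(open("/var/log/messages.1").read())` lists every killed process in that file; for the excerpt above it returns `[(1681, 'oracle')]`.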


Patch Bundle available on top of Enterprise Manager Grid Control 10.2.0.5

In MetaLink Note 853691.1: “ALERT: Important Upgrade Steps Required for Enterprise Manager Grid Control 10gR5 (10.2.0.5) Upgrades”, Oracle Support recommends installing several patches after upgrading OEM Grid Control to 10.2.0.5.

In particular, patch bundle #8708893 fixes around 20 bugs that could interfere with normal Grid Control operation.



High CPU Utilization – waits on cursor: pin S

I recently encountered a problem at a customer site where the database instance's resource (CPU) utilization was so high that the application stopped working. The version was 10.2.0.4 on Linux, with the Oracle Recommended Generic Patches installed.

Unfortunately, the customer decided to bounce the instance, so there was no opportunity for in-depth diagnosis. However, an ASH report showed that a dozen sessions were either waiting on the wait event “cursor: pin S” or were active (ON CPU) without any SQL_ID. The AWR report showed:

Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
cursor: pin S                   140,036,615      24,833      0 ######      Other
CPU time                                             19          72.9
log file sync                         8,767           8      1   30.2     Commit
log file parallel write               9,039           8      1   29.9 System I/O
control file parallel write           1,269           5      4   20.2 System I/O
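The numbers in this table can be sanity-checked with simple arithmetic. 24,833 s spread over about 140 million waits is far below 1 ms per wait, which is why the Avg wait (ms) column shows 0. If we assume that every row's %Total Call Time is divided by the same total, the CPU time row (19 s = 72.9%) implies a total of roughly 26 s. Against that total, “cursor: pin S” would come to roughly 95,000%, which does not fit in the column and is presumably why it is printed as ######. A small Python sketch of this calculation (function names are illustrative; since the 19 s is itself rounded, the percentage is only an estimate):

```python
def avg_wait_ms(total_wait_s: float, waits: int) -> float:
    """Average time per wait in milliseconds."""
    return total_wait_s / waits * 1000.0


def implied_pct(event_s: float, ref_s: float, ref_pct: float) -> float:
    """Estimate an event's %Total Call Time from a reference row.

    Assumes every row is divided by the same total, which the reference
    row lets us recover: total = ref_s / (ref_pct / 100).
    """
    total_s = ref_s / (ref_pct / 100.0)
    return event_s / total_s * 100.0


# Figures taken from the AWR excerpt above (CPU time is the reference row).
print(f"{avg_wait_ms(24833, 140036615):.3f} ms per wait")
print(f"{implied_pct(24833, 19, 72.9):,.0f} % of total call time")
```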

Oracle Support confirmed that this is Bug 6904068, “High CPU usage when there are ‘cursor: pin S’ waits”. We have filed a backport request, because no patch is available for our platform yet.



Multipathing Configuration issue waiting to happen

Quite some time ago, during a consulting engagement, I came across a hard-to-find issue that I think is worth mentioning. A two-node RAC cluster running on RHEL4 x86-64 was relocated to a different data center. Apart from making sure that the switch ports and Fibre Channel ports are available at the new location, there is normally not much to worry about in such a move.

After the relocation, the multipathing configuration on one node, implemented with device-mapper-multipath, would not work: the command “multipath -ll” simply returned no output. After more than an hour, we pinned the issue down to this error message:

# multipath -v 3
#
# all paths in cache :
#
...
path sdh not found in pathvec

When we checked what device sdh was, we found that it was a KVM device plugged in by the sysadmins; the kernel log shows it as a USB mass storage disk with vendor “KVM” and model “vmDisk”:

May 27 14:46:26 host1 kernel: Attached scsi removable disk sdh at scsi10, channel 0, id 0, lun 0
May 27 14:46:26 host1 kernel: Type: Direct-Access ANSI SCSI revision: 02
May 27 14:46:26 host1 kernel: Vendor: KVM Model: vmDisk Rev: 0.01
May 27 14:46:26 host1 kernel: scsi10 : SCSI emulation for USB Mass Storage devices
May 27 14:46:26 host1 kernel: sr1: scsi3-mmc drive: 0x/0x caddy
May 27 14:46:26 host1 kernel: Type: CD-ROM ANSI SCSI revision: 02
May 27 14:46:26 host1 kernel: Vendor: KVM Model: vmDisk-CD Rev: 0.01

BTW: What is a KVM device?
Wikipedia states: A KVM switch (with KVM being an abbreviation for Keyboard, Video or Visual Display Unit, Mouse) is a hardware device that allows a user to control multiple computers from a single keyboard, video monitor and mouse.

We then added the device sdh to the multipath blacklist section in /etc/multipath.conf, and the problem was solved:

devnode_blacklist {
        # KVM virtual disk (USB mass storage emulation), see kernel log above
        devnode "^sdh$"
}
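devnode entries are regular expressions matched against kernel device names, so the anchors in "^sdh$" restrict the entry to exactly sdh, whereas an unanchored "sdh" would also match names such as sdh1. The sketch below uses Python's re module as a stand-in for multipath's own regex matching (an assumption; for a simple anchored pattern like this one the behavior should be the same) to show which names a set of patterns would exclude. Keep in mind that names like sdh are assigned in discovery order and can change after a reboot or when devices are added, so a name-based entry may need revisiting. The helper name is hypothetical.

```python
import re
from typing import Iterable, List


def blacklisted(devnames: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Return the device names matched by any of the blacklist regexes.

    re.search is used because a pattern matches anywhere in the name
    unless it is anchored with ^ and/or $, as in "^sdh$".
    """
    compiled = [re.compile(p) for p in patterns]
    return [name for name in devnames if any(rx.search(name) for rx in compiled)]
```

For example, `blacklisted(["sda", "sdh", "sdh1", "sr1"], ["^sdh$"])` keeps only `"sdh"`, while the unanchored pattern `"sdh"` would also catch `"sdh1"`.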



Is your database secure enough? Check out Metasploit …

I came across a short post on Pete Finnigan's Oracle Security Weblog announcing the release of new Metasploit modules that can be used for penetration testing of Oracle databases.

What is Metasploit?

Metasploit is a framework that automates the use of all kinds of exploits to test the security of a system. Among other things, it includes modules for Oracle.

To get an idea of what is possible, watch this: Attacking Oracle with the Metasploit Framework Shmoocon Firetalk Demo Video. In a very impressive five-minute video, the presenter identifies the Oracle listener version, brute-forces the SID, tries well-known username/password combinations (e.g. scott/tiger), gets access as SCOTT, escalates privileges to DBA, plants a Java class to execute OS commands, and so on. You get the idea.

This is something to watch out for, because it enables script kiddies to attack badly secured databases connected to the internet, and well-trained rogue internal employees to attack databases that do not have Critical Patch Updates for well-known security vulnerabilities installed.

You can find a Reuters report about this new release here.

Update 2009-08-13: The Metasploit developer has uploaded new demo videos showing how to hack an Oracle database with Metasploit.