Is your RAC 10.2.0.4 a ticking bomb?

I have come across a very nasty bug in Oracle RAC on HP-UX Itanium after upgrading to 10.2.0.4. The problem might also occur on Solaris and Linux x86-64. On one of the instances of a two-node RAC cluster, the number of open file descriptors held by the Oracle racgimon process grows by one every 60 seconds. If the per-process ulimit for open files is set high and the HP-UX kernel parameter nfile is also set high, it may take weeks or even months until racgimon finally hits the limit. When that happens, the node can become unstable, because no more file descriptors can be opened system-wide.
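To put "weeks to months" into numbers: one leaked descriptor every 60 seconds adds up to 1,440 descriptors per day. A minimal sketch of that arithmetic in Python; the limits in the loop are hypothetical examples, not Oracle or HP-UX defaults, so substitute the value your own system reports:

```python
LEAK_INTERVAL_SECONDS = 60  # one new hc_SID.dat descriptor per minute, as observed
SECONDS_PER_DAY = 86400


def days_until_exhausted(fd_limit, fds_at_start=0,
                         leak_interval_s=LEAK_INTERVAL_SECONDS):
    """Days until a process leaking one fd per interval reaches fd_limit."""
    remaining = max(fd_limit - fds_at_start, 0)
    return remaining * leak_interval_s / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical open-file limits, for illustration only.
    for limit in (1024, 4096, 65536):
        print(f"limit {limit:>6}: ~{days_until_exhausted(limit):.1f} days")
```

With these example limits it prints roughly 0.7, 2.8 and 45.5 days, which is why a generously sized system can run for a long time before the problem shows up.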

How can I check whether my installation suffers from this bug?

It is very easy: run lsof -p with the PID of the racgimon process and look for dozens of open file descriptors on the file hc_SID.dat.
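Because the leak adds one descriptor per minute, the count is easy to watch from a script. A minimal sketch in Python, assuming lsof is in the PATH and that you pass the racgimon PID yourself (for example from ps); the script name and the hc_*.dat pattern match are illustrative, not an Oracle tool:

```python
import re
import subprocess
import sys

# Matches the health-check file named in the bug description, e.g. hc_SID.dat.
HC_FILE = re.compile(r"hc_\w+\.dat$")


def count_hc_descriptors(lsof_output):
    """Count lsof lines whose last (NAME) column ends in hc_<something>.dat."""
    count = 0
    for line in lsof_output.splitlines():
        fields = line.split()
        if fields and HC_FILE.search(fields[-1]):
            count += 1
    return count


def lsof_for_pid(pid):
    """Run 'lsof -p PID' and return its standard output as text."""
    result = subprocess.run(["lsof", "-p", str(pid)],
                            capture_output=True, text=True, check=False)
    return result.stdout


def main(argv):
    if len(argv) != 1:
        print("usage: count_hc_fds.py <racgimon-pid>")
        return
    pid = int(argv[0])
    n = count_hc_descriptors(lsof_for_pid(pid))
    print(f"racgimon (pid {pid}) holds {n} open hc_*.dat descriptors")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Running it twice a few minutes apart should show the count growing by roughly one per minute on an affected node.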

– or –

Check the log file $ORACLE_HOME/log//racg/imon_.log for messages like these:

2008-10-07 13:05:50.879: [ RACG][82] [29827][82][ora.DBNAME.DBNAME1.inst]: GIMH: GIM-00104: Health check failed to connect to instance.
GIM-00090: OS-dependent operation:mmap failed with status: 12
GIM-00091: OS failure message: Not enough space
GIM-00092: OS failure occurred at: sskgmsmr_13
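If you want to check several nodes, the log scan can be scripted as well. A minimal sketch in Python that flags lines containing the GIM-00104 (health check failed) or GIM-00090 (mmap failed) codes from the messages above; treating those two codes as the markers is an assumption, and you have to pass the imon log path for your own host and database:

```python
import sys

# Error codes taken from the imon log excerpt; any line containing one is a hit.
SYMPTOMS = ("GIM-00104", "GIM-00090")


def find_symptoms(lines):
    """Yield (line_number, text) for log lines that mention a symptom code."""
    for number, line in enumerate(lines, start=1):
        if any(code in line for code in SYMPTOMS):
            yield number, line.rstrip("\n")


def main(argv):
    if len(argv) != 1:
        print("usage: scan_imon_log.py <path to imon log>")
        return
    with open(argv[0], encoding="utf-8", errors="replace") as log:
        hits = list(find_symptoms(log))
    for number, text in hits:
        print(f"{number}: {text}")
    print(f"{len(hits)} matching line(s)")


if __name__ == "__main__":
    main(sys.argv[1:])
```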

Is there a patch?
The good news is: yes. The bug is tracked as bug 6931689, and patch 7298531 fixes it; see MetaLink note 739557.1.
