Archive for November 2008

Hugepages revisited

A while ago I wrote a post about a specific listener coredump issue which could be solved by 1) installing an Oracle patch and 2) implementing hugepages. I also linked to an article by Pythian Group Oracle expert Riyaj Shamsudeen, who demonstrated the memory page management overhead of big SGAs running without hugepages.

Today, after doing some research, I want to document the steps necessary to implement hugepages:

  • Check your /etc/sysctl.conf shmall and shmmax values. I recommend that you set shmmax bigger or
  • Check your current total shared memory segment size. Depending on your “bc -l” skills, you can either sum the bytes column of ipcs -m yourself or run the script from Metalink Note 401749.1, which does exactly that. To calculate how many hugepages you need, look up your platform's hugepage size with “cat /proc/meminfo” (the Hugepagesize line) and divide the total segment size by it (Linux IA64, for example, uses 256MB pages). I recommend adding 1 extra page for safety.
  • Say you need 200 hugepages. Multiply this by the hugepage size; the memlock limit for the oracle user in
    /etc/security/limits.conf must be at least that large (values in kB):

oracle soft memlock 2097152
oracle hard memlock 2097152

  • Set the parameters in /etc/sysctl.conf:

vm.nr_hugepages=200
vm.hugetlb_shm_group=<group id of dba group from /etc/group>
e.g. vm.hugetlb_shm_group=201
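The arithmetic in the steps above can be sketched in Python. This is a minimal sketch, not the script from Note 401749.1. It assumes that `ipcs -m` prints the usual util-linux columns (key, shmid, owner, perms, bytes, nattch, status) and that /proc/meminfo contains a `Hugepagesize:` line. All function names are hypothetical.

```python
def total_shm_bytes(ipcs_output):
    """Sum the 'bytes' column of `ipcs -m` output.

    Assumes the util-linux layout: key, shmid, owner, perms, bytes, nattch, status.
    """
    total = 0
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Segment rows start with a hex key such as 0x00000000; headers are skipped.
        if len(fields) >= 5 and fields[0].startswith("0x"):
            total += int(fields[4])
    return total


def hugepage_size_kb(meminfo):
    """Return the Hugepagesize value (in kB) from the text of /proc/meminfo."""
    for line in meminfo.splitlines():
        if line.startswith("Hugepagesize:"):
            return int(line.split()[1])
    raise ValueError("no Hugepagesize line found")


def hugepage_settings(shm_bytes, page_kb, spare_pages=1):
    """Compute vm.nr_hugepages and the minimum memlock limit (in kB)."""
    page_bytes = page_kb * 1024
    # Round up to whole hugepages (integer ceiling), then add spare page(s) for safety.
    pages = (shm_bytes + page_bytes - 1) // page_bytes + spare_pages
    # The memlock limit (kB) must cover at least all of these hugepages.
    return {"vm.nr_hugepages": pages, "memlock_kb": pages * page_kb}


def config_lines(settings, user="oracle"):
    """Format the values as /etc/security/limits.conf and /etc/sysctl.conf lines."""
    return [
        f"{user} soft memlock {settings['memlock_kb']}",
        f"{user} hard memlock {settings['memlock_kb']}",
        f"vm.nr_hugepages={settings['vm.nr_hugepages']}",
    ]
```

On the database host, you would obtain the inputs with `subprocess.run(["ipcs", "-m"], capture_output=True, text=True).stdout` and `open("/proc/meminfo").read()`. Run this while the instances are up, so that their SGA segments appear in `ipcs -m`. `grp.getgrnam("dba").gr_gid` returns the group id for vm.hugetlb_shm_group. The ceiling is computed with integer arithmetic, so very large segment totals are not subject to floating-point rounding.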

I am going to implement hugepages on Linux Itanium for a Real Application Clusters (RAC) system. I have read reports of startup issues that depend on whether the instance is started with srvctl or with sqlplus, and whether by the oracle user or by root. I will investigate and write more soon.



SQL Profiler TVD$XTAT update available

I just saw that Christian Antognini has released an update to TVD$XTAT. I am looking forward to experimenting with it. If you have used tkprof in the past, you should definitely take a look at this tool.



Excellent Presentations on

The database specialist Riyaj Shamsudeen from The Pythian Group has published some excellent presentations on his blog. Don't miss them!



Listener coredumps on a heavily loaded system

Recently I came across a system whose listener crashes (dumps core) every couple of days. The listener log file shows these errors shortly before each core dump:

29-SEP-2008 03:49:07 * (CONNECT_DATA=(SID=MDDB1)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.0.1)(PORT=1398)) * establish * MDDB1 * 12518
TNS-12518: TNS:listener could not hand off client connection
TNS-12571: TNS:packet writer failure
TNS-12560: TNS:protocol adapter error
TNS-00530: Protocol adapter error
Linux IA64 Error: 104: Connection reset by peer
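To see how often these hand-off failures occur between crashes, you can tally the listener log entries per day. This is a minimal sketch, assuming the ` * `-separated entry layout shown above (timestamp, connect data, address, event, service, return code) and English month abbreviations. `parse_entry` and `count_errors_by_day` are hypothetical helper names.

```python
from collections import Counter
from datetime import datetime


def parse_entry(line):
    """Split one listener log connection entry into fields; None if it is not one."""
    fields = [f.strip() for f in line.split(" * ")]
    # Assumed layout: timestamp * connect data * address * event * service * return code
    if len(fields) != 6:
        return None
    try:
        # strptime matches month abbreviations such as SEP case-insensitively.
        ts = datetime.strptime(fields[0], "%d-%b-%Y %H:%M:%S")
        code = int(fields[5])
    except ValueError:
        return None
    return {"time": ts, "event": fields[3], "service": fields[4], "code": code}


def count_errors_by_day(lines, code=12518):
    """Count entries with the given return code (default 12518, i.e. TNS-12518) per day."""
    counts = Counter()
    for line in lines:
        entry = parse_entry(line)
        if entry is not None and entry["code"] == code:
            counts[entry["time"].date()] += 1
    return counts
```

You can pass an open file object straight in, for example `count_errors_by_day(open(path))`. The location of the listener log is installation-specific. Entries with other layouts, such as service updates, and the TNS-xxxxx message lines are skipped rather than guessed at.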

I analysed the core dump with gdb. The stack points to malloc() calls, which means the listener crashed while requesting memory (inside glibc's malloc_consolidate()).

gdb /oracle/ora10/bin/tnslsnr core.311
GNU gdb Red Hat Linux (6.3.0.0-1.153.el4rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "ia64-redhat-linux-gnu"...(no debugging symbols found)
Using host libthread_db library "/lib/tls/libthread_db.so.1".

Reading symbols from shared object read from target memory...(no debugging symbols found)...done.
Loaded system supplied DSO at 0xa000000000000000
Core was generated by `/oracle/ora10/bin/tnslsnr listenerv -inherit'.
Program terminated with signal 11, Segmentation fault.

#0 0x20000000027ee220 in malloc_consolidate ()
from /lib/tls/libc.so.6.1
(gdb) bt
#0 0x20000000027ee220 in malloc_consolidate () from /lib/tls/libc.so.6.1
#1 0x20000000027f0e30 in _int_malloc () from /lib/tls/libc.so.6.1
#2 0x20000000027f4b50 in malloc () from /lib/tls/libc.so.6.1
#3 0x40000000000079f0 in nsglconcrt ()
#4 0x4000000000011a00 in nsglhc ()
#5 0x4000000000019690 in nsglhe ()
#6 0x400000000001b980 in nsglma ()
#7 0x4000000000007770 in main ()
(gdb) quit
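When comparing a core dump against a known bug, it can help to reduce the backtrace to its function names. This is a minimal sketch, assuming the frame format gdb printed above (`#N 0xADDR in function () from library`). The helper names and the set of allocator functions are hypothetical, and the set is illustrative and incomplete. The backtrace text itself can be captured non-interactively with `gdb -batch -ex bt /oracle/ora10/bin/tnslsnr core.311`.

```python
import re

# Frame lines look like "#2 0x20000000027f4b50 in malloc () from /lib/tls/libc.so.6.1";
# the "0x... in" part is absent for some frames, so it is optional here.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s*\(")

# Illustrative, incomplete set of glibc allocator functions.
GLIBC_ALLOCATOR = {"malloc", "_int_malloc", "malloc_consolidate", "free", "_int_free", "realloc"}


def frame_functions(backtrace):
    """Return function names from gdb backtrace text, innermost frame first."""
    seen = {}
    for line in backtrace.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            # Frame #0 is printed both when the core is loaded and by 'bt'; keep one copy.
            seen.setdefault(int(m.group(1)), m.group(2))
    return [seen[n] for n in sorted(seen)]


def crashed_in_allocator(backtrace):
    """True if the innermost frame is one of the listed glibc allocator functions."""
    names = frame_functions(backtrace)
    return bool(names) and names[0] in GLIBC_ALLOCATOR
```

Only function names are compared, not addresses, because addresses can change between patch levels of the same binary.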

When I contacted Oracle Support with this stack, they confirmed it to be Bug 6752308, which was closed as a duplicate of Bug 6139856. A patch is available for 10.2.0.3, and they also recommend implementing hugepages. By the way, there is an interesting article on the effect of using (or not using) hugepages here.

6139856 - TT11.1VALGRIND: FMR (FREE MEMORY READ/WRITE) IN NSEV.C