Listener Coredumps on heavy load system
By Martin | November 10th, 2008 | Category: 10g, Linux Itanium, Oracle Database, Unix
I recently came across a system whose listener crashes (dumping core) every couple of days. The listener log file shows errors shortly before each core dump:
29-SEP-2008 03:49:07 * (CONNECT_DATA=(SID=MDDB1)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.0.1)(PORT=1398)) * establish * MDDB1 * 12518
TNS-12518: TNS:listener could not hand off client connection
TNS-12571: TNS:packet writer failure
TNS-12560: TNS:protocol adapter error
TNS-00530: Protocol adapter error
Linux IA64 Error: 104: Connection reset by peer
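To see whether these hand-off failures pile up before a crash, the log can be summarised per hour. The following is only a rough sketch: it assumes each connection entry occupies a single line in the actual log file and that the last " * "-separated field is the return code, as in the excerpt above. The function name is my own.

```python
import re
from collections import Counter

# Month abbreviations as they appear in the listener log timestamps.
MONTHS = {name: number for number, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}

# Start of a log entry, e.g. "29-SEP-2008 03:49:07 * ..."
ENTRY_RE = re.compile(r"^(\d{2})-([A-Z]{3})-(\d{4}) (\d{2}):\d{2}:\d{2} \* (.*)$")


def failed_handoffs_per_hour(lines, code="12518"):
    """Count entries whose final field equals `code`, grouped by hour."""
    counts = Counter()
    for line in lines:
        match = ENTRY_RE.match(line.strip())
        if not match:
            continue  # e.g. the "TNS-12518: ..." detail lines
        day, month, year, hour, rest = match.groups()
        if month not in MONTHS:
            continue
        fields = [field.strip() for field in rest.split(" * ")]
        if fields[-1] != code:
            continue
        counts[f"{year}-{MONTHS[month]:02d}-{day} {hour}:00"] += 1
    return counts
```

Feeding it the lines of the listener log returns a counter keyed by hour, which can then be compared with the timestamps of the core files.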
Analysing the core dump with gdb shows a stack that ends in malloc() calls, which means the listener crashed while requesting memory:
gdb /oracle/ora10/bin/tnslsnr core.311
GNU gdb Red Hat Linux (126.96.36.199-1.153.el4rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "ia64-redhat-linux-gnu"...(no debugging symbols found)
Using host libthread_db library "/lib/tls/libthread_db.so.1".
Reading symbols from shared object read from target memory...(no debugging symbols found)...done.
Loaded system supplied DSO at 0xa000000000000000
Core was generated by `/oracle/ora10/bin/tnslsnr listenerv -inherit'.
Program terminated with signal 11, Segmentation fault.
#0 0x20000000027ee220 in malloc_consolidate ()
(gdb) bt
#0 0x20000000027ee220 in malloc_consolidate () from /lib/tls/libc.so.6.1
#1 0x20000000027f0e30 in _int_malloc () from /lib/tls/libc.so.6.1
#2 0x20000000027f4b50 in malloc () from /lib/tls/libc.so.6.1
#3 0x40000000000079f0 in nsglconcrt ()
#4 0x4000000000011a00 in nsglhc ()
#5 0x4000000000019690 in nsglhe ()
#6 0x400000000001b980 in nsglma ()
#7 0x4000000000007770 in main ()
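When searching a support site or comparing several core files, it helps to reduce a backtrace like the one above to its function names. This is a small sketch under the assumption that the frames look like the gdb output shown here (no debugging symbols, one frame per line). The helper name is made up for illustration.

```python
import re

# A gdb frame such as "#3 0x40000000000079f0 in nsglconcrt ()".
FRAME_RE = re.compile(
    r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([A-Za-z_][\w.]*)\s*\(")


def stack_signature(backtrace_text):
    """Return the function names of a gdb backtrace, innermost frame first."""
    frames = {}
    for line in backtrace_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            # Frame #0 may be printed twice by gdb; keep the first one seen.
            frames.setdefault(int(match.group(1)), match.group(2))
    return [frames[number] for number in sorted(frames)]


BACKTRACE = """\
#0 0x20000000027ee220 in malloc_consolidate () from /lib/tls/libc.so.6.1
#1 0x20000000027f0e30 in _int_malloc () from /lib/tls/libc.so.6.1
#2 0x20000000027f4b50 in malloc () from /lib/tls/libc.so.6.1
#3 0x40000000000079f0 in nsglconcrt ()
#4 0x4000000000011a00 in nsglhc ()
#5 0x4000000000019690 in nsglhe ()
#6 0x400000000001b980 in nsglma ()
#7 0x4000000000007770 in main ()
"""

print(" <- ".join(stack_signature(BACKTRACE)))
```

The joined names give a compact signature that is easy to paste into a search or to compare between two dumps.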
After I contacted Oracle Support with this stack, they confirmed it to be Bug 6752308, which was closed as a duplicate of Bug 6139856. A patch for 10.2.0.3 is available, and they also recommend implementing hugepages. By the way, there is an interesting article on the effect of using, or not using, hugepages here.
6139856 - TT11.1VALGRIND: FMR (FREE MEMORY READ/WRITE) IN NSEV.C
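Regarding the hugepages recommendation: before changing anything, it is worth checking whether the system has any configured at all. On Linux the kernel reports this in /proc/meminfo. The sketch below only reads the HugePages_Total, HugePages_Free and Hugepagesize lines. The function names are my own, and it says nothing about how many pages a given instance needs.

```python
def hugepage_summary(meminfo_text):
    """Extract the hugepage counters from the contents of /proc/meminfo."""
    wanted = ("HugePages_Total", "HugePages_Free", "Hugepagesize")
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if key.strip() in wanted and parts:
            # Hugepagesize is reported in kB; the other two are page counts.
            values[key.strip()] = int(parts[0])
    return values


def read_hugepage_summary(path="/proc/meminfo"):
    """Read the file at `path` (Linux only) and summarise its hugepage lines."""
    with open(path) as handle:
        return hugepage_summary(handle.read())
```

A HugePages_Total of 0 means no hugepages are reserved, so the SGA cannot be using them.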