Unix

Hugepages revisited

A while ago I wrote a post about a specific listener coredump issue which could be solved by 1) installing an Oracle patch and 2) implementing hugepages. I also linked to an article by Pythian Group Oracle expert Riyaj Shamsudeen, who demonstrated the memory page management overhead of large SGAs without hugepages.

Today, I want to document the steps necessary to implement hugepages after doing some research:

  • Check the shmall and shmmax values in /etc/sysctl.conf. I recommend setting shmmax at least as large as your
    biggest shared memory segment (the SGA).
  • Check your current total shared memory segment size, either by summing the bytes column of ipcs -m (depending
    on your “bc -l” skills) or by running the script from Metalink Note 401749.1, which does exactly that. Then
    calculate how many hugepages you need by dividing that total by the hugepage size of your platform, as reported
    by “cat /proc/meminfo” (Linux IA64 has 256 MB hugepages, for example); see the sketch after this list. I
    recommend adding 1 extra page for safety.
  • Say you need 200 hugepages. Multiply that by the hugepage size and enter the result (in KB) in
    /etc/security/limits.conf:

oracle soft memlock 2097152
oracle hard memlock 2097152

  • Set the parameters in /etc/sysctl.conf:

vm.nr_hugepages=200
vm.hugetlb_shm_group=<group id of dba group from /etc/group>
e.g. vm.hugetlb_shm_group=201
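
To take the guesswork out of that calculation, here is a minimal shell sketch. It assumes the usual ipcs -m output format (segment size in the fifth column) and the Hugepagesize line in /proc/meminfo; verify both on your platform before trusting the numbers.

#!/bin/bash
# Sum the sizes (bytes) of all currently allocated shared memory segments
shm_bytes=$(ipcs -m | awk '$5 ~ /^[0-9]+$/ {sum += $5} END {print sum}')
# Hugepage size in kB as reported by the kernel
hpg_kb=$(awk '/Hugepagesize/ {print $2}' /proc/meminfo)
# Pages needed, rounded up, plus one spare page for safety
pages=$(( (shm_bytes / 1024 + hpg_kb - 1) / hpg_kb + 1 ))
echo "vm.nr_hugepages = $pages"
echo "memlock (kB)    = $(( pages * hpg_kb ))"

After editing /etc/sysctl.conf, the new value can be applied with sysctl -p, but the kernel may not be able to reserve all pages if memory is already fragmented; a reboot is the safe route.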

I am going to implement hugepages on Linux Itanium for a Real Application Clusters system. I have read posts saying that there are different issues depending on whether the instance is started with srvctl or sqlplus, and whether it is started by oracle or root. I will investigate and write more soon.
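
Once the settings are in place and the instance has been restarted, a quick sanity check is to look at the hugepage counters in /proc/meminfo; if HugePages_Free does not drop after instance startup, the SGA is not using hugepages. The numbers below are purely illustrative:

grep Huge /proc/meminfo
# HugePages_Total:   200
# HugePages_Free:      3
# Hugepagesize:   262144 kB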



Listener Coredumps on heavy load system

Recently I came across a system that experiences listener crashes (core dumps) every couple of days. The listener logfile shows these errors shortly before the core dump:

29-SEP-2008 03:49:07 * (CONNECT_DATA=(SID=MDDB1)(CID=(PROGRAM=)(HOST=__jdbc__)(USER=))) * (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.0.1)(PORT=1398)) * establish * MDDB1 * 12518
TNS-12518: TNS:listener could not hand off client connection
TNS-12571: TNS:packet writer failure
TNS-12560: TNS:protocol adapter error
TNS-00530: Protocol adapter error
Linux IA64 Error: 104: Connection reset by peer

Analysing the core dump with gdb shows a stack that points to malloc() calls, which means the listener was requesting memory from the OS when it crashed.

gdb /oracle/ora10/bin/tnslsnr core.311
GNU gdb Red Hat Linux (6.3.0.0-1.153.el4rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "ia64-redhat-linux-gnu"...(no debugging symbols found)
Using host libthread_db library "/lib/tls/libthread_db.so.1".

Reading symbols from shared object read from target memory...(no debugging symbols found)...done.
Loaded system supplied DSO at 0xa000000000000000
Core was generated by `/oracle/ora10/bin/tnslsnr listenerv -inherit'.
Program terminated with signal 11, Segmentation fault.

#0 0x20000000027ee220 in malloc_consolidate ()
from /lib/tls/libc.so.6.1
(gdb) bt
#0 0x20000000027ee220 in malloc_consolidate () from /lib/tls/libc.so.6.1
#1 0x20000000027f0e30 in _int_malloc () from /lib/tls/libc.so.6.1
#2 0x20000000027f4b50 in malloc () from /lib/tls/libc.so.6.1
#3 0x40000000000079f0 in nsglconcrt ()
#4 0x4000000000011a00 in nsglhc ()
#5 0x4000000000019690 in nsglhe ()
#6 0x400000000001b980 in nsglma ()
#7 0x4000000000007770 in main ()
(gdb) quit

After contacting Oracle Support with this stack, they confirmed it to be Bug #6752308, which was closed as a duplicate of Bug 6139856. A patch is available for 10.2.0.3, and they also recommend implementing hugepages. By the way, there is an interesting article on the effect of utilizing (or not utilizing) hugepages here.

6139856 - TT11.1VALGRIND: FMR (FREE MEMORY READ/WRITE) IN NSEV.C



Optimal VxFS Settings for Oracle Filesystems?

I found an entertaining presentation about Oracle filesystem topics which explains inode locking: Oracle Filesystems

A friend has pointed me to a very informative document from HP about the optimal settings of vxfs (HP OnlineJFS) filesystem-related parameters for Oracle: HP-UX JFS mount options for Oracle Database environments
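
As a rough illustration of the kind of settings that document discusses, Oracle datafile filesystems on OnlineJFS are usually mounted with direct IO options so that the filesystem buffer cache is bypassed. The volume and mount point below are hypothetical, and the exact option set should be checked against the HP paper and your own workload (redo logs and archive destinations may warrant different settings):

# /etc/fstab entry for a datafile filesystem (hypothetical volume and mount point)
/dev/vg01/lvoradata /oradata vxfs delaylog,nodatainlog,mincache=direct,convosync=direct 0 2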



Filesystem IO Monitoring with HP-UX Glance

On HP-UX, you can use glance to collect a huge range of OS statistics and write them to a file at a defined time interval for later analysis. A description of the available metrics can be found on the system in /opt/perf/paperdocs/gp/C/metrics.txt or /opt/perf/paperdocs/gp/C/metrics.pdf.

This is an example of IO monitoring for filesystems:

glance_filesystem.sh:

nohup glance -aos ./filesystem_advisor.conf -j 60 > glance_output_filesystem_$$.txt 2>/dev/null &

filesystem_advisor.conf:

headersprinted=headersprinted
if headersprinted != 1 then {
print "DATE       TIME     FILESYSTEM                   FIR     LIR     LRBR     LRR     LWBR      LWR     PIR     PRBR     PRR     PWBR     PWR"
headersprinted = 1
}
filesystem loop
{
print GBL_STATDATE|12," ",
GBL_STATTIME|9," ",
FS_DIRNAME|24,
FS_FILE_IO_RATE|8,
FS_LOGL_IO_RATE|8,
FS_LOGL_READ_BYTE_RATE|9,
FS_LOGL_READ_RATE|8,
FS_LOGL_WRITE_BYTE_RATE|9,
FS_LOGL_WRITE_RATE|8,
FS_PHYS_IO_RATE|8,
FS_PHYS_READ_BYTE_RATE|9,
FS_PHYS_READ_RATE|8,
FS_PHYS_WRITE_BYTE_RATE|9,
FS_PHYS_WRITE_RATE|8
}
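
Once glance has been collecting for a while, the output file can be post-processed. The awk snippet below is only a sketch: it assumes whitespace-separated columns in the order printed by the header line above (FS_PHYS_IO_RATE, PIR, is then column 10) and uses a made-up output file name; adjust both to your data. It reports the average physical IO rate per filesystem:

# Average physical IO rate (PIR, column 10) per filesystem; header lines are skipped
# because their third column does not start with "/"
awk '$3 ~ /^\// { sum[$3] += $10; cnt[$3]++ }
     END { for (fs in sum) printf "%-28s %8.1f\n", fs, sum[fs]/cnt[fs] }' glance_output_filesystem_12345.txt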

Metrics used:

FS_FILE_IO_RATE (FIR)

The number of file system related physical IOs per second directed to this file system during the interval. This value is similar to the values returned by the vmstat -d command except that vmstat reports all IOs and does not break them out by file system. Also, vmstat reports IOs from the kernel’s view, which may get broken down by the disk driver into multiple physical IOs. Since this metric reports values from the disk driver’s point of view, it is more accurate than vmstat.

FS_LOGL_IO_RATE (LIR)

The number of logical IOs per second directed to this file system during the interval. Logical IOs are generated by calling the read() or write() system calls.

FS_LOGL_READ_BYTE_RATE (LRBR)

The number of logical read KBs per second from this file system during the interval.

FS_LOGL_READ_RATE (LRR)

The number of logical reads per second directed to this file system during the interval. Logical reads are generated by calling the read() system call.

FS_LOGL_WRITE_BYTE_RATE (LWBR)

The number of logical writes KBs per second to this file system during the interval.

FS_LOGL_WRITE_RATE (LWR)

The number of logical writes per second directed to this file system during the interval. Logical writes are generated by calling the write() system call.

FS_PHYS_IO_RATE (PIR)

The number of physical IOs per second directed to this file system during the interval.

FS_PHYS_READ_BYTE_RATE (PRBR)

The number of physical KBs per second read from this file system during the interval.

FS_PHYS_READ_RATE (PRR)

The number of physical reads per second directed to this file system during the interval. On Unix systems, physical reads are generated by user file access, virtual memory access (paging), file system management, or raw device access.

FS_PHYS_WRITE_BYTE_RATE (PWBR)

The number of physical KBs per second written to this file system during the interval.

FS_PHYS_WRITE_RATE (PWR)

The number of physical writes per second directed to this file system during the interval.