Hugepages revisited II: Be aware of kernel bugs!

It is well known that hugepages can reduce the overhead of managing memory pages of Oracle SGA by the operating system thus leading to lower system cpu utilization. I have written two blog entries regarding this topic already: Listener Coredumps on heavy load system and Hugepages revisited.

However, there is a potential risk with it: Certain kernels / platforms have bugs regarding hugepages which can lead to problems:

  • BugĀ 131295 – Hugepages configured on kernel boot line causes x86_64 kernel boot to fail with OOM: Fixed in RHEL3: kernel-2.4.21-40.EL
  • Bug 248954 – Oracle ASM DBWR process goes into 100% CPU spin when using hugepages on ia64 (Fixed in kernel-2.6.9-78.EL.ia64.rpm available as update for RHEL4U7)
  • RHSA-2008:1017-14: on the ItaniumĀ® architecture, setting the “vm.nr_hugepages” sysctl parameter caused a kernel stack overflow resulting in a kernel panic, and possibly stack corruption. With this fix, setting vm.nr_hugepages works correctly. Fixed with RHEL5 kernel-2.6.18-92.1.22.el5.ia64.rpm
  • RHSA-2008:1017-14: hugepages allow the Linux kernel to utilize the multiple page size capabilities of modern hardware architectures. In certain configurations, systems with large amounts of memory could fail to allocate most of this memory for hugepages even if it was free. This could result, for example, in database restart failures. Fixed with RHEL5 kernel-2.6.18-92.1.22.el5.ia64.rpm

Therefore, before enabling hugepages, I recommend to check with your OS Vendor Bug Database, test on a test system and apply recent OS upgrades first.

Leave Comment