Linux

Beware of loopback MTU size with RAC on Oracle Linux 7

By Martin | May 12th, 2018 | Category: 11gR2, Linux, Oracle Database | 7 comments

I recently had to troubleshoot a hung instance in a 2 node RAC system. 4 months earlier, the system was reinstalled in a rolling fashion due to the requirement of Linux Upgrade from Oracle Linux 5 to Oracle Linux 7. This was required because of lack of certification for a storage migration to an AllFlash Storage. The system has been stable when running with Oracle Linux 5 for several years. Around 4 months after the reinstallation, one node got hung with and traces showed these error messages:

Mon Feb 26 08:53:37 2018
skgxpvfynet: mtype: 61 process 15801 failed because of a resource problem in the OS. The OS has most likely run out of buffers (rval: 4)
Errors in file /u01/app/oracle/diag_p/diag/rdbms/prod/PROD2/trace/PROD2_ora_15801.trc (incident=480004):
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:sendmsg failed with status: 105
ORA-27301: OS failure message: No buffer space available
ORA-27302: failure occurred at: sskgxpsnd2
Incident details in: /u01/app/oracle/diag_p/diag/rdbms/prod/PROD2/incident/incdir_480004/PROD2_ora_15801_i480004.trc

This was strange because OS /proc/meminfo was showing huge amounts of free memory for this physical host with 512GB of RAM.

[root@node17 ~]# cat /proc/meminfo 
zzz ***Mon Feb 26 08:53:09 CET 2018
MemTotal:       528028424 kB
MemFree:        14593828 kB
MemAvailable:   78305772 kB
Buffers:        28009752 kB
Cached:         46896496 kB
SwapCached:            0 kB
Active:         22627436 kB
Inactive:       66945168 kB
Active(anon):   14315300 kB
Inactive(anon):  2105748 kB
Active(file):    8312136 kB
Inactive(file): 64839420 kB
Unevictable:      363996 kB
Mlocked:          364020 kB
SwapTotal:      33554428 kB
SwapFree:       33554428 kB
Dirty:               404 kB
Writeback:             0 kB
AnonPages:      15768480 kB
Mapped:           709896 kB
Shmem:            945384 kB
Slab:            2462644 kB
SReclaimable:    2092232 kB
SUnreclaim:       370412 kB
KernelStack:       28336 kB
PageTables:       395568 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    87853440 kB
Committed_AS:   22087384 kB
VmallocTotal:   34359738367 kB
VmallocUsed:     1106260 kB
VmallocChunk:   34085810172 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
CmaTotal:          16384 kB
CmaFree:               0 kB
HugePages_Total:   204800
HugePages_Free:    20214
HugePages_Rsvd:     6266
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:    36827756 kB
DirectMap2M:    78444544 kB
DirectMap1G:    423624704 kB

Oracle Support then referred to this MOS Note:

ORA-27301: OS Failure Message: No Buffer Space Available / ORA-27302: failure occurred at: sskgxpsnd2 ( Doc ID 2322410.1 )

It turned out that on systems with a lot of physical memory and on Oracle Linux 7, the MTU size of loopback adapter lo0 has to be reduced from the default value of 64k to 16436 bytes to avoid memory fragmentation in RAC. The note also mentioned that the parameter vm.min_free_kbytes should be set to physmem * 0,004 * (here 2). This requirement was already considered during installation of the system, so the only thing missing was the MTU size.

I was very surprised that neither Cluster Verification Utility (CVU), nor orachk in most recent version did catch this problem at the point of installation. In my opinion, if default value of MTU size of loopback interface on Oracle Linux 7 has the potential to cause an outage, then this must be pre-checked by CVU at installation time or at least integrated into orachk. Unfortunately, this is not the case. It seems that in July we will know if the on-prem release 18.3.0 eventually will catch and enforce this configuration requirement during installation time.

Oracle Linux Update from 7.2 to 7.3 crashes with uncorrectable CPU error

By Martin | February 7th, 2017 | Category: Linux | No Comments »

During the Upgrade of Oracle Linux 7.2 to 7.3 with “yum update”, when updating the microcode_ctl package, the system crashed and firmware reported an uncorrectable CPU error after reboot.

It turned out that the update of the package in combination with specific Intel CPUs causes the issue. In the meantime, there is a red hat bug and solution available.

References:

https://community.oracle.com/thread/3997160
https://bugzilla.redhat.com/show_bug.cgi?id=1398698
MOS: The system could not start since suddenly rebooted during upgrade to microcode_ctl package (Doc ID 2217876.1)

Problems with big SGAs (>200G) on Linux

By Martin | October 19th, 2016 | Category: 12cR1, Linux, Oracle Database | No Comments »

I recently had an issue where a database with 240GB SGA (1 huge shared memory segment) configured with hugepages on a system with 512G RAM was suddenly becoming instable and new logons were denied with these error message:

ORA-01034: ORACLE not available ORA-27123: unable to attach to shared memory segment Linux-x86_64 Error: 22: Invalid argument Additional information: 2667 Additional information: 1736718 Additional information: 215016800256

This was strange because ipcs -a showed all shared memory segments and all processes existed. It turned out that there is a known issue with the Linux feature “prelink” and very big SGAs (>200g). This MOS Note gives some details:

Connected to an idle instance, while database instance is running (Process J000 died, see its trace file kkjcre1p: unable to spawn jobq slave process ) (Doc ID 1578491.1)

Obviously, exadata customers also suffered from this issue and therefore prelink is disabled beginning with exadata version 11.2.3.3. Redhat is reporting the same issue in combination with Informix databases in Knowledgebase article: https://access.redhat.com/solutions/1186953

Bug 13397104: INSTANCE CRASH WITH ORA-600 [KJBLPKEYDRMQSCCHK:PKEY]

By Martin | June 1st, 2012 | Category: 11gR2, Bugs, Linux, Oracle Database | No Comments »

I would like to point your attention to a bug in RAC 11.2.0.3. The bug leads to instance crashes during Dynamic Remastering. As of June 1st, there is no patch publicly available. The workaround seems to be to set:

_gc_read_mostly_locking=false

So, if you are thinking about upgrading your RAC to 11.2.0.3, maybe you should wait just a little bit longer… 😉

More details are in MOS: Bug 13397104: INSTANCE CRASH WITH ORA-600 [KJBLPKEYDRMQSCCHK:PKEY]

Update 12.06.2012: Patch 13397104 is now available in MOS for 11.2.0.3.

Oracle certifies OEL6/RHEL6 for Oracle Database 11gR2 and Fusion Middleware

By Martin | March 24th, 2012 | Category: 11gR2, Linux, Oracle Database | No Comments »

Oracle yesterday announced certification of RHEL6 / OEL6 with Oracle Database 11gR2 and Oracle Fusion Middleware 11.1.

Moreover, Oracle will from now on provide errata packages for free. Until now, they could only be obtained with a valid ULN support contract. I think this will be a big plus for Oracle Linux in competition with Red Hat Enterprise Linux.

I wonder how long it will take before they ship Exadata and Oracle Database Appliance (ODA) with Unbreakable Kernel.

Reference: http://www.oracle.com/us/corporate/press/1563775

ACFS Filesystem Monitoring and Group Ownership

By Martin | January 10th, 2011 | Category: 11gR2, Linux, Oracle Database, Oracle Enterprise Manager, Unix | No Comments »

When you create an ACFS Filesystem in Grid Infrastructure 11.2.0.1 or 11.2.0.2, the filesystem root directory group ownership is set to Group SS_ASM_GRP, e.g. asmadmin.

[grid@md1 ~]$ cd /opt/oracle/gridbase/acfsmounts
[grid@md1 acfsmounts]$ ls -al
total 12
drwxr-xr-x  3 grid oinstall 4096 Jan 10 09:44 .
drwxr-xr-x 10 grid oinstall 4096 Jan 10 09:43 ..
drwxrwx---  4 root   asmadm 4096 Jan 10 09:44 data_testvol
 
SQL> select * from v$asm_filesystem where fs_name = '/opt/oracle/gridbase/acfsmounts/data_testvol'
 
FS_NAME                                        AVAILABLE BLOCK_SIZE STATE         CORRU    NUM_VOL TOTAL_SIZE TOTAL_FREE TOTAL_SNAP_SPACE_USAGE
---------------------------------------------- --------- ---------- ------------- ----- ---------- ---------- ---------- ----------------------
/opt/oracle/gridbase/acfsmounts/data_testvol   10-JAN-11          4 AVAILABLE     FALSE          1        256 119.769531                      0

If – for whatever reason – you change the group ownership from asmadm to a different group, ASM can not populate the views v$asm_filesystem and v$asm_acfsvolumes which in turn means that you can not monitor the filesystem with Oracle Enterprise Manager Grid Control because it uses those 2 views for monitoring.

[root@md1 data_testvol]# chgrp myapp .
[root@md1 data_testvol]# ls -la
total 80
drwxrwx--- 4 root   myapp     4096 Jan 10 09:45 .
drwxr-xr-x 3 grid   oinstall  4096 Jan 10 09:44 ..
drwxr-xr-x 5 root   root      4096 Jan 10 09:44 .ACFS
-rw-r--r-- 1 root   asmadm     610 Jan 10 09:45 .fslimit
drwx------ 2 root   root     65536 Jan 10 09:44 lost+found
 
 
SQL> select * from v$asm_filesystem where fs_name = '/opt/oracle/gridbase/acfsmounts/data_testvol'
  2  ;
 
no rows selected

From my point of view, this is a severe limitation. ACFS Filesystems, should like any other filesystem, be able to allow any user/group ownership and still be able to monitor it. However, I could not convince my oracle support engineer to see it the same way…

Grid Control 11g: Diagnosing OMS High CPU Utilization

By Martin | November 11th, 2010 | Category: Linux, Oracle Enterprise Manager | 7 comments

Beginning with 11g Oracle Enterprise Manager Grid Control is using Weblogic instead of Oracle Application Server. I am currently experiencing very sluggish Grid Control Console performance and high CPU Utilization of EMGC_OMS1 weblogic server. This article shows how you can diagnose the problem and “assist” Oracle support in helping to solve the issue.

First step to find out what´s going on is to use “top”.

top - 08:29:12 up 4 days, 20:56,  3 users,  load average: 1.47, 1.60, 1.61
Tasks: 532 total,   1 running, 531 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.3%us,  0.7%sy,  0.0%ni, 97.8%id,  0.2%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  132033652k total, 20255244k used, 111778408k free,   728860k buffers
Swap: 20971512k total,        0k used, 20971512k free, 15391704k cached
 
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
30181 gc11g    15   0 1778m 1.1g  49m S 99.7  0.9 626:54.30 java

PID 30181 is using 100% (and has been using it for a very long time)

It´s the EMGC_OMS1 server:

1
2

gc11g   30181 30129  9 Nov06 ?        10:27:20 /oracle/jdk/jdk1.6.0_21/bin/java -client -Xms256m -Xmx512m -XX:CompileThreshold=8000 -XX:PermSize=128m -XX:MaxPermSize=512m -Dweblogic.Name=EMGC_OMS1 -Djava.security.policy=/oracle/grid/middleware/wlserver_10.3/server/lib/weblogic.policy -Dweblogic.system.BootIdentityFile=/oracle/grid/gc_inst/user_projects/domains/GCDomain/servers/EMGC_OMS1/data/nodemanager/boot.properties -Dweblogic.nodemanager.ServiceEnabled=true -Dweblogic.security.SSL.ignoreHostnameVerification=true -Dweblogic.ReverseDNSAllowed=false -DINSTANCE_HOME=/oracle/grid/gc_inst/em/EMGC_OMS1 -DORACLE_HOME=/oracle/grid/middleware/oms11g -Ddomain.home=/oracle/grid/gc_inst/user_projects/domains/GCDomain -Djava.awt.headless=true -Ddomain.name=GCDomain -Docm.repeater.home=/oracle/grid/middleware/oms11g -Djava.security.egd=file:///dev/urandom -Xverify:none -da -Dplatform.home=/oracle/grid/middleware/wlserver_10.3 -Dwls.home=/oracle/grid/middleware/wlserver_10.3/server -Dweblogic.home=/oracle/grid/middleware/wlserver_10.3/server -Ddomain.home=/oracle/grid/gc_inst/user_projects/domains/GCDomain -Dcommon.components.home=/oracle/grid/middleware/oracle_common -Djrf.version=11.1.1 -Dorg.apache.commons.logging.Log=org.apache.commons.logging.impl.Jdk14Logger -Djrockit.optfile=/oracle/grid/middleware/oracle_common/modules/oracle.jrf_11.1.1/jrocket_optfile.txt -Doracle.domain.config.dir=/oracle/grid/gc_inst/user_projects/domains/GCDomain/config/fmwconfig -Doracle.server.config.dir=/oracle/grid/gc_inst/user_projects/domains/GCDomain/config/fmwconfig/servers/EMGC_OMS1 -Doracle.security.jps.config=/oracle/grid/gc_inst/user_projects/domains/GCDomain/config/fmwconfig/jps-config.xml -Djava.protocol.handler.pkgs=oracle.mds.net.protocol -Digf.arisidbeans.carmlloc=/oracle/grid/gc_inst/user_projects/domains/GCDomain/config/fmwconfig/carml -Digf.arisidstack.home=/oracle/grid/gc_inst/user_projects/domains/GCDomain/config/fmwconfig/arisidprovider -Dweblogic.alternateTypesDirectory=/oracle/grid/middleware/oracle_common/modules/oracle.ossoiap_11.1.1,/oracle/grid/middleware/oracle_common/modules/oracle.oamprovider_11.1.1 -Dweblogic.jdbc.remoteEnabled=false -Dweblogic.management.discover=false -Dweblogic.management.server=https://t44grid.wwk-group.com:7101 -Dwlw.iterativeDev= -Dwlw.testConsole= -Dwlw.logErrorsToConsole= -Dweblogic.ext.dirs=/oracle/grid/middleware/patch_wls1032/profiles/default/sysext_manifest_classpath weblogic.Server
oel5n1:/home/gc11g >

gc11g 30181 30129 9 Nov06 ? 10:27:20 /oracle/jdk/jdk1.6.0_21/bin/java -client -Xms256m -Xmx512m -XX:CompileThreshold=8000 -XX:PermSize=128m -XX:MaxPermSize=512m -Dweblogic.Name=EMGC_OMS1 -Djava.security.policy=/oracle/grid/middleware/wlserver_10.3/server/lib/weblogic.policy -Dweblogic.system.BootIdentityFile=/oracle/grid/gc_inst/user_projects/domains/GCDomain/servers/EMGC_OMS1/data/nodemanager/boot.properties -Dweblogic.nodemanager.ServiceEnabled=true -Dweblogic.security.SSL.ignoreHostnameVerification=true -Dweblogic.ReverseDNSAllowed=false -DINSTANCE_HOME=/oracle/grid/gc_inst/em/EMGC_OMS1 -DORACLE_HOME=/oracle/grid/middleware/oms11g -Ddomain.home=/oracle/grid/gc_inst/user_projects/domains/GCDomain -Djava.awt.headless=true -Ddomain.name=GCDomain -Docm.repeater.home=/oracle/grid/middleware/oms11g -Djava.security.egd=file:///dev/urandom -Xverify:none -da -Dplatform.home=/oracle/grid/middleware/wlserver_10.3 -Dwls.home=/oracle/grid/middleware/wlserver_10.3/server -Dweblogic.home=/oracle/grid/middleware/wlserver_10.3/server -Ddomain.home=/oracle/grid/gc_inst/user_projects/domains/GCDomain -Dcommon.components.home=/oracle/grid/middleware/oracle_common -Djrf.version=11.1.1 -Dorg.apache.commons.logging.Log=org.apache.commons.logging.impl.Jdk14Logger -Djrockit.optfile=/oracle/grid/middleware/oracle_common/modules/oracle.jrf_11.1.1/jrocket_optfile.txt -Doracle.domain.config.dir=/oracle/grid/gc_inst/user_projects/domains/GCDomain/config/fmwconfig -Doracle.server.config.dir=/oracle/grid/gc_inst/user_projects/domains/GCDomain/config/fmwconfig/servers/EMGC_OMS1 -Doracle.security.jps.config=/oracle/grid/gc_inst/user_projects/domains/GCDomain/config/fmwconfig/jps-config.xml -Djava.protocol.handler.pkgs=oracle.mds.net.protocol -Digf.arisidbeans.carmlloc=/oracle/grid/gc_inst/user_projects/domains/GCDomain/config/fmwconfig/carml -Digf.arisidstack.home=/oracle/grid/gc_inst/user_projects/domains/GCDomain/config/fmwconfig/arisidprovider -Dweblogic.alternateTypesDirectory=/oracle/grid/middleware/oracle_common/modules/oracle.ossoiap_11.1.1,/oracle/grid/middleware/oracle_common/modules/oracle.oamprovider_11.1.1 -Dweblogic.jdbc.remoteEnabled=false -Dweblogic.management.discover=false -Dweblogic.management.server=https://t44grid.wwk-group.com:7101 -Dwlw.iterativeDev= -Dwlw.testConsole= -Dwlw.logErrorsToConsole= -Dweblogic.ext.dirs=/oracle/grid/middleware/patch_wls1032/profiles/default/sysext_manifest_classpath weblogic.Server oel5n1:/home/gc11g >

Next step is to find out which thread is consuming 100% of cpu.

oel5n1:/home/gc11g >  ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30181   5
30181 gc11g   1821640 1167200 S 15 0 00:00:03  0.0 java            30182   8
30181 gc11g   1821640 1167200 S 15 0 00:01:02  0.0 java            30183   0
30181 gc11g   1821640 1167200 S 15 0 00:01:02  0.0 java            30184   8
30181 gc11g   1821640 1167200 S 15 0 00:01:02  0.0 java            30185  11
30181 gc11g   1821640 1167200 S 15 0 00:01:02  0.0 java            30186  10
30181 gc11g   1821640 1167200 S 15 0 00:01:02  0.0 java            30187   3
30181 gc11g   1821640 1167200 S 15 0 00:01:02  0.0 java            30188  15
30181 gc11g   1821640 1167200 S 15 0 00:01:02  0.0 java            30189   9
30181 gc11g   1821640 1167200 S 15 0 00:01:02  0.0 java            30190   2
30181 gc11g   1821640 1167200 S 15 0 00:01:02  0.0 java            30191  13
30181 gc11g   1821640 1167200 S 15 0 00:01:02  0.0 java            30192   1
30181 gc11g   1821640 1167200 S 15 0 00:01:02  0.0 java            30193   4
30181 gc11g   1821640 1167200 S 15 0 00:01:02  0.0 java            30194  14
30181 gc11g   1821640 1167200 S 15 0 00:01:02  0.0 java            30195   5
30181 gc11g   1821640 1167200 R 25 0 08:30:49  7.6 java            30196   2       <<<<<<
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30197  10
30181 gc11g   1821640 1167200 S 15 0 00:00:04  0.0 java            30198   4
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30199   5
30181 gc11g   1821640 1167200 S 15 0 00:00:58  0.0 java            30200  15
30181 gc11g   1821640 1167200 S 15 0 00:00:57  0.0 java            30201   2
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30202   5
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30203  13
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30204  10
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30496   4
30181 gc11g   1821640 1167200 S 15 0 00:18:50  0.2 java            30497  11
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30498  15
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30499   0
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30504   8
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30505  10
30181 gc11g   1821640 1167200 S 15 0 00:00:11  0.0 java            30510  15
30181 gc11g   1821640 1167200 S 15 0 00:00:38  0.0 java            30513  11
30181 gc11g   1821640 1167200 S 15 0 00:00:38  0.0 java            30514  13
30181 gc11g   1821640 1167200 S 15 0 00:00:37  0.0 java            30517  10
30181 gc11g   1821640 1167200 S 15 0 00:00:39  0.0 java            30518  14
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30529   0
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30533   8
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30588   8
30181 gc11g   1821640 1167200 S 15 0 00:00:01  0.0 java            30589   5
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30590  14
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30591   4
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30592   3
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30899   8
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30901   9
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30904   3
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30907  15
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30910  12
30181 gc11g   1821640 1167200 S 15 0 00:00:02  0.0 java            31205   2
30181 gc11g   1821640 1167200 S 15 0 00:29:45  0.4 java            31526   4
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            31568  15
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            31593   5
30181 gc11g   1821640 1167200 S 15 0 00:12:02  0.1 java            31600  15
30181 gc11g   1821640 1167200 S 15 0 00:08:27  0.1 java            31601  13
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32157   4
30181 gc11g   1821640 1167200 S 15 0 00:00:01  0.0 java            32160   2
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32161  12
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32163   2
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32166  11
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32168   0
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32169  13
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32170  10
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32171  10
30181 gc11g   1821640 1167200 S 16 0 00:00:00  0.0 java            32172   3
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32173  10
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32174   8
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32175  13
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32176  12
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32177   0
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32178   9
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32179  13
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32180   9
30181 gc11g   1821640 1167200 S 15 0 00:00:01  0.0 java            32181   6
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32182   0
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32183   8
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32264  15
30181 gc11g   1821640 1167200 S 15 0 00:01:02  0.0 java            32265   8
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32269   8
30181 gc11g   1821640 1167200 S 15 0 00:00:12  0.0 java            32270   1
30181 gc11g   1821640 1167200 S 15 0 00:00:14  0.0 java            32271   6
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32275  10
30181 gc11g   1821640 1167200 S 15 0 00:00:15  0.0 java            32280  15
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32283  12
30181 gc11g   1821640 1167200 S 15 0 00:10:45  0.1 java            32286  13
30181 gc11g   1821640 1167200 S 15 0 00:00:47  0.0 java            32289  12
30181 gc11g   1821640 1167200 S 15 0 00:02:57  0.0 java            32295   0
30181 gc11g   1821640 1167200 S 15 0 00:00:03  0.0 java            32297  12
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32301   9
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32302  12
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32305   5
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32306  12
30181 gc11g   1821640 1167200 S 15 0 00:01:09  0.0 java            32307   7
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32308   2
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32309   0
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32310   4
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32311   8
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32326  13
30181 gc11g   1821640 1167200 S 15 0 00:00:14  0.0 java            32327  11
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32336  13
30181 gc11g   1821640 1167200 S 15 0 00:00:01  0.0 java            32337   8
30181 gc11g   1821640 1167200 S 15 0 00:00:29  0.0 java            32338   3
30181 gc11g   1821640 1167200 S 15 0 00:00:05  0.0 java            32339   9
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32340  10
30181 gc11g   1821640 1167200 S 15 0 00:00:18  0.0 java            32341  12
30181 gc11g   1821640 1167200 S 15 0 00:00:02  0.0 java            32342   7
30181 gc11g   1821640 1167200 S 15 0 00:00:29  0.0 java            32343   7
30181 gc11g   1821640 1167200 S 15 0 00:00:10  0.0 java            32344   0
30181 gc11g   1821640 1167200 S 15 0 00:00:05  0.0 java            32351   0
30181 gc11g   1821640 1167200 S 15 0 00:00:02  0.0 java             1943   9
30181 gc11g   1821640 1167200 S 15 0 00:01:19  0.0 java             1944   6
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java             1945  15
30181 gc11g   1821640 1167200 S 15 0 00:00:06  0.0 java             1946  11
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java             2890  15
30181 gc11g   1821640 1167200 S 15 0 00:02:56  0.0 java             7839  14
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            10208   0
30181 gc11g   1821640 1167200 S 15 0 00:00:02  0.0 java            16818  15
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            18310   2
30181 gc11g   1821640 1167200 S 15 0 00:00:09  0.0 java              642  14
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java              690   7
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            27517   8
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            27518   8
30181 gc11g   1821640 1167200 S 15 0 00:00:01  0.0 java            29228   3
30181 gc11g   1821640 1167200 S 15 0 00:02:46  0.1 java            12075  12
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            12160   1
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            12829  13
30181 gc11g   1821640 1167200 S 15 0 00:00:02  0.0 java             2773   7
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            14748  14
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            14749   6
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            14750   1
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            14751   0
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            14752   5
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            14753   5
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            14860  12
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            16706   8
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            17312   1
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            17313  12
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            17425  12
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            17426  15
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            19993  12
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            19994  15
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            19995  12
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java             3575   9
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            16593  14
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            18488   8
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            18489   5
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            21901   7
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java             5572   7
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java             7993   9
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            27543  13
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32075  15

oel5n1:/home/gc11g > ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 30181 5 30181 gc11g 1821640 1167200 S 15 0 00:00:03 0.0 java 30182 8 30181 gc11g 1821640 1167200 S 15 0 00:01:02 0.0 java 30183 0 30181 gc11g 1821640 1167200 S 15 0 00:01:02 0.0 java 30184 8 30181 gc11g 1821640 1167200 S 15 0 00:01:02 0.0 java 30185 11 30181 gc11g 1821640 1167200 S 15 0 00:01:02 0.0 java 30186 10 30181 gc11g 1821640 1167200 S 15 0 00:01:02 0.0 java 30187 3 30181 gc11g 1821640 1167200 S 15 0 00:01:02 0.0 java 30188 15 30181 gc11g 1821640 1167200 S 15 0 00:01:02 0.0 java 30189 9 30181 gc11g 1821640 1167200 S 15 0 00:01:02 0.0 java 30190 2 30181 gc11g 1821640 1167200 S 15 0 00:01:02 0.0 java 30191 13 30181 gc11g 1821640 1167200 S 15 0 00:01:02 0.0 java 30192 1 30181 gc11g 1821640 1167200 S 15 0 00:01:02 0.0 java 30193 4 30181 gc11g 1821640 1167200 S 15 0 00:01:02 0.0 java 30194 14 30181 gc11g 1821640 1167200 S 15 0 00:01:02 0.0 java 30195 5 30181 gc11g 1821640 1167200 R 25 0 08:30:49 7.6 java 30196 2 <<<<<< 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 30197 10 30181 gc11g 1821640 1167200 S 15 0 00:00:04 0.0 java 30198 4 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 30199 5 30181 gc11g 1821640 1167200 S 15 0 00:00:58 0.0 java 30200 15 30181 gc11g 1821640 1167200 S 15 0 00:00:57 0.0 java 30201 2 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 30202 5 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 30203 13 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 30204 10 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 30496 4 30181 gc11g 1821640 1167200 S 15 0 00:18:50 0.2 java 30497 11 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 30498 15 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 30499 0 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 30504 8 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 30505 10 30181 gc11g 1821640 1167200 S 15 0 00:00:11 0.0 java 30510 15 30181 gc11g 1821640 1167200 S 15 0 00:00:38 0.0 java 30513 11 30181 gc11g 1821640 1167200 S 15 0 00:00:38 0.0 java 30514 13 30181 gc11g 1821640 1167200 S 15 0 00:00:37 0.0 java 30517 10 30181 gc11g 1821640 1167200 S 15 0 00:00:39 0.0 java 30518 14 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 30529 0 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 30533 8 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 30588 8 30181 gc11g 1821640 1167200 S 15 0 00:00:01 0.0 java 30589 5 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 30590 14 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 30591 4 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 30592 3 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 30899 8 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 30901 9 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 30904 3 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 30907 15 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 30910 12 30181 gc11g 1821640 1167200 S 15 0 00:00:02 0.0 java 31205 2 30181 gc11g 1821640 1167200 S 15 0 00:29:45 0.4 java 31526 4 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 31568 15 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 31593 5 30181 gc11g 1821640 1167200 S 15 0 00:12:02 0.1 java 31600 15 30181 gc11g 1821640 1167200 S 15 0 00:08:27 0.1 java 31601 13 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 32157 4 30181 gc11g 1821640 1167200 S 15 0 00:00:01 0.0 java 32160 2 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 32161 12 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 32163 2 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 32166 11 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 32168 0 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 32169 13 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 32170 10 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 32171 10 30181 gc11g 1821640 1167200 S 16 0 00:00:00 0.0 java 32172 3 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 32173 10 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 32174 8 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 32175 13 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 32176 12 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 32177 0 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 32178 9 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 32179 13 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 32180 9 30181 gc11g 1821640 1167200 S 15 0 00:00:01 0.0 java 32181 6 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 32182 0 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 32183 8 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 32264 15 30181 gc11g 1821640 1167200 S 15 0 00:01:02 0.0 java 32265 8 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 32269 8 30181 gc11g 1821640 1167200 S 15 0 00:00:12 0.0 java 32270 1 30181 gc11g 1821640 1167200 S 15 0 00:00:14 0.0 java 32271 6 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 32275 10 30181 gc11g 1821640 1167200 S 15 0 00:00:15 0.0 java 32280 15 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 32283 12 30181 gc11g 1821640 1167200 S 15 0 00:10:45 0.1 java 32286 13 30181 gc11g 1821640 1167200 S 15 0 00:00:47 0.0 java 32289 12 30181 gc11g 1821640 1167200 S 15 0 00:02:57 0.0 java 32295 0 30181 gc11g 1821640 1167200 S 15 0 00:00:03 0.0 java 32297 12 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 32301 9 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 32302 12 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 32305 5 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 32306 12 30181 gc11g 1821640 1167200 S 15 0 00:01:09 0.0 java 32307 7 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 32308 2 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 32309 0 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 32310 4 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 32311 8 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 32326 13 30181 gc11g 1821640 1167200 S 15 0 00:00:14 0.0 java 32327 11 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 32336 13 30181 gc11g 1821640 1167200 S 15 0 00:00:01 0.0 java 32337 8 30181 gc11g 1821640 1167200 S 15 0 00:00:29 0.0 java 32338 3 30181 gc11g 1821640 1167200 S 15 0 00:00:05 0.0 java 32339 9 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 32340 10 30181 gc11g 1821640 1167200 S 15 0 00:00:18 0.0 java 32341 12 30181 gc11g 1821640 1167200 S 15 0 00:00:02 0.0 java 32342 7 30181 gc11g 1821640 1167200 S 15 0 00:00:29 0.0 java 32343 7 30181 gc11g 1821640 1167200 S 15 0 00:00:10 0.0 java 32344 0 30181 gc11g 1821640 1167200 S 15 0 00:00:05 0.0 java 32351 0 30181 gc11g 1821640 1167200 S 15 0 00:00:02 0.0 java 1943 9 30181 gc11g 1821640 1167200 S 15 0 00:01:19 0.0 java 1944 6 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 1945 15 30181 gc11g 1821640 1167200 S 15 0 00:00:06 0.0 java 1946 11 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 2890 15 30181 gc11g 1821640 1167200 S 15 0 00:02:56 0.0 java 7839 14 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 10208 0 30181 gc11g 1821640 1167200 S 15 0 00:00:02 0.0 java 16818 15 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 18310 2 30181 gc11g 1821640 1167200 S 15 0 00:00:09 0.0 java 642 14 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 690 7 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 27517 8 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 27518 8 30181 gc11g 1821640 1167200 S 15 0 00:00:01 0.0 java 29228 3 30181 gc11g 1821640 1167200 S 15 0 00:02:46 0.1 java 12075 12 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 12160 1 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 12829 13 30181 gc11g 1821640 1167200 S 15 0 00:00:02 0.0 java 2773 7 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 14748 14 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 14749 6 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 14750 1 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 14751 0 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 14752 5 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 14753 5 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 14860 12 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 16706 8 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 17312 1 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 17313 12 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 17425 12 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 17426 15 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 19993 12 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 19994 15 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 19995 12 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 3575 9 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 16593 14 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 18488 8 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 18489 5 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 21901 7 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 5572 7 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 7993 9 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 27543 13 30181 gc11g 1821640 1167200 S 15 0 00:00:00 0.0 java 32075 15

So, it is thread 30196 of PID 30181. Let´s remove the ” sleeping ” threads with grep.

oel5n1:/home/gc11g >  ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S "
30181 gc11g   1821640 1167200 R 25 0 08:31:37  7.7 java            30196   6
oel5n1:/home/gc11g >  ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S "
30181 gc11g   1821640 1167200 R 25 0 08:31:39  7.7 java            30196   2
oel5n1:/home/gc11g >  ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S "
30181 gc11g   1821640 1167200 R 25 0 08:31:40  7.7 java            30196   2
oel5n1:/home/gc11g >  ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S "
30181 gc11g   1821640 1167200 R 18 0 08:31:40  7.7 java            30196   2
oel5n1:/home/gc11g >  ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S "
30181 gc11g   1821640 1167200 R 18 0 08:31:41  7.7 java            30196  15
oel5n1:/home/gc11g >  ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S "
30181 gc11g   1821640 1167200 R 19 0 08:31:41  7.7 java            30196  15
oel5n1:/home/gc11g >  ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S "
30181 gc11g   1821640 1167200 R 20 0 08:31:42  7.7 java            30196  13
oel5n1:/home/gc11g >  ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S "
30181 gc11g   1821640 1167200 R 20 0 08:31:42  7.7 java            30196   9
oel5n1:/home/gc11g >  ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S "
30181 gc11g   1821640 1167200 R 21 0 08:31:42  7.7 java            30196   9
oel5n1:/home/gc11g >  ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S "
30181 gc11g   1821640 1167200 R 21 0 08:31:43  7.7 java            30196   9
oel5n1:/home/gc11g >  ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S "
30181 gc11g   1821640 1167200 R 23 0 08:31:43  7.7 java            30196   9
oel5n1:/home/gc11g >  ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S "
30181 gc11g   1821640 1167200 R 24 0 08:31:43  7.7 java            30196   9

oel5n1:/home/gc11g > ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S " 30181 gc11g 1821640 1167200 R 25 0 08:31:37 7.7 java 30196 6 oel5n1:/home/gc11g > ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S " 30181 gc11g 1821640 1167200 R 25 0 08:31:39 7.7 java 30196 2 oel5n1:/home/gc11g > ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S " 30181 gc11g 1821640 1167200 R 25 0 08:31:40 7.7 java 30196 2 oel5n1:/home/gc11g > ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S " 30181 gc11g 1821640 1167200 R 18 0 08:31:40 7.7 java 30196 2 oel5n1:/home/gc11g > ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S " 30181 gc11g 1821640 1167200 R 18 0 08:31:41 7.7 java 30196 15 oel5n1:/home/gc11g > ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S " 30181 gc11g 1821640 1167200 R 19 0 08:31:41 7.7 java 30196 15 oel5n1:/home/gc11g > ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S " 30181 gc11g 1821640 1167200 R 20 0 08:31:42 7.7 java 30196 13 oel5n1:/home/gc11g > ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S " 30181 gc11g 1821640 1167200 R 20 0 08:31:42 7.7 java 30196 9 oel5n1:/home/gc11g > ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S " 30181 gc11g 1821640 1167200 R 21 0 08:31:42 7.7 java 30196 9 oel5n1:/home/gc11g > ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S " 30181 gc11g 1821640 1167200 R 21 0 08:31:43 7.7 java 30196 9 oel5n1:/home/gc11g > ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S " 30181 gc11g 1821640 1167200 R 23 0 08:31:43 7.7 java 30196 9 oel5n1:/home/gc11g > ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S " 30181 gc11g 1821640 1167200 R 24 0 08:31:43 7.7 java 30196 9

It is always the same thread which is using 7.7% (~1/16 cores) of the all the CPU´s on the system.

let´s take a pstack:

1	oel5n1:/home/gc11g > pstack 30181

let´s grep for the LWP we know:

oel5n1:/home/gc11g > pstack 30181 | grep -A 20 30196
 
Thread 135 (Thread 0x4024f940 (LWP 30196)):                           <<<<<<<<<<<<<<<<<<
#0  0x00002b7471e17d3f in instanceKlass::oop_follow_contents ()
#1  0x00002b747203e6c3 in objArrayKlass::oop_follow_contents ()
#2  0x00002b7471fee5e3 in MarkSweep::follow_stack ()
#3  0x00002b7472094964 in PSMarkSweep::mark_sweep_phase1 ()
#4  0x00002b7472093859 in PSMarkSweep::invoke_no_policy ()
#5  0x00002b74720a0737 in PSScavenge::invoke ()
#6  0x00002b747206814e in ParallelScavengeHeap::failed_mem_allocate ()
#7  0x00002b747218fbc9 in VM_ParallelGCFailedAllocation::doit ()
#8  0x00002b747219ccfa in VM_Operation::evaluate ()
#9  0x00002b747219c312 in VMThread::evaluate_operation ()
#10 0x00002b747219c583 in VMThread::loop ()
#11 0x00002b747219c08e in VMThread::run ()
#12 0x00002b74720560df in java_start ()
#13 0x0000003130e064a7 in start_thread () from /lib64/libpthread.so.0
#14 0x00000031302d3c2d in clone () from /lib64/libc.so.6
 
 
oel5n1:/home/gc11g > pstack 30181 | grep -A 20 30196
Thread 135 (Thread 0x4024f940 (LWP 30196)):
#0  0x00002b7471e17d3f in instanceKlass::oop_follow_contents ()
#1  0x00002b747203e6c3 in objArrayKlass::oop_follow_contents ()
#2  0x00002b7471fee5e3 in MarkSweep::follow_stack ()
#3  0x00002b7472094964 in PSMarkSweep::mark_sweep_phase1 ()
#4  0x00002b7472093859 in PSMarkSweep::invoke_no_policy ()
#5  0x00002b74720a0737 in PSScavenge::invoke ()
#6  0x00002b747206814e in ParallelScavengeHeap::failed_mem_allocate ()
#7  0x00002b747218fbc9 in VM_ParallelGCFailedAllocation::doit ()
#8  0x00002b747219ccfa in VM_Operation::evaluate ()
#9  0x00002b747219c312 in VMThread::evaluate_operation ()
#10 0x00002b747219c583 in VMThread::loop ()
#11 0x00002b747219c08e in VMThread::run ()
#12 0x00002b74720560df in java_start ()
#13 0x0000003130e064a7 in start_thread () from /lib64/libpthread.so.0
#14 0x00000031302d3c2d in clone () from /lib64/libc.so.6
 
 
 
oel5n1:/home/gc11g > pstack 30181 | grep -A 20 30196
Thread 135 (Thread 0x4024f940 (LWP 30196)):
#0  0x00002b7472046d27 in objArrayKlass::oop_adjust_pointers ()
#1  0x00002b747209549e in PSMarkSweepDecorator::adjust_pointers ()
#2  0x00002b74720967f0 in PSOldGen::adjust_pointers ()
#3  0x00002b7472094c3a in PSMarkSweep::mark_sweep_phase3 ()
#4  0x00002b74720938f6 in PSMarkSweep::invoke_no_policy ()
#5  0x00002b74720a0737 in PSScavenge::invoke ()
#6  0x00002b747206814e in ParallelScavengeHeap::failed_mem_allocate ()
#7  0x00002b747218fbc9 in VM_ParallelGCFailedAllocation::doit ()
#8  0x00002b747219ccfa in VM_Operation::evaluate ()
#9  0x00002b747219c312 in VMThread::evaluate_operation ()
#10 0x00002b747219c583 in VMThread::loop ()
#11 0x00002b747219c08e in VMThread::run ()
#12 0x00002b74720560df in java_start ()
#13 0x0000003130e064a7 in start_thread () from /lib64/libpthread.so.0
#14 0x00000031302d3c2d in clone () from /lib64/libc.so.6
 
 
oel5n1:/home/gc11g > pstack 30181 | grep -A 20 30196
Thread 135 (Thread 0x4024f940 (LWP 30196)):
#0  0x00002b7471fee9d3 in MarkSweep::IsAliveClosure::do_object_b ()
#1  0x00002b7471dfad18 in Hashtable::unlink ()
#2  0x00002b74720949ef in PSMarkSweep::mark_sweep_phase1 ()
#3  0x00002b7472093859 in PSMarkSweep::invoke_no_policy ()
#4  0x00002b74720a0737 in PSScavenge::invoke ()
#5  0x00002b747206814e in ParallelScavengeHeap::failed_mem_allocate ()
#6  0x00002b747218fbc9 in VM_ParallelGCFailedAllocation::doit ()
#7  0x00002b747219ccfa in VM_Operation::evaluate ()
#8  0x00002b747219c312 in VMThread::evaluate_operation ()
#9  0x00002b747219c583 in VMThread::loop ()
#10 0x00002b747219c08e in VMThread::run ()
#11 0x00002b74720560df in java_start ()
#12 0x0000003130e064a7 in start_thread () from /lib64/libpthread.so.0
#13 0x00000031302d3c2d in clone () from /lib64/libc.so.6
 
 
oel5n1:/home/gc11g > pstack 30181 | grep -A 20 30196
Thread 135 (Thread 0x4024f940 (LWP 30196)):
#0  0x00002b7471fee706 in MarkSweep::AdjustPointerClosure::do_oop ()
#1  0x00002b7471dfadb1 in Hashtable::oops_do ()
#2  0x00002b7472094bf9 in PSMarkSweep::mark_sweep_phase3 ()
#3  0x00002b74720938f6 in PSMarkSweep::invoke_no_policy ()
#4  0x00002b74720a0737 in PSScavenge::invoke ()
#5  0x00002b747206814e in ParallelScavengeHeap::failed_mem_allocate ()
#6  0x00002b747218fbc9 in VM_ParallelGCFailedAllocation::doit ()
#7  0x00002b747219ccfa in VM_Operation::evaluate ()
#8  0x00002b747219c312 in VMThread::evaluate_operation ()
#9  0x00002b747219c583 in VMThread::loop ()
#10 0x00002b747219c08e in VMThread::run ()
#11 0x00002b74720560df in java_start ()
#12 0x0000003130e064a7 in start_thread () from /lib64/libpthread.so.0
#13 0x00000031302d3c2d in clone () from /lib64/libc.so.6
 
 
oel5n1:/home/gc11g > pstack 30181 | grep -A 20 30196
Thread 135 (Thread 0x4024f940 (LWP 30196)):
#0  0x00002b7471e179e5 in instanceKlass::adjust_static_fields ()
#1  0x00002b7471e234ad in instanceKlassKlass::oop_adjust_pointers ()
#2  0x00002b747209549e in PSMarkSweepDecorator::adjust_pointers ()
#3  0x00002b74720967f0 in PSOldGen::adjust_pointers ()
#4  0x00002b7472094c3a in PSMarkSweep::mark_sweep_phase3 ()
#5  0x00002b74720938f6 in PSMarkSweep::invoke_no_policy ()
#6  0x00002b74720a0737 in PSScavenge::invoke ()
#7  0x00002b747206814e in ParallelScavengeHeap::failed_mem_allocate ()
#8  0x00002b747218fbc9 in VM_ParallelGCFailedAllocation::doit ()
#9  0x00002b747219ccfa in VM_Operation::evaluate ()
#10 0x00002b747219c312 in VMThread::evaluate_operation ()
#11 0x00002b747219c583 in VMThread::loop ()
#12 0x00002b747219c08e in VMThread::run ()
#13 0x00002b74720560df in java_start ()
#14 0x0000003130e064a7 in start_thread () from /lib64/libpthread.so.0
#15 0x00000031302d3c2d in clone () from /lib64/libc.so.6
 
 
oel5n1:/home/gc11g > pstack 30181 | grep -A 20 30196
Thread 135 (Thread 0x4024f940 (LWP 30196)):
#0  0x00002b7471e17e54 in instanceKlass::oop_follow_contents ()
#1  0x00002b747203e6c3 in objArrayKlass::oop_follow_contents ()
#2  0x00002b7471fee5e3 in MarkSweep::follow_stack ()
#3  0x00002b7472094964 in PSMarkSweep::mark_sweep_phase1 ()
#4  0x00002b7472093859 in PSMarkSweep::invoke_no_policy ()
#5  0x00002b74720a0737 in PSScavenge::invoke ()
#6  0x00002b747206814e in ParallelScavengeHeap::failed_mem_allocate ()
#7  0x00002b747218fbc9 in VM_ParallelGCFailedAllocation::doit ()
#8  0x00002b747219ccfa in VM_Operation::evaluate ()
#9  0x00002b747219c312 in VMThread::evaluate_operation ()
#10 0x00002b747219c583 in VMThread::loop ()
#11 0x00002b747219c08e in VMThread::run ()
#12 0x00002b74720560df in java_start ()
#13 0x0000003130e064a7 in start_thread () from /lib64/libpthread.so.0
#14 0x00000031302d3c2d in clone () from /lib64/libc.so.6
 
 
oel5n1:/home/gc11g > pstack 30181 | grep -A 20 30196
Thread 135 (Thread 0x4024f940 (LWP 30196)):
#0  0x00002b747203e640 in objArrayKlass::oop_follow_contents ()
#1  0x00002b7471fee5e3 in MarkSweep::follow_stack ()
#2  0x00002b7472094964 in PSMarkSweep::mark_sweep_phase1 ()
#3  0x00002b7472093859 in PSMarkSweep::invoke_no_policy ()
#4  0x00002b74720a0737 in PSScavenge::invoke ()
#5  0x00002b747206814e in ParallelScavengeHeap::failed_mem_allocate ()
#6  0x00002b747218fbc9 in VM_ParallelGCFailedAllocation::doit ()
#7  0x00002b747219ccfa in VM_Operation::evaluate ()
#8  0x00002b747219c312 in VMThread::evaluate_operation ()
#9  0x00002b747219c583 in VMThread::loop ()
#10 0x00002b747219c08e in VMThread::run ()
#11 0x00002b74720560df in java_start ()
#12 0x0000003130e064a7 in start_thread () from /lib64/libpthread.so.0
#13 0x00000031302d3c2d in clone () from /lib64/libc.so.6
 
 
 
oel5n1:/home/gc11g > pstack 30181 | grep -A 20 30196
Thread 135 (Thread 0x4024f940 (LWP 30196)):
#0  0x00002b7471d50445 in ConstantPoolCacheEntry::follow_contents ()
#1  0x00002b7471d4f4f6 in constantPoolCacheKlass::oop_follow_contents ()
#2  0x00002b7471fee5e3 in MarkSweep::follow_stack ()
#3  0x00002b7472094964 in PSMarkSweep::mark_sweep_phase1 ()
#4  0x00002b7472093859 in PSMarkSweep::invoke_no_policy ()
#5  0x00002b74720a0737 in PSScavenge::invoke ()
#6  0x00002b747206814e in ParallelScavengeHeap::failed_mem_allocate ()
#7  0x00002b747218fbc9 in VM_ParallelGCFailedAllocation::doit ()
#8  0x00002b747219ccfa in VM_Operation::evaluate ()
#9  0x00002b747219c312 in VMThread::evaluate_operation ()
#10 0x00002b747219c583 in VMThread::loop ()
#11 0x00002b747219c08e in VMThread::run ()
#12 0x00002b74720560df in java_start ()
#13 0x0000003130e064a7 in start_thread () from /lib64/libpthread.so.0
#14 0x00000031302d3c2d in clone () from /lib64/libc.so.6
 
oel5n1:/home/gc11g > pstack 30181 | grep -A 20 30196
Thread 135 (Thread 0x4024f940 (LWP 30196)):
#0  0x00002b7471f8af28 in klassKlass::oop_follow_contents ()
#1  0x00002b7471e226cc in instanceKlassKlass::oop_follow_contents ()
#2  0x00002b7471fee5e3 in MarkSweep::follow_stack ()
#3  0x00002b7472094964 in PSMarkSweep::mark_sweep_phase1 ()
#4  0x00002b7472093859 in PSMarkSweep::invoke_no_policy ()
#5  0x00002b74720a0737 in PSScavenge::invoke ()
#6  0x00002b747206814e in ParallelScavengeHeap::failed_mem_allocate ()
#7  0x00002b747218fbc9 in VM_ParallelGCFailedAllocation::doit ()
#8  0x00002b747219ccfa in VM_Operation::evaluate ()
#9  0x00002b747219c312 in VMThread::evaluate_operation ()
#10 0x00002b747219c583 in VMThread::loop ()
#11 0x00002b747219c08e in VMThread::run ()
#12 0x00002b74720560df in java_start ()
#13 0x0000003130e064a7 in start_thread () from /lib64/libpthread.so.0
#14 0x00000031302d3c2d in clone () from /lib64/libc.so.6

oel5n1:/home/gc11g > pstack 30181 | grep -A 20 30196 Thread 135 (Thread 0x4024f940 (LWP 30196)): <<<<<<<<<<<<<<<<<< #0 0x00002b7471e17d3f in instanceKlass::oop_follow_contents () #1 0x00002b747203e6c3 in objArrayKlass::oop_follow_contents () #2 0x00002b7471fee5e3 in MarkSweep::follow_stack () #3 0x00002b7472094964 in PSMarkSweep::mark_sweep_phase1 () #4 0x00002b7472093859 in PSMarkSweep::invoke_no_policy () #5 0x00002b74720a0737 in PSScavenge::invoke () #6 0x00002b747206814e in ParallelScavengeHeap::failed_mem_allocate () #7 0x00002b747218fbc9 in VM_ParallelGCFailedAllocation::doit () #8 0x00002b747219ccfa in VM_Operation::evaluate () #9 0x00002b747219c312 in VMThread::evaluate_operation () #10 0x00002b747219c583 in VMThread::loop () #11 0x00002b747219c08e in VMThread::run () #12 0x00002b74720560df in java_start () #13 0x0000003130e064a7 in start_thread () from /lib64/libpthread.so.0 #14 0x00000031302d3c2d in clone () from /lib64/libc.so.6 oel5n1:/home/gc11g > pstack 30181 | grep -A 20 30196 Thread 135 (Thread 0x4024f940 (LWP 30196)): #0 0x00002b7471e17d3f in instanceKlass::oop_follow_contents () #1 0x00002b747203e6c3 in objArrayKlass::oop_follow_contents () #2 0x00002b7471fee5e3 in MarkSweep::follow_stack () #3 0x00002b7472094964 in PSMarkSweep::mark_sweep_phase1 () #4 0x00002b7472093859 in PSMarkSweep::invoke_no_policy () #5 0x00002b74720a0737 in PSScavenge::invoke () #6 0x00002b747206814e in ParallelScavengeHeap::failed_mem_allocate () #7 0x00002b747218fbc9 in VM_ParallelGCFailedAllocation::doit () #8 0x00002b747219ccfa in VM_Operation::evaluate () #9 0x00002b747219c312 in VMThread::evaluate_operation () #10 0x00002b747219c583 in VMThread::loop () #11 0x00002b747219c08e in VMThread::run () #12 0x00002b74720560df in java_start () #13 0x0000003130e064a7 in start_thread () from /lib64/libpthread.so.0 #14 0x00000031302d3c2d in clone () from /lib64/libc.so.6 oel5n1:/home/gc11g > pstack 30181 | grep -A 20 30196 Thread 135 (Thread 0x4024f940 (LWP 30196)): #0 0x00002b7472046d27 in objArrayKlass::oop_adjust_pointers () #1 0x00002b747209549e in PSMarkSweepDecorator::adjust_pointers () #2 0x00002b74720967f0 in PSOldGen::adjust_pointers () #3 0x00002b7472094c3a in PSMarkSweep::mark_sweep_phase3 () #4 0x00002b74720938f6 in PSMarkSweep::invoke_no_policy () #5 0x00002b74720a0737 in PSScavenge::invoke () #6 0x00002b747206814e in ParallelScavengeHeap::failed_mem_allocate () #7 0x00002b747218fbc9 in VM_ParallelGCFailedAllocation::doit () #8 0x00002b747219ccfa in VM_Operation::evaluate () #9 0x00002b747219c312 in VMThread::evaluate_operation () #10 0x00002b747219c583 in VMThread::loop () #11 0x00002b747219c08e in VMThread::run () #12 0x00002b74720560df in java_start () #13 0x0000003130e064a7 in start_thread () from /lib64/libpthread.so.0 #14 0x00000031302d3c2d in clone () from /lib64/libc.so.6 oel5n1:/home/gc11g > pstack 30181 | grep -A 20 30196 Thread 135 (Thread 0x4024f940 (LWP 30196)): #0 0x00002b7471fee9d3 in MarkSweep::IsAliveClosure::do_object_b () #1 0x00002b7471dfad18 in Hashtable::unlink () #2 0x00002b74720949ef in PSMarkSweep::mark_sweep_phase1 () #3 0x00002b7472093859 in PSMarkSweep::invoke_no_policy () #4 0x00002b74720a0737 in PSScavenge::invoke () #5 0x00002b747206814e in ParallelScavengeHeap::failed_mem_allocate () #6 0x00002b747218fbc9 in VM_ParallelGCFailedAllocation::doit () #7 0x00002b747219ccfa in VM_Operation::evaluate () #8 0x00002b747219c312 in VMThread::evaluate_operation () #9 0x00002b747219c583 in VMThread::loop () #10 0x00002b747219c08e in VMThread::run () #11 0x00002b74720560df in java_start () #12 0x0000003130e064a7 in start_thread () from /lib64/libpthread.so.0 #13 0x00000031302d3c2d in clone () from /lib64/libc.so.6 oel5n1:/home/gc11g > pstack 30181 | grep -A 20 30196 Thread 135 (Thread 0x4024f940 (LWP 30196)): #0 0x00002b7471fee706 in MarkSweep::AdjustPointerClosure::do_oop () #1 0x00002b7471dfadb1 in Hashtable::oops_do () #2 0x00002b7472094bf9 in PSMarkSweep::mark_sweep_phase3 () #3 0x00002b74720938f6 in PSMarkSweep::invoke_no_policy () #4 0x00002b74720a0737 in PSScavenge::invoke () #5 0x00002b747206814e in ParallelScavengeHeap::failed_mem_allocate () #6 0x00002b747218fbc9 in VM_ParallelGCFailedAllocation::doit () #7 0x00002b747219ccfa in VM_Operation::evaluate () #8 0x00002b747219c312 in VMThread::evaluate_operation () #9 0x00002b747219c583 in VMThread::loop () #10 0x00002b747219c08e in VMThread::run () #11 0x00002b74720560df in java_start () #12 0x0000003130e064a7 in start_thread () from /lib64/libpthread.so.0 #13 0x00000031302d3c2d in clone () from /lib64/libc.so.6 oel5n1:/home/gc11g > pstack 30181 | grep -A 20 30196 Thread 135 (Thread 0x4024f940 (LWP 30196)): #0 0x00002b7471e179e5 in instanceKlass::adjust_static_fields () #1 0x00002b7471e234ad in instanceKlassKlass::oop_adjust_pointers () #2 0x00002b747209549e in PSMarkSweepDecorator::adjust_pointers () #3 0x00002b74720967f0 in PSOldGen::adjust_pointers () #4 0x00002b7472094c3a in PSMarkSweep::mark_sweep_phase3 () #5 0x00002b74720938f6 in PSMarkSweep::invoke_no_policy () #6 0x00002b74720a0737 in PSScavenge::invoke () #7 0x00002b747206814e in ParallelScavengeHeap::failed_mem_allocate () #8 0x00002b747218fbc9 in VM_ParallelGCFailedAllocation::doit () #9 0x00002b747219ccfa in VM_Operation::evaluate () #10 0x00002b747219c312 in VMThread::evaluate_operation () #11 0x00002b747219c583 in VMThread::loop () #12 0x00002b747219c08e in VMThread::run () #13 0x00002b74720560df in java_start () #14 0x0000003130e064a7 in start_thread () from /lib64/libpthread.so.0 #15 0x00000031302d3c2d in clone () from /lib64/libc.so.6 oel5n1:/home/gc11g > pstack 30181 | grep -A 20 30196 Thread 135 (Thread 0x4024f940 (LWP 30196)): #0 0x00002b7471e17e54 in instanceKlass::oop_follow_contents () #1 0x00002b747203e6c3 in objArrayKlass::oop_follow_contents () #2 0x00002b7471fee5e3 in MarkSweep::follow_stack () #3 0x00002b7472094964 in PSMarkSweep::mark_sweep_phase1 () #4 0x00002b7472093859 in PSMarkSweep::invoke_no_policy () #5 0x00002b74720a0737 in PSScavenge::invoke () #6 0x00002b747206814e in ParallelScavengeHeap::failed_mem_allocate () #7 0x00002b747218fbc9 in VM_ParallelGCFailedAllocation::doit () #8 0x00002b747219ccfa in VM_Operation::evaluate () #9 0x00002b747219c312 in VMThread::evaluate_operation () #10 0x00002b747219c583 in VMThread::loop () #11 0x00002b747219c08e in VMThread::run () #12 0x00002b74720560df in java_start () #13 0x0000003130e064a7 in start_thread () from /lib64/libpthread.so.0 #14 0x00000031302d3c2d in clone () from /lib64/libc.so.6 oel5n1:/home/gc11g > pstack 30181 | grep -A 20 30196 Thread 135 (Thread 0x4024f940 (LWP 30196)): #0 0x00002b747203e640 in objArrayKlass::oop_follow_contents () #1 0x00002b7471fee5e3 in MarkSweep::follow_stack () #2 0x00002b7472094964 in PSMarkSweep::mark_sweep_phase1 () #3 0x00002b7472093859 in PSMarkSweep::invoke_no_policy () #4 0x00002b74720a0737 in PSScavenge::invoke () #5 0x00002b747206814e in ParallelScavengeHeap::failed_mem_allocate () #6 0x00002b747218fbc9 in VM_ParallelGCFailedAllocation::doit () #7 0x00002b747219ccfa in VM_Operation::evaluate () #8 0x00002b747219c312 in VMThread::evaluate_operation () #9 0x00002b747219c583 in VMThread::loop () #10 0x00002b747219c08e in VMThread::run () #11 0x00002b74720560df in java_start () #12 0x0000003130e064a7 in start_thread () from /lib64/libpthread.so.0 #13 0x00000031302d3c2d in clone () from /lib64/libc.so.6 oel5n1:/home/gc11g > pstack 30181 | grep -A 20 30196 Thread 135 (Thread 0x4024f940 (LWP 30196)): #0 0x00002b7471d50445 in ConstantPoolCacheEntry::follow_contents () #1 0x00002b7471d4f4f6 in constantPoolCacheKlass::oop_follow_contents () #2 0x00002b7471fee5e3 in MarkSweep::follow_stack () #3 0x00002b7472094964 in PSMarkSweep::mark_sweep_phase1 () #4 0x00002b7472093859 in PSMarkSweep::invoke_no_policy () #5 0x00002b74720a0737 in PSScavenge::invoke () #6 0x00002b747206814e in ParallelScavengeHeap::failed_mem_allocate () #7 0x00002b747218fbc9 in VM_ParallelGCFailedAllocation::doit () #8 0x00002b747219ccfa in VM_Operation::evaluate () #9 0x00002b747219c312 in VMThread::evaluate_operation () #10 0x00002b747219c583 in VMThread::loop () #11 0x00002b747219c08e in VMThread::run () #12 0x00002b74720560df in java_start () #13 0x0000003130e064a7 in start_thread () from /lib64/libpthread.so.0 #14 0x00000031302d3c2d in clone () from /lib64/libc.so.6 oel5n1:/home/gc11g > pstack 30181 | grep -A 20 30196 Thread 135 (Thread 0x4024f940 (LWP 30196)): #0 0x00002b7471f8af28 in klassKlass::oop_follow_contents () #1 0x00002b7471e226cc in instanceKlassKlass::oop_follow_contents () #2 0x00002b7471fee5e3 in MarkSweep::follow_stack () #3 0x00002b7472094964 in PSMarkSweep::mark_sweep_phase1 () #4 0x00002b7472093859 in PSMarkSweep::invoke_no_policy () #5 0x00002b74720a0737 in PSScavenge::invoke () #6 0x00002b747206814e in ParallelScavengeHeap::failed_mem_allocate () #7 0x00002b747218fbc9 in VM_ParallelGCFailedAllocation::doit () #8 0x00002b747219ccfa in VM_Operation::evaluate () #9 0x00002b747219c312 in VMThread::evaluate_operation () #10 0x00002b747219c583 in VMThread::loop () #11 0x00002b747219c08e in VMThread::run () #12 0x00002b74720560df in java_start () #13 0x0000003130e064a7 in start_thread () from /lib64/libpthread.so.0 #14 0x00000031302d3c2d in clone () from /lib64/libc.so.6

I´m not much of a Java guy, but this tells me that the thread is doing ParallelGC. So, the issue might be related to a configuration of too small Heap size. (Xmx) I will talk with support to decide whether they support to have Xmx increased to 1G or 2G. In case the cpu consumption is not for Garbage Collection but some other OEM code, then there is a command “emctl dump omsthread”, which can be used:

oel5n1:/home/gc11g > emctl dump omsthread
Oracle Enterprise Manager 11g Release 1 Grid Control
Copyright (c) 1996, 2010 Oracle Corporation.  All rights reserved.
Thread dumped successfully for process '30181' to log file 
/oracle/grid/gc_inst/user_projects/domains/GCDomain/servers/EMGC_OMS1/logs/EMGC_OMS1.out

If we calculate LWP 30196 from dec to hex, we get 75f5, this is NID from OMS thread dump.

"Reference Handler" daemon prio=10 tid=0x000000004fdff000 nid=0x75f5 waiting for monitor entry [0x0000000040725000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:108)
        - waiting to lock <0x00002aaacf4967a0> (a java.lang.ref.Reference$Lock)

Update 26.11.2010:

Oracle support has recommended to increase Xms/Xmx to 1024m and check whether that is sufficient. This has to be changed in file gc_inst/user_projects/domains/GCDomain/bin/setDomainEnv.sh and in those variables for Linux 64bit:

XMS_SUN_64BIT="1024"
XMX_SUN_64BIT="1024"

Upgrade/Install of Grid Infrastructure to 11.2.0.2 is failing

By Martin | September 29th, 2010 | Category: 11gR2, Linux, Unix | 13 comments

I have come across a quite difficult issue when trying to install/upgrade to Grid Infrastructure 11.2.0.2 on Linux. We have performed the following trials:

Installation of 11.1.0.6 to have Cluster Files (OCR/Votedisk) outside of ASM, then upgrade to 11.2.0.1. After that, we installed all the latest patches (PSU 11.2.0.1.2). Firstly we considered running RDBMS 11.1.0.7 in a 11.2 Grid Infrastructure cluster. However, there are quite a few issues when running this mixed setup. For most of them, patches or workarounds do exist. (see Note 948456.1) However, during failover testing, we realized that the remote database instance does not survive the failure of the public lan of the local node. This is fixed in 11.2.0.2 which motivated us to upgrade.
We tried to upgrade our GI 11.2.0.1.2 installation to 11.2.0.2. Everything went fine until it failed when running rootupgrade.sh on the last (second) node. Here, the second node could not join the cluster. Important to know is that 11.2.0.2 brings the new feature HAIP (High available IP), which allows to have multiple private interconnect interfaces (each with a different private interconnect network ip) to be used for failover/load balancing by Grid Infrastructure. For this feature, Oracle uses Multicast on network “230.0.1.0”. This is stated in the updated version of the README for Oracle Database 11.2: http://download.oracle.com/docs/cd/E11882_01/readmes.112/e17129/toc.htm#CHDIEHCH

ACFS-9309: ADVM/ACFS installation correctness verified.
Failed to start Oracle Clusterware stack
Failed to start Cluster Synchorinisation Service in clustered mode at
/opt/oracle/grid11/11.2.0.2/grid/crs/install/crsconfig_lib.pm line 1016.
/opt/oracle/grid11/11.2.0.2/grid/perl/bin/perl -I/opt/oracle/grid11/11.2.0.2/grid/perl/lib -
I/opt/oracle/grid11/11.2.0.2/grid/crs/install
/opt/oracle/grid11/11.2.0.2/grid/crs/install/rootcrs.pl execution failed

The next try was to remove everything and directly install 11.2.0.1 with cluster files stored inside of ASM. This worked fine and we then proceeded in patching with latest PSU 11.2.0.1.2. Then, we gave it another try to upgrade to 11.2.0.2. The same problem occurred again at rootupgrade.sh on second node.
Then we decided to remove everything again and try a direct installation of 11.2.0.2 without any upgrade. This also failed at root.sh on last node.

With the hints of several colleagues from OTN forum (http://forums.oracle.com/forums/messageview.jspa?messageID=5393319&stqc=true) we realized that it might be related to multicast setup. We verified that multicast works with the script from MetaLink Note 413783.1. However this script does not take into account that we want multicasting via private interconnect interface instead of public LAN.

I realized that it might be necessary to add a network route to our system to direct multicast to the private interconnect network:

e.g.: /sbin/route add -net 224.0.0.0 netmask 240.0.0.0 dev bond1

After doing that, the multicast test script from MOS succeeded.

starting receiver:

[grid11@Node2 ]$ java MultiCastTestReceive

starting sender:

[grid11@Node1 ]$ java MultiCastTestSend
Sent 10 bytes to /230.0.1.0

message received by receiver:

Received data from: /10.128.128.1:13139 with length: 10

Please note that in this case 10.128.128.1 is private interconnect ip. Although it is currently just a strong suspicion, I am quite positive that this was the problem and we will consider retrying the upgrade if time allows and will report the results.

Unfortunately, Oracle support was of no help at all with this problem.

Oracle Clusterware / ASM 11.1.0.7: ASM Instance crash

By Martin | October 28th, 2009 | Category: 11g, Bugs, Linux, MetaLink, Oracle Database, Solaris, Unix | No Comments »

During a 11gR1 Clusterware installation for a Single Instance Failover Cluster at a customer site, I have experienced an interesting behaviour, which was caused by an Oracle Bug.

The environment was:

2 Node Oracle Clusterware 11.1.0.7 Cluster on Linux x86_64 using latest Recommended Patches as of October 19th. (pre PSU 11.1.0.7.1)
Clusterware installed as unix user crs
ASM installed as unix user oracle

The ASM instances could be started with SQL*Plus without any problems, but if the ASM instances were started by means of clusterware using srvctl (either from root, crs or oracle) the ASM instances would crash at diskgroup mount with:

ORA-07445: exception encountered: core dump  [sskgds_find_rtn_hdr()+1171]
[SIGBUS] [ADDR:0x2AACD701342C] [PC:0x25799DF]
[Non-existent physical address] []

Oracle Support identified this behaviour as Bug 6952915, for which there are patches for Linux x86, x86_64 and Solaris Sparc64.

Oracle Database version 11g Release 2 for Linux finally released

By Martin | September 2nd, 2009 | Category: 11g, Linux, Oracle Database | No Comments »

Yesterday, on September 1st, Oracle released the much awaited database version 11gR2 for Linux x86 for both 32bit and 64bit platforms. The software can be downloaded via OTN: http://www.oracle.com/technology/software/products/database/index.html.