Oracle Enterprise Manager

Patch Recommendations for Oracle Database 12cR1 and Cloud Control 13cR2

During my consulting engagements I see a lot of systems and many bugs. Most of the time, a patch that avoids the bug is already available. I have collected all the recommended patches for Oracle Database 12.1.0.2 (SE2 and EE) and Oracle Enterprise Manager 13cR2. This should help you avoid the most critical known issues. Versions 12.2.0.1 and 18c will be added later this year.

Patch Recommendations

Update: 18.05.2018: I have added a list of recommended patches for release 12.2.0.1.
Update: 17.07.2018: Updated EM 13.2 Patches



Cloud Control – Privilege Delegation – so you don't have the oracle / root password?

Quite frequently in database environments, security policies dictate that only personalized logons to Unix / Linux are allowed and that from there, one has to “sudo” to change to the oracle account. While this adds an additional layer of security, it makes administration a little more complicated.

Oracle Enterprise Manager – Cloud Control has a feature that helps cope with such a sudo environment. The feature is called “Privilege Delegation”. This post describes how to set it up and what it can be used for.

  1. Setup of “sudo” by the root account. In order to use privilege delegation, specific sudo rules have to be defined (a quick verification sketch follows after this list). This rule normally already exists:

    mdecker ALL=(root) NOPASSWD:/bin/su - oracle

    In addition to this one, two more rules are required. If only sudo to “oracle” is needed, the first line alone is sufficient.

    mdecker ALL=(oracle) SETENV:/u01/app/oracle/cloud/agent/sbin/nmosudo *
    mdecker ALL=(root) SETENV:/u01/app/oracle/cloud/agent/sbin/nmosudo *
  2. Setup of Privilege Delegation in Cloud Control. Go to Setup -> Security -> Privilege Delegation.

    You can either set it globally via a template or individually for each host; this depends mainly on the path to the “sudo” binary on the operating system. Choose “sudo” and provide the path to the sudo binary on your system (bash$ which sudo). The required parameters are then appended: /usr/bin/sudo -E -u %RUNAS% %COMMAND%

  3. Configure a Named Credential. If you are in a team of administrators, each administrator should have his own account to log on to Cloud Control and avoid using the “sysman” user. For obvious reasons, each administrator has to create his own named credential (Setup -> Security -> Named Credential), because it contains his personalized username and password.

    Here you provide your personalized credentials (username/password) and specify that sudo should be used to change to “oracle”.

  4. Lastly, verify that it works as desired. Go to “Targets” -> “Host” and click “Run Host Command”. Enter the command to run, e.g. “id -a”, select a named credential as well as a specific host, and click “Run”.

    If all is well, the output will show the id of the user oracle.
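
Before testing from Cloud Control, it can help to verify the sudo setup directly on the host; a minimal sketch (run as root; the username matches the examples above):

# Validate the sudoers file syntax after adding the rules
visudo -c

# List the sudo rules in effect for the personal user
sudo -l -U mdecker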

How does this work behind the scenes? The agent Java process spawns a process “nmo”, which was granted setuid-root privileges by running $ORACLE_HOME/root.sh at agent deployment time. This executable calls “sudo” to run the command “nmosudo” as user oracle, passing the “payload” command that the user wanted to execute.

root   14595  3149  0 14:23 pts/0    00:00:00 /u01/app/oracle/cloud/agent/sbin/nmo
root   14598 14595  0 14:23 pts/1    00:00:00   /usr/bin/sudo -p ###AGENT-PDP-PASSWORD-PROMPT### -E -u oracle /u01/app/oracle/cloud/agent/sbin/nmosudo DEFAULT_PLUGIN DEFAULT_FUNCTIONALITY DEFAULT_SUBACTION DEFAULT_ACTION /bin/sh -c id -a
oracle 14602 14598  0 14:23 pts/1    00:00:00     sleep 300

The Linux logfile /var/log/secure will contain messages like this one. It can be seen that the personal user “mdecker” ran the command “nmosudo” with the payload command “id -a” as argument.

Sep 13 22:03:59 xxx sudo:  mdecker : TTY=pts/2 ; PWD=/u01/app/oracle/cloud/agent/agent_inst/sysman/emd ; USER=oracle ; COMMAND=/u01/app/oracle/cloud/agent/sbin/nmosudo DEFAULT_PLUGIN DEFAULT_FUNCTIONALITY DEFAULT_SUBACTION DEFAULT_ACTION /bin/sh -c id -a


EM 13c: Do not change OPatch

In previous versions, it was “best practice” to always get the most current OPatch (patch 6880880) from MOS. Unfortunately, with Enterprise Manager Cloud Control 13c, this is problematic at the moment. The reason is that OMS 13.1 ships with OPatch 13.6:

[oracle@em13c ~]$ opatch version
OPatch Version: 13.6.0.0.0
 
OPatch succeeded.

Currently, OPatch 13.6 is not available on MOS; only OUI NextGen 13.2 is. Do not overwrite OPatch 13.6 with OUI NextGen 13.2, because that will break it.
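
If you tend to update OPatch out of habit, a simple precaution is to keep a copy of the shipped version first; a minimal sketch (the backup path is just an example):

# Keep the shipped OPatch so it can be restored if an update breaks patching
cp -a $ORACLE_HOME/OPatch $ORACLE_HOME/OPatch.13.6.bak
$ORACLE_HOME/OPatch/opatch version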



EM12c: opatchauto failed with error code 231

When trying to patch an OMS 12.1.0.5 in a VirtualBox environment with the latest OMS PSU, I came across a strange problem that took quite a while to solve. The opatchauto apply and -analyze commands failed every time with this error after several minutes of hanging:

opatchauto failed with error code 231

Manually connecting to the WLS console with the relevant protocol/host/port/username/password worked fine. Then I realized that there was an issue with the entropy pool on the VirtualBox VM.

I followed this Note to resolve the issue:

E1: OS: Linux Servers Hang or Have Delays on Any JAVA Process Affecting Performance (Doc ID 1525645.1)

After implementing rngd, the patching worked successfully without any hangs.
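
To check whether a system is affected, you can look at the kernel entropy pool before and after installing rngd; a sketch, assuming a RHEL/OL-style system (package and service names vary by distribution):

# Check the available entropy; values in the low hundreds or below
# suggest that Java's SecureRandom may block (see Doc ID 1525645.1)
cat /proc/sys/kernel/random/entropy_avail

# Install and start rngd to feed the entropy pool
yum install -y rng-tools
service rngd start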



AWR Warehouse – security issue

During implementation of AWR Warehouse, I discovered that AWR Warehouse uses temporary staging schemas in the AWR Warehouse repository database. These schemas live roughly for the duration of a Data Pump import job and are then dropped again. Because the password used is not compliant with the customer's password verification function, the jobs failed.

v_sql := ' CREATE USER ' || STAGING_SCHEMA || ' IDENTIFIED BY SYS_GUID ' ||
' DEFAULT TABLESPACE ' || tbsname;

The staging schemas are created with the literal password “SYS_GUID” in capital letters. It looks like the developer intended to call the SYS_GUID() function to generate a random string, but overlooked that the call ended up inside the string literal, so the password is the fixed string “SYS_GUID” instead.
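
For comparison, the function the developer presumably meant to call returns a fresh 32-character value on every invocation; a quick sketch via sqlplus:

sqlplus -s / as sysdba <<'EOF'
-- SYS_GUID() as a function call returns a random value;
-- as a quoted literal it is just the fixed word SYS_GUID
SELECT sys_guid() FROM dual;
EOF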

Oracle Support has acknowledged this issue and filed an enhancement request. It is a pity that this is filed as an enhancement rather than a bug.

Well, I hope this improves in a future version together with dynamic retention and purging options as well as customizable staging directories.

Happy AWR'ing.



Using Grid Control Repository for RDBMS Patch Report

I was looking for a method to utilize the Grid Control Repository, which contains information about installed Oracle Homes, databases and patches, for a patch report.

With a little reverse engineering I came up with these relevant tables:

  • Mgmt_Ecm_Snapshot: Every time the inventory is refreshed, a new row is inserted into this table containing the host name and the snapshot_guid. The most current snapshot has the flag IS_CURRENT set to ‘Y’.
  • Mgmt_Inv_Container: Every Oracle Home is a container. This table contains the snapshot_guid and the container_guid along with a container_description, which is basically the Oracle Home path.
  • Mgmt_Inv_Patch: This table contains the container_guid, the patch ID and the patch installation timestamp.
  • Mgmt_Inv_Component: This table lists all the components of the Oracle Homes along with their versions. There is one component per container with the flag Is_Top_Level set to ‘Y’. We use this component to get the base version of the installed product (e.g. 11.2.0.2).
  • Mgmt_Inv_Versioned_Patch: I am not sure if this table is needed for version information, but one of the mgmt views joins these two tables, so I used it as a reference.

The complete statement now is:

CREATE OR REPLACE FORCE VIEW "SYSMAN"."RDBMS_PATCH_REPORT"
AS
SELECT
      N.Target_Name,
    S.Start_Timestamp AS Collected_Time,
    S.Target_Name     AS Host_Name,
    C.Container_Name  AS Oracle_Home_Name,
    Container_Location,
    P.Id AS Patch_Id,
    (
    CASE Id
      WHEN '10157506'
      THEN 'GI Bundle1'
      WHEN '10185523'
      THEN 'OWB Bundle'
      WHEN '10248523'
      THEN 'PSU Jan 2011'
      WHEN '11724916'
      THEN 'PSU Apr 2011'
      WHEN '12311357'
      THEN 'GI Psu Apr 2011'
      ELSE NULL
    END ) description,
    P.Timestamp AS Install_Time,
    CASE
      WHEN VP.version IS NULL
      THEN M.version
      ELSE VP.version
    END AS Version
  FROM Mgmt_Ecm_Snapshot S,
    Mgmt_Inv_Container C,
    Mgmt_Inv_Patch P,
    (SELECT T.Target_Guid,
      T.Host_Name,
      T.Target_Name,
      T.Target_Type,
      Mp.Property_Value AS Oh
    FROM Mgmt_Targets T,
      Mgmt_Target_Properties Mp
    WHERE T.Target_Guid  = Mp.Target_Guid
    AND Mp.Property_Name = 'OracleHome'
    AND Target_Type      = 'oracle_database'
    ) N,
    Mgmt_Inv_Component M ,
    MGMT_INV_VERSIONED_PATCH VP
  WHERE S.Snapshot_Guid = C.Snapshot_Guid
  AND S.Is_Current      = 'Y'
  AND C.Container_Type  = 'O'
  AND P.Container_Guid  = C.Container_Guid
  AND N.Host_Name       = S.Target_Name
  AND N.Oh              = C.Container_Location
  AND M.Component_Guid  = Vp.Component_Guid(+)
  AND M.Is_Top_Level    = 'Y'
  AND M.Container_Guid  = C.Container_Guid
  ORDER BY 1,2,3;
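
Once created, the view can be queried like any other; a small sketch from the shell (connect string and host name are placeholders; sqlplus prompts for the SYSMAN password):

sqlplus -s sysman@EMREP <<'EOF'
SET LINESIZE 200 PAGESIZE 100
SELECT target_name, patch_id, description, install_time
  FROM sysman.rdbms_patch_report
 WHERE host_name = 'md1.example.com'
 ORDER BY target_name, install_time;
EOF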


ACFS Filesystem Monitoring and Group Ownership

When you create an ACFS filesystem in Grid Infrastructure 11.2.0.1 or 11.2.0.2, the group ownership of the filesystem root directory is set to the OSASM group (SS_ASM_GRP), e.g. asmadmin.

[grid@md1 ~]$ cd /opt/oracle/gridbase/acfsmounts
[grid@md1 acfsmounts]$ ls -al
total 12
drwxr-xr-x  3 grid oinstall 4096 Jan 10 09:44 .
drwxr-xr-x 10 grid oinstall 4096 Jan 10 09:43 ..
drwxrwx---  4 root   asmadm 4096 Jan 10 09:44 data_testvol
 
SQL> select * from v$asm_filesystem where fs_name = '/opt/oracle/gridbase/acfsmounts/data_testvol'
 
FS_NAME                                        AVAILABLE BLOCK_SIZE STATE         CORRU    NUM_VOL TOTAL_SIZE TOTAL_FREE TOTAL_SNAP_SPACE_USAGE
---------------------------------------------- --------- ---------- ------------- ----- ---------- ---------- ---------- ----------------------
/opt/oracle/gridbase/acfsmounts/data_testvol   10-JAN-11          4 AVAILABLE     FALSE          1        256 119.769531                      0

If – for whatever reason – you change the group ownership from asmadm to a different group, ASM cannot populate the views v$asm_filesystem and v$asm_acfsvolumes, which in turn means that you cannot monitor the filesystem with Oracle Enterprise Manager Grid Control, because it uses those two views for monitoring.

[root@md1 data_testvol]# chgrp myapp .
[root@md1 data_testvol]# ls -la
total 80
drwxrwx--- 4 root   myapp     4096 Jan 10 09:45 .
drwxr-xr-x 3 grid   oinstall  4096 Jan 10 09:44 ..
drwxr-xr-x 5 root   root      4096 Jan 10 09:44 .ACFS
-rw-r--r-- 1 root   asmadm     610 Jan 10 09:45 .fslimit
drwx------ 2 root   root     65536 Jan 10 09:44 lost+found
 
 
SQL> select * from v$asm_filesystem where fs_name = '/opt/oracle/gridbase/acfsmounts/data_testvol'
  2  ;
 
no rows selected

From my point of view, this is a severe limitation. ACFS filesystems should, like any other filesystem, allow any user/group ownership and still be monitorable. However, I could not convince my Oracle Support engineer to see it the same way…
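
Until that changes, the only workaround I know of is to revert the group ownership; a sketch based on the example above:

# Restore the ASM admin group on the filesystem root so that
# v$asm_filesystem / v$asm_acfsvolumes are populated again
chgrp asmadm /opt/oracle/gridbase/acfsmounts/data_testvol

# Verify from the ASM instance
sqlplus -s / as sysasm <<'EOF'
SELECT fs_name, state FROM v$asm_filesystem;
EOF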



Grid Control Agent 11.1: High Virtual Memory Consumption

There is a known issue with Grid Control 11.1 agents that causes huge virtual memory consumption. I have seen virtual memory consumption of up to 5.68 GB for the emagent process:

Resources PID: 14103, emagent PPID: 13773 euid: 172 User:agent
--------------------------------------------------------------------------------
CPU Usage (util): 2.5 Log Reads : 14 Wait Reason : OTHER
User/Nice/RT CPU: 2.5 Log Writes: 0 Total RSS/VSS :546.5mb/ 5.68gb
 
Regions PID: 14103, emagent PPID: 13773 euid: 172 User:agent
 
Type RefCt RSS VSS Locked File Name
--------------------------------------------------------------------------------
NULLDR/Shared 531 4kb 4kb 0kb <nulldref>
MEMMAP/Shared 2 8kb 32kb 0kb /var/.../14103
TEXT /Shared 2 12kb 12kb 0kb /opt/.../bin/emagent
DATA /Priv 1 131.4mb 144.0mb 0kb /opt/.../bin/emagent
MEMMAP/Priv 1 52kb 4.0mb 0kb <mmap>
UAREA /Priv 1 64kb 72kb 0kb <uarea>
UAREA /Priv 1 64kb 72kb 0kb <uarea>
UAREA /Priv 1 64kb 72kb 0kb <uarea>
UAREA /Priv 1 64kb 72kb 0kb <uarea>
UAREA /Priv 1 64kb 72kb 0kb <uarea>
MEMMAP/Priv 1 52kb 4.0mb 0kb <mmap>
UAREA /Priv 1 64kb 72kb 0kb <uarea>
UAREA /Priv 1 64kb 72kb 0kb <uarea>
UAREA /Priv 1 64kb 72kb 0kb <uarea>
UAREA /Priv 1 64kb 72kb 0kb <uarea>
UAREA /Priv 1 64kb 72kb 0kb <uarea>
UAREA /Priv 1 64kb 72kb 0kb <uarea>
MEMMAP/Priv 1 356.0mb 5.32gb 0kb <mmap>

The workaround for this issue is to set a heap size limit of 512 MB in $AGENT_HOME/sysman/config/emd.properties: append the string “-Xmx512m” to the agentJavaDefines line, so that it looks like this:

agentJavaDefines=-Djava.awt.headless=true -Dsun.lang.ClassLoader.allowArraySyntax=true -Dnetworkaddress.cache.ttl=1800 -DUrlTiming.UseJSSE=true -Doracle.dms.refresh.wait.time=1000 -Xmx512m
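
Applied from the shell, the change could look like this (a sketch; it assumes $AGENT_HOME is set and GNU sed is available):

# Stop the agent, back up emd.properties, append -Xmx512m to the
# agentJavaDefines line, and start the agent again
emctl stop agent
cp $AGENT_HOME/sysman/config/emd.properties $AGENT_HOME/sysman/config/emd.properties.bak
sed -i '/^agentJavaDefines=/ s/$/ -Xmx512m/' $AGENT_HOME/sysman/config/emd.properties
emctl start agent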

This is the bug, which is platform-independent:
Bug 9829732: AGENT 11.1.0.1 RUNNING ON OMS BOX IS CONSUMING HIGH MEMORY



Grid Control 11g: Agent Metric Swap Utilization on HP-UX with Pseudo-Swap

If you are running Grid Control on HP-UX and you are using pseudo-swap, then you have to add the property:

NMUPM_USE_PSEUDO_MEM=TRUE

to the emd.properties. Otherwise, Swap Utilization is calculated incorrectly (higher than actual) and might trigger false alarms.
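
A quick sketch of applying the property (assuming $AGENT_HOME points to the agent home):

# Append the property and reload the agent so the change takes effect
echo "NMUPM_USE_PSEUDO_MEM=TRUE" >> $AGENT_HOME/sysman/config/emd.properties
emctl reload agent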

How is the Grid Control Agent calculating Swap Utilization?

In the file $AGENT_HOME/sysman/admin/metadata/host.xml, around line 310, you find some answers:

<ColumnDescriptor NAME="swapUtil" TYPE="NUMBER" IS_KEY="FALSE" 
COMPUTE_EXPR="(100.0 * usedSwap / (usedSwap + freeSwap))">

In other parts of the file, you find the definitions of “usedSwap” and “freeSwap”:

 <ColumnDescriptor NAME="usedSwap" TYPE="NUMBER" IS_KEY="FALSE" 
COMPUTE_EXPR="(usedSwapRaw / 1024.0)" HELP="NO_HELP">
 <ColumnDescriptor NAME="freeSwap" TYPE="NUMBER" IS_KEY="FALSE" 
COMPUTE_EXPR="(freeSwapRaw / 1024.0)" HELP="NO_HELP">

Now, we have to find “usedSwapRaw” and “freeSwapRaw”. In a different part (line 387) of this file, you see this section:

<GetView NAME="LoadInternalView" FROM_TABLE="_LoadInternal">
                        <Column NAME="cpuLoad_1min" COLUMN_NAME="cpuLoad_1min"/>
                        <Column NAME="cpuLoad" COLUMN_NAME="cpuLoad"/>
                        <Column NAME="cpuLoad_15min" COLUMN_NAME="cpuLoad_15min"/>
                        <Column NAME="pgScan" COLUMN_NAME="pgScan"/>
                        <Column NAME="noOfProcs" COLUMN_NAME="noOfProcs"/>
                        <Column NAME="transfers" COLUMN_NAME="transfers"/>
                        <Column NAME="pageSize" COLUMN_NAME="pageSize"/>
                        <Column NAME="realMem" COLUMN_NAME="realMem"/>
                        <Column NAME="freeMemRaw" COLUMN_NAME="freeMemRaw"/>
                        <Column NAME="usedSwapRaw" COLUMN_NAME="usedSwapRaw"/>
                        <Column NAME="freeSwapRaw" COLUMN_NAME="freeSwapRaw"/>
        </GetView>

In the “Metric” XML tag for Metric NAME="_LoadInternal" (around line 620), you find the QueryDescriptor:

 <QueryDescriptor FETCHLET_ID="OSLineToken">
      <Property NAME="NMUPM_USE_PSEUDO_MEM" SCOPE="INSTANCE" OPTIONAL="TRUE">NMUPM_USE_PSEUDO_MEM</Property>
      <Property NAME="ENVNMUPM_USE_PSEUDO_MEM" SCOPE="GLOBAL">%NMUPM_USE_PSEUDO_MEM%</Property>
      <Property NAME="emdRoot" SCOPE="SYSTEMGLOBAL">emdRoot</Property>
      <Property NAME="command" SCOPE="GLOBAL"> %emdRoot%/bin/nmupm </Property>
      <Property NAME="args" SCOPE="GLOBAL">osLoad</Property>
      <Property NAME="startsWith" SCOPE="GLOBAL">em_result=</Property>
      <Property NAME="delimiter" SCOPE="GLOBAL">|</Property>
      <Property NAME="ENVNMUPM_TIMEOUT" OPTIONAL="TRUE" SCOPE="SYSTEMGLOBAL">NMUPM_TIMEOUT</Property>
    </QueryDescriptor>

So, now we know that the agent executes “nmupm osLoad”. Let's try it.

$ nmupm osLoad
ncpus=8
em_result=4.10|2.43|4.46|500038168.000000|987|3|0.000000|4096.000000|40775544.000000|10464849920.000000|41094647808.000000|21731692544.000000

The result is pipe-separated and consists of the columns from above; the last two are “usedSwapRaw” and “freeSwapRaw”.

Please note that you will get different results depending on the value of the environment variable “NMUPM_USE_PSEUDO_MEM”.

bash$ NMUPM_USE_PSEUDO_MEM=FALSE nmupm osLoad
ncpus=8
em_result=0.86|1.84|3.89|501596312.000000|986|3|0.000000|4096.000000|40775544.000000|10436485120.000000|21024821248.000000|47362048.000000
bash$ NMUPM_USE_PSEUDO_MEM=TRUE nmupm osLoad
ncpus=8
em_result=0.90|1.83|3.88|501596312.000000|989|3|0.000000|4096.000000|40775544.000000|10432708608.000000|41164206080.000000|21662134272.000000
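
To double-check the metric by hand, you can recompute the COMPUTE_EXPR from above using the last two em_result fields (the /1024 scaling of usedSwap and freeSwap cancels out in the ratio); a small sketch:

# Recompute swapUtil = 100 * usedSwap / (usedSwap + freeSwap)
nmupm osLoad | awk -F'|' '/em_result=/ {
  used = $(NF-1); free = $NF
  printf "swapUtil = %.2f%%\n", 100.0 * used / (used + free)
}'

With the values from the runs above, NMUPM_USE_PSEUDO_MEM=FALSE yields about 99.8%, while NMUPM_USE_PSEUDO_MEM=TRUE yields about 65.5%, which is exactly the kind of difference that triggers false alarms.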

Summary

If you are getting warning/critical alerts about Swap Utilization, check with glance or “swapinfo -t” whether pseudo-swap is used. If it is, set the property and the metric will be calculated correctly.



Grid Control 11g: Diagnosing OMS High CPU Utilization

Beginning with 11g, Oracle Enterprise Manager Grid Control uses WebLogic instead of Oracle Application Server. I am currently experiencing very sluggish Grid Control console performance and high CPU utilization of the EMGC_OMS1 WebLogic server. This article shows how you can diagnose the problem and “assist” Oracle Support in solving the issue.

The first step to find out what's going on is to use “top”.

top - 08:29:12 up 4 days, 20:56,  3 users,  load average: 1.47, 1.60, 1.61
Tasks: 532 total,   1 running, 531 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.3%us,  0.7%sy,  0.0%ni, 97.8%id,  0.2%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  132033652k total, 20255244k used, 111778408k free,   728860k buffers
Swap: 20971512k total,        0k used, 20971512k free, 15391704k cached
 
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
30181 gc11g    15   0 1778m 1.1g  49m S 99.7  0.9 626:54.30 java

PID 30181 is using close to 100% of one CPU (and has been for a very long time).

It's the EMGC_OMS1 server:

gc11g   30181 30129  9 Nov06 ?        10:27:20 /oracle/jdk/jdk1.6.0_21/bin/java -client -Xms256m -Xmx512m -XX:CompileThreshold=8000 -XX:PermSize=128m -XX:MaxPermSize=512m -Dweblogic.Name=EMGC_OMS1 -Djava.security.policy=/oracle/grid/middleware/wlserver_10.3/server/lib/weblogic.policy -Dweblogic.system.BootIdentityFile=/oracle/grid/gc_inst/user_projects/domains/GCDomain/servers/EMGC_OMS1/data/nodemanager/boot.properties -Dweblogic.nodemanager.ServiceEnabled=true -Dweblogic.security.SSL.ignoreHostnameVerification=true -Dweblogic.ReverseDNSAllowed=false -DINSTANCE_HOME=/oracle/grid/gc_inst/em/EMGC_OMS1 -DORACLE_HOME=/oracle/grid/middleware/oms11g -Ddomain.home=/oracle/grid/gc_inst/user_projects/domains/GCDomain -Djava.awt.headless=true -Ddomain.name=GCDomain -Docm.repeater.home=/oracle/grid/middleware/oms11g -Djava.security.egd=file:///dev/urandom -Xverify:none -da -Dplatform.home=/oracle/grid/middleware/wlserver_10.3 -Dwls.home=/oracle/grid/middleware/wlserver_10.3/server -Dweblogic.home=/oracle/grid/middleware/wlserver_10.3/server -Ddomain.home=/oracle/grid/gc_inst/user_projects/domains/GCDomain -Dcommon.components.home=/oracle/grid/middleware/oracle_common -Djrf.version=11.1.1 -Dorg.apache.commons.logging.Log=org.apache.commons.logging.impl.Jdk14Logger -Djrockit.optfile=/oracle/grid/middleware/oracle_common/modules/oracle.jrf_11.1.1/jrocket_optfile.txt -Doracle.domain.config.dir=/oracle/grid/gc_inst/user_projects/domains/GCDomain/config/fmwconfig -Doracle.server.config.dir=/oracle/grid/gc_inst/user_projects/domains/GCDomain/config/fmwconfig/servers/EMGC_OMS1 -Doracle.security.jps.config=/oracle/grid/gc_inst/user_projects/domains/GCDomain/config/fmwconfig/jps-config.xml -Djava.protocol.handler.pkgs=oracle.mds.net.protocol -Digf.arisidbeans.carmlloc=/oracle/grid/gc_inst/user_projects/domains/GCDomain/config/fmwconfig/carml -Digf.arisidstack.home=/oracle/grid/gc_inst/user_projects/domains/GCDomain/config/fmwconfig/arisidprovider -Dweblogic.alternateTypesDirectory=/oracle/grid/middleware/oracle_common/modules/oracle.ossoiap_11.1.1,/oracle/grid/middleware/oracle_common/modules/oracle.oamprovider_11.1.1 -Dweblogic.jdbc.remoteEnabled=false -Dweblogic.management.discover=false -Dweblogic.management.server=https://t44grid.wwk-group.com:7101 -Dwlw.iterativeDev= -Dwlw.testConsole= -Dwlw.logErrorsToConsole= -Dweblogic.ext.dirs=/oracle/grid/middleware/patch_wls1032/profiles/default/sysext_manifest_classpath weblogic.Server
oel5n1:/home/gc11g >

The next step is to find out which thread is consuming the CPU.

oel5n1:/home/gc11g >  ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30181   5
30181 gc11g   1821640 1167200 S 15 0 00:00:03  0.0 java            30182   8
30181 gc11g   1821640 1167200 S 15 0 00:01:02  0.0 java            30183   0
30181 gc11g   1821640 1167200 S 15 0 00:01:02  0.0 java            30184   8
30181 gc11g   1821640 1167200 S 15 0 00:01:02  0.0 java            30185  11
30181 gc11g   1821640 1167200 S 15 0 00:01:02  0.0 java            30186  10
30181 gc11g   1821640 1167200 S 15 0 00:01:02  0.0 java            30187   3
30181 gc11g   1821640 1167200 S 15 0 00:01:02  0.0 java            30188  15
30181 gc11g   1821640 1167200 S 15 0 00:01:02  0.0 java            30189   9
30181 gc11g   1821640 1167200 S 15 0 00:01:02  0.0 java            30190   2
30181 gc11g   1821640 1167200 S 15 0 00:01:02  0.0 java            30191  13
30181 gc11g   1821640 1167200 S 15 0 00:01:02  0.0 java            30192   1
30181 gc11g   1821640 1167200 S 15 0 00:01:02  0.0 java            30193   4
30181 gc11g   1821640 1167200 S 15 0 00:01:02  0.0 java            30194  14
30181 gc11g   1821640 1167200 S 15 0 00:01:02  0.0 java            30195   5
30181 gc11g   1821640 1167200 R 25 0 08:30:49  7.6 java            30196   2       <<<<<<
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30197  10
30181 gc11g   1821640 1167200 S 15 0 00:00:04  0.0 java            30198   4
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30199   5
30181 gc11g   1821640 1167200 S 15 0 00:00:58  0.0 java            30200  15
30181 gc11g   1821640 1167200 S 15 0 00:00:57  0.0 java            30201   2
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30202   5
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30203  13
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30204  10
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30496   4
30181 gc11g   1821640 1167200 S 15 0 00:18:50  0.2 java            30497  11
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30498  15
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30499   0
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30504   8
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30505  10
30181 gc11g   1821640 1167200 S 15 0 00:00:11  0.0 java            30510  15
30181 gc11g   1821640 1167200 S 15 0 00:00:38  0.0 java            30513  11
30181 gc11g   1821640 1167200 S 15 0 00:00:38  0.0 java            30514  13
30181 gc11g   1821640 1167200 S 15 0 00:00:37  0.0 java            30517  10
30181 gc11g   1821640 1167200 S 15 0 00:00:39  0.0 java            30518  14
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30529   0
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30533   8
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30588   8
30181 gc11g   1821640 1167200 S 15 0 00:00:01  0.0 java            30589   5
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30590  14
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30591   4
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30592   3
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30899   8
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30901   9
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30904   3
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30907  15
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            30910  12
30181 gc11g   1821640 1167200 S 15 0 00:00:02  0.0 java            31205   2
30181 gc11g   1821640 1167200 S 15 0 00:29:45  0.4 java            31526   4
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            31568  15
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            31593   5
30181 gc11g   1821640 1167200 S 15 0 00:12:02  0.1 java            31600  15
30181 gc11g   1821640 1167200 S 15 0 00:08:27  0.1 java            31601  13
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32157   4
30181 gc11g   1821640 1167200 S 15 0 00:00:01  0.0 java            32160   2
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32161  12
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32163   2
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32166  11
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32168   0
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32169  13
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32170  10
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32171  10
30181 gc11g   1821640 1167200 S 16 0 00:00:00  0.0 java            32172   3
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32173  10
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32174   8
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32175  13
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32176  12
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32177   0
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32178   9
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32179  13
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32180   9
30181 gc11g   1821640 1167200 S 15 0 00:00:01  0.0 java            32181   6
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32182   0
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32183   8
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32264  15
30181 gc11g   1821640 1167200 S 15 0 00:01:02  0.0 java            32265   8
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32269   8
30181 gc11g   1821640 1167200 S 15 0 00:00:12  0.0 java            32270   1
30181 gc11g   1821640 1167200 S 15 0 00:00:14  0.0 java            32271   6
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32275  10
30181 gc11g   1821640 1167200 S 15 0 00:00:15  0.0 java            32280  15
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32283  12
30181 gc11g   1821640 1167200 S 15 0 00:10:45  0.1 java            32286  13
30181 gc11g   1821640 1167200 S 15 0 00:00:47  0.0 java            32289  12
30181 gc11g   1821640 1167200 S 15 0 00:02:57  0.0 java            32295   0
30181 gc11g   1821640 1167200 S 15 0 00:00:03  0.0 java            32297  12
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32301   9
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32302  12
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32305   5
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32306  12
30181 gc11g   1821640 1167200 S 15 0 00:01:09  0.0 java            32307   7
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32308   2
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32309   0
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32310   4
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32311   8
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32326  13
30181 gc11g   1821640 1167200 S 15 0 00:00:14  0.0 java            32327  11
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32336  13
30181 gc11g   1821640 1167200 S 15 0 00:00:01  0.0 java            32337   8
30181 gc11g   1821640 1167200 S 15 0 00:00:29  0.0 java            32338   3
30181 gc11g   1821640 1167200 S 15 0 00:00:05  0.0 java            32339   9
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32340  10
30181 gc11g   1821640 1167200 S 15 0 00:00:18  0.0 java            32341  12
30181 gc11g   1821640 1167200 S 15 0 00:00:02  0.0 java            32342   7
30181 gc11g   1821640 1167200 S 15 0 00:00:29  0.0 java            32343   7
30181 gc11g   1821640 1167200 S 15 0 00:00:10  0.0 java            32344   0
30181 gc11g   1821640 1167200 S 15 0 00:00:05  0.0 java            32351   0
30181 gc11g   1821640 1167200 S 15 0 00:00:02  0.0 java             1943   9
30181 gc11g   1821640 1167200 S 15 0 00:01:19  0.0 java             1944   6
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java             1945  15
30181 gc11g   1821640 1167200 S 15 0 00:00:06  0.0 java             1946  11
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java             2890  15
30181 gc11g   1821640 1167200 S 15 0 00:02:56  0.0 java             7839  14
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            10208   0
30181 gc11g   1821640 1167200 S 15 0 00:00:02  0.0 java            16818  15
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            18310   2
30181 gc11g   1821640 1167200 S 15 0 00:00:09  0.0 java              642  14
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java              690   7
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            27517   8
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            27518   8
30181 gc11g   1821640 1167200 S 15 0 00:00:01  0.0 java            29228   3
30181 gc11g   1821640 1167200 S 15 0 00:02:46  0.1 java            12075  12
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            12160   1
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            12829  13
30181 gc11g   1821640 1167200 S 15 0 00:00:02  0.0 java             2773   7
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            14748  14
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            14749   6
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            14750   1
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            14751   0
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            14752   5
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            14753   5
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            14860  12
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            16706   8
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            17312   1
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            17313  12
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            17425  12
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            17426  15
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            19993  12
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            19994  15
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            19995  12
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java             3575   9
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            16593  14
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            18488   8
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            18489   5
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            21901   7
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java             5572   7
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java             7993   9
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            27543  13
30181 gc11g   1821640 1167200 S 15 0 00:00:00  0.0 java            32075  15

So it is thread 30196 of PID 30181. Let's filter out the sleeping threads with grep.

oel5n1:/home/gc11g >  ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S "
30181 gc11g   1821640 1167200 R 25 0 08:31:37  7.7 java            30196   6
oel5n1:/home/gc11g >  ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S "
30181 gc11g   1821640 1167200 R 25 0 08:31:39  7.7 java            30196   2
oel5n1:/home/gc11g >  ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S "
30181 gc11g   1821640 1167200 R 25 0 08:31:40  7.7 java            30196   2
oel5n1:/home/gc11g >  ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S "
30181 gc11g   1821640 1167200 R 18 0 08:31:40  7.7 java            30196   2
oel5n1:/home/gc11g >  ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S "
30181 gc11g   1821640 1167200 R 18 0 08:31:41  7.7 java            30196  15
oel5n1:/home/gc11g >  ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S "
30181 gc11g   1821640 1167200 R 19 0 08:31:41  7.7 java            30196  15
oel5n1:/home/gc11g >  ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S "
30181 gc11g   1821640 1167200 R 20 0 08:31:42  7.7 java            30196  13
oel5n1:/home/gc11g >  ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S "
30181 gc11g   1821640 1167200 R 20 0 08:31:42  7.7 java            30196   9
oel5n1:/home/gc11g >  ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S "
30181 gc11g   1821640 1167200 R 21 0 08:31:42  7.7 java            30196   9
oel5n1:/home/gc11g >  ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S "
30181 gc11g   1821640 1167200 R 21 0 08:31:43  7.7 java            30196   9
oel5n1:/home/gc11g >  ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S "
30181 gc11g   1821640 1167200 R 23 0 08:31:43  7.7 java            30196   9
oel5n1:/home/gc11g >  ps -Leo pid,ruser,vsz,rss,state,priority,nice,time,%cpu,comm,lwp,psr | grep 30181 | grep -v " S "
30181 gc11g   1821640 1167200 R 24 0 08:31:43  7.7 java            30196   9

It is always the same thread, consuming 7.7% (roughly 1/16 of the cores) of the total CPU capacity of the system.

Let's take a pstack:

oel5n1:/home/gc11g > pstack 30181

Let's grep for the LWP we know:

oel5n1:/home/gc11g > pstack 30181 | grep -A 20 30196
 
Thread 135 (Thread 0x4024f940 (LWP 30196)):                           <<<<<<<<<<<<<<<<<<
#0  0x00002b7471e17d3f in instanceKlass::oop_follow_contents ()
#1  0x00002b747203e6c3 in objArrayKlass::oop_follow_contents ()
#2  0x00002b7471fee5e3 in MarkSweep::follow_stack ()
#3  0x00002b7472094964 in PSMarkSweep::mark_sweep_phase1 ()
#4  0x00002b7472093859 in PSMarkSweep::invoke_no_policy ()
#5  0x00002b74720a0737 in PSScavenge::invoke ()
#6  0x00002b747206814e in ParallelScavengeHeap::failed_mem_allocate ()
#7  0x00002b747218fbc9 in VM_ParallelGCFailedAllocation::doit ()
#8  0x00002b747219ccfa in VM_Operation::evaluate ()
#9  0x00002b747219c312 in VMThread::evaluate_operation ()
#10 0x00002b747219c583 in VMThread::loop ()
#11 0x00002b747219c08e in VMThread::run ()
#12 0x00002b74720560df in java_start ()
#13 0x0000003130e064a7 in start_thread () from /lib64/libpthread.so.0
#14 0x00000031302d3c2d in clone () from /lib64/libc.so.6
 
 
oel5n1:/home/gc11g > pstack 30181 | grep -A 20 30196
Thread 135 (Thread 0x4024f940 (LWP 30196)):
#0  0x00002b7471e17d3f in instanceKlass::oop_follow_contents ()
#1  0x00002b747203e6c3 in objArrayKlass::oop_follow_contents ()
#2  0x00002b7471fee5e3 in MarkSweep::follow_stack ()
#3  0x00002b7472094964 in PSMarkSweep::mark_sweep_phase1 ()
#4  0x00002b7472093859 in PSMarkSweep::invoke_no_policy ()
#5  0x00002b74720a0737 in PSScavenge::invoke ()
#6  0x00002b747206814e in ParallelScavengeHeap::failed_mem_allocate ()
#7  0x00002b747218fbc9 in VM_ParallelGCFailedAllocation::doit ()
#8  0x00002b747219ccfa in VM_Operation::evaluate ()
#9  0x00002b747219c312 in VMThread::evaluate_operation ()
#10 0x00002b747219c583 in VMThread::loop ()
#11 0x00002b747219c08e in VMThread::run ()
#12 0x00002b74720560df in java_start ()
#13 0x0000003130e064a7 in start_thread () from /lib64/libpthread.so.0
#14 0x00000031302d3c2d in clone () from /lib64/libc.so.6
 
 
 
oel5n1:/home/gc11g > pstack 30181 | grep -A 20 30196
Thread 135 (Thread 0x4024f940 (LWP 30196)):
#0  0x00002b7472046d27 in objArrayKlass::oop_adjust_pointers ()
#1  0x00002b747209549e in PSMarkSweepDecorator::adjust_pointers ()
#2  0x00002b74720967f0 in PSOldGen::adjust_pointers ()
#3  0x00002b7472094c3a in PSMarkSweep::mark_sweep_phase3 ()
#4  0x00002b74720938f6 in PSMarkSweep::invoke_no_policy ()
#5  0x00002b74720a0737 in PSScavenge::invoke ()
#6  0x00002b747206814e in ParallelScavengeHeap::failed_mem_allocate ()
#7  0x00002b747218fbc9 in VM_ParallelGCFailedAllocation::doit ()
#8  0x00002b747219ccfa in VM_Operation::evaluate ()
#9  0x00002b747219c312 in VMThread::evaluate_operation ()
#10 0x00002b747219c583 in VMThread::loop ()
#11 0x00002b747219c08e in VMThread::run ()
#12 0x00002b74720560df in java_start ()
#13 0x0000003130e064a7 in start_thread () from /lib64/libpthread.so.0
#14 0x00000031302d3c2d in clone () from /lib64/libc.so.6
 
 
oel5n1:/home/gc11g > pstack 30181 | grep -A 20 30196
Thread 135 (Thread 0x4024f940 (LWP 30196)):
#0  0x00002b7471fee9d3 in MarkSweep::IsAliveClosure::do_object_b ()
#1  0x00002b7471dfad18 in Hashtable::unlink ()
#2  0x00002b74720949ef in PSMarkSweep::mark_sweep_phase1 ()
#3  0x00002b7472093859 in PSMarkSweep::invoke_no_policy ()
#4  0x00002b74720a0737 in PSScavenge::invoke ()
#5  0x00002b747206814e in ParallelScavengeHeap::failed_mem_allocate ()
#6  0x00002b747218fbc9 in VM_ParallelGCFailedAllocation::doit ()
#7  0x00002b747219ccfa in VM_Operation::evaluate ()
#8  0x00002b747219c312 in VMThread::evaluate_operation ()
#9  0x00002b747219c583 in VMThread::loop ()
#10 0x00002b747219c08e in VMThread::run ()
#11 0x00002b74720560df in java_start ()
#12 0x0000003130e064a7 in start_thread () from /lib64/libpthread.so.0
#13 0x00000031302d3c2d in clone () from /lib64/libc.so.6
 
 
oel5n1:/home/gc11g > pstack 30181 | grep -A 20 30196
Thread 135 (Thread 0x4024f940 (LWP 30196)):
#0  0x00002b7471fee706 in MarkSweep::AdjustPointerClosure::do_oop ()
#1  0x00002b7471dfadb1 in Hashtable::oops_do ()
#2  0x00002b7472094bf9 in PSMarkSweep::mark_sweep_phase3 ()
#3  0x00002b74720938f6 in PSMarkSweep::invoke_no_policy ()
#4  0x00002b74720a0737 in PSScavenge::invoke ()
#5  0x00002b747206814e in ParallelScavengeHeap::failed_mem_allocate ()
#6  0x00002b747218fbc9 in VM_ParallelGCFailedAllocation::doit ()
#7  0x00002b747219ccfa in VM_Operation::evaluate ()
#8  0x00002b747219c312 in VMThread::evaluate_operation ()
#9  0x00002b747219c583 in VMThread::loop ()
#10 0x00002b747219c08e in VMThread::run ()
#11 0x00002b74720560df in java_start ()
#12 0x0000003130e064a7 in start_thread () from /lib64/libpthread.so.0
#13 0x00000031302d3c2d in clone () from /lib64/libc.so.6
 
 
oel5n1:/home/gc11g > pstack 30181 | grep -A 20 30196
Thread 135 (Thread 0x4024f940 (LWP 30196)):
#0  0x00002b7471e179e5 in instanceKlass::adjust_static_fields ()
#1  0x00002b7471e234ad in instanceKlassKlass::oop_adjust_pointers ()
#2  0x00002b747209549e in PSMarkSweepDecorator::adjust_pointers ()
#3  0x00002b74720967f0 in PSOldGen::adjust_pointers ()
#4  0x00002b7472094c3a in PSMarkSweep::mark_sweep_phase3 ()
#5  0x00002b74720938f6 in PSMarkSweep::invoke_no_policy ()
#6  0x00002b74720a0737 in PSScavenge::invoke ()
#7  0x00002b747206814e in ParallelScavengeHeap::failed_mem_allocate ()
#8  0x00002b747218fbc9 in VM_ParallelGCFailedAllocation::doit ()
#9  0x00002b747219ccfa in VM_Operation::evaluate ()
#10 0x00002b747219c312 in VMThread::evaluate_operation ()
#11 0x00002b747219c583 in VMThread::loop ()
#12 0x00002b747219c08e in VMThread::run ()
#13 0x00002b74720560df in java_start ()
#14 0x0000003130e064a7 in start_thread () from /lib64/libpthread.so.0
#15 0x00000031302d3c2d in clone () from /lib64/libc.so.6
 
 
oel5n1:/home/gc11g > pstack 30181 | grep -A 20 30196
Thread 135 (Thread 0x4024f940 (LWP 30196)):
#0  0x00002b7471e17e54 in instanceKlass::oop_follow_contents ()
#1  0x00002b747203e6c3 in objArrayKlass::oop_follow_contents ()
#2  0x00002b7471fee5e3 in MarkSweep::follow_stack ()
#3  0x00002b7472094964 in PSMarkSweep::mark_sweep_phase1 ()
#4  0x00002b7472093859 in PSMarkSweep::invoke_no_policy ()
#5  0x00002b74720a0737 in PSScavenge::invoke ()
#6  0x00002b747206814e in ParallelScavengeHeap::failed_mem_allocate ()
#7  0x00002b747218fbc9 in VM_ParallelGCFailedAllocation::doit ()
#8  0x00002b747219ccfa in VM_Operation::evaluate ()
#9  0x00002b747219c312 in VMThread::evaluate_operation ()
#10 0x00002b747219c583 in VMThread::loop ()
#11 0x00002b747219c08e in VMThread::run ()
#12 0x00002b74720560df in java_start ()
#13 0x0000003130e064a7 in start_thread () from /lib64/libpthread.so.0
#14 0x00000031302d3c2d in clone () from /lib64/libc.so.6
 
 
oel5n1:/home/gc11g > pstack 30181 | grep -A 20 30196
Thread 135 (Thread 0x4024f940 (LWP 30196)):
#0  0x00002b747203e640 in objArrayKlass::oop_follow_contents ()
#1  0x00002b7471fee5e3 in MarkSweep::follow_stack ()
#2  0x00002b7472094964 in PSMarkSweep::mark_sweep_phase1 ()
#3  0x00002b7472093859 in PSMarkSweep::invoke_no_policy ()
#4  0x00002b74720a0737 in PSScavenge::invoke ()
#5  0x00002b747206814e in ParallelScavengeHeap::failed_mem_allocate ()
#6  0x00002b747218fbc9 in VM_ParallelGCFailedAllocation::doit ()
#7  0x00002b747219ccfa in VM_Operation::evaluate ()
#8  0x00002b747219c312 in VMThread::evaluate_operation ()
#9  0x00002b747219c583 in VMThread::loop ()
#10 0x00002b747219c08e in VMThread::run ()
#11 0x00002b74720560df in java_start ()
#12 0x0000003130e064a7 in start_thread () from /lib64/libpthread.so.0
#13 0x00000031302d3c2d in clone () from /lib64/libc.so.6
 
 
 
oel5n1:/home/gc11g > pstack 30181 | grep -A 20 30196
Thread 135 (Thread 0x4024f940 (LWP 30196)):
#0  0x00002b7471d50445 in ConstantPoolCacheEntry::follow_contents ()
#1  0x00002b7471d4f4f6 in constantPoolCacheKlass::oop_follow_contents ()
#2  0x00002b7471fee5e3 in MarkSweep::follow_stack ()
#3  0x00002b7472094964 in PSMarkSweep::mark_sweep_phase1 ()
#4  0x00002b7472093859 in PSMarkSweep::invoke_no_policy ()
#5  0x00002b74720a0737 in PSScavenge::invoke ()
#6  0x00002b747206814e in ParallelScavengeHeap::failed_mem_allocate ()
#7  0x00002b747218fbc9 in VM_ParallelGCFailedAllocation::doit ()
#8  0x00002b747219ccfa in VM_Operation::evaluate ()
#9  0x00002b747219c312 in VMThread::evaluate_operation ()
#10 0x00002b747219c583 in VMThread::loop ()
#11 0x00002b747219c08e in VMThread::run ()
#12 0x00002b74720560df in java_start ()
#13 0x0000003130e064a7 in start_thread () from /lib64/libpthread.so.0
#14 0x00000031302d3c2d in clone () from /lib64/libc.so.6
 
oel5n1:/home/gc11g > pstack 30181 | grep -A 20 30196
Thread 135 (Thread 0x4024f940 (LWP 30196)):
#0  0x00002b7471f8af28 in klassKlass::oop_follow_contents ()
#1  0x00002b7471e226cc in instanceKlassKlass::oop_follow_contents ()
#2  0x00002b7471fee5e3 in MarkSweep::follow_stack ()
#3  0x00002b7472094964 in PSMarkSweep::mark_sweep_phase1 ()
#4  0x00002b7472093859 in PSMarkSweep::invoke_no_policy ()
#5  0x00002b74720a0737 in PSScavenge::invoke ()
#6  0x00002b747206814e in ParallelScavengeHeap::failed_mem_allocate ()
#7  0x00002b747218fbc9 in VM_ParallelGCFailedAllocation::doit ()
#8  0x00002b747219ccfa in VM_Operation::evaluate ()
#9  0x00002b747219c312 in VMThread::evaluate_operation ()
#10 0x00002b747219c583 in VMThread::loop ()
#11 0x00002b747219c08e in VMThread::run ()
#12 0x00002b74720560df in java_start ()
#13 0x0000003130e064a7 in start_thread () from /lib64/libpthread.so.0
#14 0x00000031302d3c2d in clone () from /lib64/libc.so.6

I'm not much of a Java guy, but this tells me that the thread is doing parallel GC. So the issue might be caused by a heap size (Xmx) that is configured too small. I will talk with Oracle Support to decide whether increasing Xmx to 1G or 2G is supported. In case the CPU consumption is not caused by garbage collection but by some other OEM code, there is a command “emctl dump omsthread” which can be used:

oel5n1:/home/gc11g > emctl dump omsthread
Oracle Enterprise Manager 11g Release 1 Grid Control
Copyright (c) 1996, 2010 Oracle Corporation.  All rights reserved.
Thread dumped successfully for process '30181' to log file 
/oracle/grid/gc_inst/user_projects/domains/GCDomain/servers/EMGC_OMS1/logs/EMGC_OMS1.out

To map the LWP to a thread in the dump, convert it from decimal to hex and search for it in the nid field. 30196 converts to 0x75f4; the excerpt below shows the adjacent LWP 30197 (nid=0x75f5):
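
The conversion itself is a one-liner:

# Convert the LWP from decimal to the hex nid format used in Java thread dumps
printf 'nid=0x%x\n' 30196
# -> nid=0x75f4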

"Reference Handler" daemon prio=10 tid=0x000000004fdff000 nid=0x75f5 waiting for monitor entry [0x0000000040725000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:108)
        - waiting to lock <0x00002aaacf4967a0> (a java.lang.ref.Reference$Lock)

Update 26.11.2010:

Oracle Support has recommended increasing Xms/Xmx to 1024m and checking whether that is sufficient. This has to be changed in the file gc_inst/user_projects/domains/GCDomain/bin/setDomainEnv.sh, in these variables for Linux 64-bit:

XMS_SUN_64BIT="1024"
XMX_SUN_64BIT="1024"
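
A sketch of applying the recommendation (paths as in this environment; the sed patterns assume the default variable assignments in setDomainEnv.sh):

# Back up setDomainEnv.sh, raise the 64-bit Sun JVM heap settings,
# and restart the OMS so the new limits take effect
cd /oracle/grid/gc_inst/user_projects/domains/GCDomain/bin
cp setDomainEnv.sh setDomainEnv.sh.bak
sed -i 's/XMS_SUN_64BIT="[^"]*"/XMS_SUN_64BIT="1024"/; s/XMX_SUN_64BIT="[^"]*"/XMX_SUN_64BIT="1024"/' setDomainEnv.sh
emctl stop oms
emctl start oms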