Blocking Session not visible in v$ Views

I had to troubleshoot an issue on a company ERP systems database, where several sessions were waiting for ‘enq: TX row lock contention’. It was a table containing print server nodes and this caused no more printouts. Usually troubleshooting locking issues is quite straightforward in recent versions of oracle database, but this time, the column “v$session.blocking_session” was not populated and there was no entry in v$lock for the blocker. This was strange, so I involved Oracle Support. They requested a system state dump.

SQL> oradebug setmypid
SQL> oradebug unlimit;
SQL> select * from dual;
SQL> oradebug dump systemstate 258
SQL> oradebug dump systemstate 258

In the system state dump, there was a reference for the hanging session:

Waiter:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
  ----------------------------------------
  SO: 0x000000075C68F8B8, type: 2, owner: 0x0000000000000000, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
   proc=0x000000075C68F8B8, name=process, file=ksu.h LINE:12740, pg=0
  (process) Oracle pid:161, ser:247, calls cur/top: 0x000000075BC6A370/0x000000075BC6A370
            flags : (0x0) -
            flags2: (0x0),  flags3: (0x10) 
            intr error: 0, call error: 0, sess error: 0, txn error 0
            intr queue: empty
    ksudlp FALSE at location: 0
  (post info) last post received: 0 0 0
              last post received-location: No post
              last process to post me: none
              last post sent: 0 0 0
              last post sent-location: No post
              last process posted by me: none
    (latch info) wait_event=0 bits=0x0
    Process Group: DEFAULT, pseudo proc: 0x000000076068CE68
    O/S info: user: SYSTEM, term: HOSTNAME, ospid: 4600 
    OSD pid info: Windows thread id: 4600, image: ORACLE.EXE (SHAD)
    Short stack dump: 
ksedsts()+585<-ksdxfstk()+44<-ksdxcb()+2402<-ssthreadsrgruncallback()+629<-OracleOradebugThreadStart()+829
<-0000000076D559CD<-0000000076E8A561<-0000000076EABD7A<-000007FEFCE910AC<-skgpwwait()+182<-ksliwat()+2459
<-kslwaitctx()+184<-ksqcmi()+7848<-ksqgtlctx()+6814<-ksqgelctx()+829<-ktuGetTxForXid()+190<-ktcwit1()+526
<-kdddgb()+9636<-kddlkr()+284<-qerfuProcessFetchedRow()+1019<-__PGOSF308_qerfuLockingRowProcedure()+89
<-qertbFetchByRowID()+2708<-qerfuStart()+590<-selexe0()+3495<-opiexe()+22377<-kpoal8()+2624<-opiodr()+1646
<-ttcpip()+1481<-opitsk()+2166<-opiino()+1246<-opiodr()+1646<-opidrv()+862<-sou2o()+98<-opimai_real()+158
<-opimai()+191<-OracleThreadStart()+724<-0000000076D559CD<-0000000076E8A561
01-04-2018 21:16:06.636
    ----------------------------------------
    SO: 0x00000007606D9B88, type: 4, owner: 0x000000075C68F8B8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
     proc=0x000000075C68F8B8, name=session, file=ksu.h LINE:12748, pg=0
    (session) sid: 68 ser: 20263 trans: 0x0000000755DB9838, creator: 0x000000075C68F8B8
              flags: (0x41) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
              flags2: (0x40009) -/-/INC
              DID: , short-term DID: 
              txn branch: 0x0000000000000000
              edition#: 100              oct: 3, prv: 0, sql: 0x00000007708AA2E8, psql: 0x000000076BDCDED8, user: 0/SYS
    ksuxds FALSE at location: 0
    service name: SYS$USERS
    client details:
      O/S info: user: username, term: hostname, ospid: 5968:4960
      machine: hostname program: sqlplus.exe
      application name: sqlplus.exe, hash value=254292535
    Current Wait Stack:
     0: waiting for 'enq: TX - row lock contention'
        name|mode=0x54580006, usn<<16 | slot=0x130009, sequence=0xb394f
        wait_id=10 seq_num=11 snap_id=1
        wait times: snap=8 min 15 sec, exc=8 min 15 sec, total=8 min 15 sec
        wait times: max=infinite, heur=8 min 15 sec
        wait counts: calls=166 os=166
        in_wait=1 iflags=0x15a0
    Wait State:
      fixed_waits=0 flags=0x22 boundary=0x0000000000000000/-1
...
      ----------------------------------------
      SO: 0x0000000778A7FCA8, type: 8, owner: 0x000000075BC6A370, flag: INIT/-/-/0x00 if: 0x1 c: 0x1
       proc=0x000000075C68F8B8, name=enqueue, file=ksq1.h LINE:380, pg=0
      (enqueue) TX-00130009-000B394F	DID: 0001-00A1-0003F59C
      lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  res_flag: 0x6
      req: X, lock_flag: 0x10, lock: 0x0000000778A7FD00, res: 0x0000000760A2BD80
      own: 0x00000007606D9B88, sess: 0x00000007606D9B88, proc: 0x000000075C68F8B8, prv: 0x0000000760A2BDA0
    ----------------------------------------

In line 60 you can see that the session is requesting an eXclusive lock for resource res: 0x0000000760A2BD80. When searching for the next appearance of res: 0x0000000760A2BD80, we find a reference to the exclusive enqueue after the last process in a section PSEUDO PROCESS / DETACHED BRANCHES.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
PSEUDO PROCESS for group DEFAULT:
  ----------------------------------------
  SO: 0x000000076068CE68, type: 2, owner: 0x0000000000000000, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
   proc=0x000000076068CE68, name=process, file=ksu.h LINE:12740, pg=0
  (process) Oracle pid:0, ser:0, calls cur/top: 0x0000000000000000/0x0000000000000000
            flags : (0x20) PSEUDO
            flags2: (0x0),  flags3: (0x0) 
            intr error: 0, call error: 0, sess error: 0, txn error 0
            intr queue: empty
    ksudlp FALSE at location: 0
  (post info) last post received: 0 0 0
              last post received-location: No post
              last process to post me: none
              last post sent: 0 0 0
              last post sent-location: No post
              last process posted by me: none
    (latch info) wait_event=0 bits=0x0
    Process Group: DEFAULT, pseudo proc: 0x000000076068CE68
    O/S info: user: , term: , ospid:  (DEAD)
    OSD pid info: Windows thread id: 0, image: PSEUDO
  PSO child state object changes :
Dump of memory from 0x00000007605E5698 to 0x00000007605E58A0
7605E5690                   00000000 00000000          [........]
7605E56A0 00000000 00000000 00000000 00000000  [................]
  Repeat 31 times
BEGIN DUMP RESERVED PROCESSES
END DUMP RESERVED PROCESSES
DETACHED BRANCHES:
----------------------------------------
SO: 0x0000000755E7C0F0, type: 58, owner: 0x0000000000000000, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
 proc=0x0000000000000000, name=branch, file=ktccts.h LINE:420, pg=0
(branch) trn = 0x0000000000000000, flg = 0x40, state = 0x00 bno=0 ser=0 evt=0
..
  SO: 0x0000000755E2B378, type: 58, owner: 0x0000000755E7C0F0, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
   proc=0x000000077860EB80, name=branch, file=ktccts.h LINE:420, pg=0
  (branch) trn = 0x0000000755E01068, flg = 0x41090, state = 0x00 bno=1 ser=10 evt=10
   creator = 0x00000007607940E8 uid = 0x42 serial# = 7929 ttl = 0x15180 dtm = 0x5a4db9f5
   2PCrole = (nch = 0x0 flg = 0x0) DTP svc = 0x773ffc978
...
    SO: 0x0000000755E01068, type: 56, owner: 0x0000000755E2B378, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
     proc=0x000000077860EB80, name=transaction, file=ktccts.h LINE:410, pg=0
    (trans) flg = 0x00401e03, flg2 = 0x04080000, flg3 = 0x00000000, prx = 0x0000000000000000, ros = 2147483647, crtses=0x00000007607940E8
...
    (enqueue) TX-00130009-000B394F	DID: 0001-00E6-000010CD
    lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  res_flag: 0x6
    mode: X, lock_flag: 0x0, lock: 0x0000000755E010E0, res: 0x0000000760A2BD80
    own: 0x0000000755E01068 (TXN), sess: 0x0000000000000000, prv: 0x0000000760A2BD90

The SEV1 support engineer could not be convinced that the entry DETACHED BRANCHES has nothing to do with the last session listed in the system state dump. I delivered multiple system state dumps and he insisted that the lock was always held by the very last session of the system state dump – what a coincidence. He just scrolled up from the line 46 until the first occurence of “(session)” and was convinced to have found the blocker, not realizing that he overjumped the sections “DETACHED BRANCHES”/”PSEUDO PROCESS for group DEFAULT”.

After some research I found this note:

Solving locking problems in a XA environment ( Doc ID 1248848.1 )

This note explains that with XA (distributed transactions), e.g. DB Links, there can be situations where blocker is not visible. Unfortunately, the symptoms disappeared from the production system before the diagnostic steps from the MOS Note could be tried. Frequently, when there is an issue that remains unsolved, it comes back at a later point and so it was in this case. This time, I was armed with knowledge of the MOS note 1248848.1 and could use the XA rollback script provided there to free up the “session-less” blocking lock.

No entry in DBA_2PC_PENDING but the Query from MOS Note shows Active XA Trans:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
select --+ ORDERED
'----------------------------------------'||'
Curent Time : '|| substr(to_char(sysdate,' HH24.MI.SS'),1,22) ||'
'||'TX start_time: '||t.KTCXBSTM||'
'||'FORMATID: '||g.K2GTIFMT ||'
'||'GTXID: '||g.K2GTITID_EXT ||'
'||'Branch: '||g.K2GTIBID ||'
Local_Tran_Id ='||substr(t.KXIDUSN||'.'||t.kXIDSLT||'.'||t.kXIDSQN,1,15)||'
'||'KTUXESTA='|| x.KTUXESTA ||'
'||'KTUXEDFL='|| x.KTUXECFL ||'
Lock_Info: ID1: ' || ((t.kXIDUSN*64*1024)+ t.kXIDSLT)
||' ID2: '|| t.kXIDSQN
XA_transaction_INFO
from x$k2gte g, x$ktcxb t, x$ktuxe x
where g.K2GTDXCB =t.ktcxbxba and
x.KTUXEUSN = t.KXIDUSN(+) and
x.KTUXESLT = t.kXIDSLT(+) and
x.KTUXESQN =t.kXIDSQN(+);
 
"----------------------------------------
Curent Time : 11.35.27
TX start_time: 02/05/18 15:35:17
FORMATID: 131075
GTXID: 312D613331303063613A653165623A35613732336438373A31326335323962
Branch: 613331303063613A653165623A35613732336438373A31326335326166
Local_Tran_Id =25.18.61640
KTUXESTA=ACTIVE
KTUXEDFL=NONE
Lock_Info: ID1: 1638418 ID2: 61640"
"----------------------------------------
Curent Time : 11.35.27
TX start_time: 02/05/18 15:35:17
FORMATID: 131075
GTXID: 312D613331303063613A653165623A35613732336438373A31326335326135
Branch: 613331303063613A653165623A35613732336438373A31326335326165
Local_Tran_Id =16.26.2292874
KTUXESTA=ACTIVE
KTUXEDFL=NONE
Lock_Info: ID1: 1048602 ID2: 2292874"
"----------------------------------------
Curent Time : 11.35.27
TX start_time: 02/05/18 15:35:17
FORMATID: 131075
GTXID: 312D613331303063613A653165623A35613732336438373A31326335326161
Branch: 613331303063613A653165623A35613732336438373A31326335326232
Local_Tran_Id =3.30.3767984
KTUXESTA=ACTIVE
KTUXEDFL=NONE
Lock_Info: ID1: 196638 ID2: 3767984"

This output proved that it was CASE 2: No Entry in dba_2pc_pending und Status ACTIVE. Then I could successfully avoid an instance restart and rollback the global transaction to free up the locks.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
set JAVA_HOME=c:\oracle\product\11.2.0.4\dbhome_1\jdk\jre
set CLASSPATH=c:\dba\scripts\;c:\oracle\product\11.2.0.4\dbhome_1\jdbc\lib\ojdbc5.jar:%CLASSPATH%
C:\DBA\scripts>c:\oracle\product\11.2.0.4\dbhome_1\jdk\jre\bin\java XA_rb jdbc:oracle:thin:@hostname:1521:PROD -ROLLBACK 25 18 61640
++ URL: jdbc:oracle:thin:@hostname:1521:PROD
++ Rollback of Local_tran_ID : 25.18.61640
++ got XA resource handle
++ got Connection handle
++ SQL : select g.K2GTIFMT, g.K2GTITID_EXT, g.K2GTIBID, rawtohex( g.K2GTITID_EXT), rawtohex(g.K2GTIBID) from sys.vw_x$k2gte g, sys.vw_x$ktcxb t, sys.vw_x$ktuxe x where g.K2GTDXCB =t.KTCXBXBA and x.KTUXEUSN
SLT(+) and x.KTUXESQN =t.KXIDSQN(+) and t.KXIDUSN=25 and t.kXIDSLT=18 and t.kXIDSQN= 61640
++ Getting XID for Local_Tran_ID: 25.18.61640
+ Found global XID: -->
++ Format Id: 131075
++ Group Id: 312D613331303063613A653165623A35613732336438373A31326335323962
++ Branch Id: 613331303063613A653165623A35613732336438373A31326335326166
 
-> Would your really like to continue (yes or no):
yes
... doing Rollback
 
... Rollback done !

With Redo Log Mining for the transaction ID, we could get some minor details about the session which initially opened this global transaction.

So whenever you find the strange situation that sessions are blocked on ‘enq: TX – row lock contention’ but there is no blocker in v$session blocking_session column or no entry of a session holding the lock in v$lock, then this MOS Note might be helpful in cleaning up the lock. Sadly, another case where even SEV1 support was no help at all. I have to mention however, that i recently had a SEV1 issue with an excellent and ambitious engineer. So there is still hope…

Super-Sizing PGA Workareas in Oracle 12c

After a customer asked me for possibilities of super-sizing PGA workareas in version 12c, I took the chance to revisit the topic and perform some tests. Great material has already been posted by Alex Fatkulin (Hotsos Symposium 2014) and Norbert Debes (Secrets of the Oracle Database, Apress), but I wanted to verify, how the current release is behaving and I found some interesting things.

Super-Sizing PGA Workareas in 12c

Cloud Control – Privilege Delegation – so you don´t have the oracle / root password?

Quite frequently in database environments, security policies dictate that only personalized logons to Unix / Linux are allowed and that from there, one has to “sudo” to change to the oracle account. While this adds an additional layer of security, it makes administration a little more complicated.

Oracle Enterprise Manager – Cloud Control has a feature, which allows to cope with such a sudo environment. The feature is called “Privilege Delegation”. This post describes how to set it up and for what it can be used.

  1. Setup of “sudo” by the root account
  2. In order to use privilege delegation, specific sudo rules have to be defined. This rule normally already exists:

    1
    
    mdecker ALL=(root) NOPASSWD:/bin/su - oracle

    In addition to this one, two additional rules are required. If only sudo to “oracle” is required, then only the first line is needed.

    1
    2
    
    mdecker ALL=(oracle) SETENV:/u01/app/oracle/cloud/agent/sbin/nmosudo *
    mdecker ALL=(root) SETENV:/u01/app/oracle/cloud/agent/sbin/nmosudo *
  3. Setup of Privilege Delegation in Cloud Control
  4. Go to Setup -> Security -> Privilege Delegation.

    Then you can either set it globally via template or individually for each host. This depends mainly on the path to “sudo” binary in the operating system. Then you choose “sudo” and provide the path to the sudo binary on your operating system (bash$ which sudo). The required parameters are then appended: /usr/bin/sudo -E -u %RUNAS% %COMMAND%

  5. The next step is to configure a Named Credential.
  6. If you are in a team of administrators, then each administrator should have his own account to log on to Cloud Contol and avoid using “sysman” user. For obvious reasons, each administrator has to create his own “named credential” (Setup->Security->Named Credential), because it contains his personalized username and password.

    Here you provide your personalized credentials (username/password) and specify that sudo should be used to change to “oracle”.

  7. Lastly, verify that it works as desired.
  8. Go to “Targets” -> “Host” and click “Run Host Command”. Then give the command to run, e.g. “id -a”, and then add a named credential as well as a specific host and click “Run”.

    If all is well, then you will get this output showing you the id of user oracle.

How does this work behind the scenes? The agent java process spawns a process “nmo”. This process was granted SETUID root Privileges by executing $ORACLE_HOME/root.sh at the time of agent deployment. This executable is calling “sudo” to run command “nmosudo” as user oracle with the “payload” command, which the user wanted to execute.

1
2
3
root   14595  3149  0 14:23 pts/0    00:00:00 /u01/app/oracle/cloud/agent/sbin/nmo
root   14598 14595  0 14:23 pts/1    00:00:00   /usr/bin/sudo -p ###AGENT-PDP-PASSWORD-PROMPT### -E -u oracle /u01/app/oracle/cloud/agent/sbin/nmosudo DEFAULT_PLUGIN DEFAULT_FUNCTIONALITY DEFAULT_SUBACTION DEFAULT_ACTION /bin/sh -c id -a
oracle 14602 14598  0 14:23 pts/1    00:00:00     sleep 300

The linux logfile /var/log/secure will contain this messages. It can be seen that the personal user “mdecker” was running the command “nmosudo” with the payload command “id -a” as attribute.

1
Sep 13 22:03:59 xxx sudo:  mdecker : TTY=pts/2 ; PWD=/u01/app/oracle/cloud/agent/agent_inst/sysman/emd ; USER=oracle ; COMMAND=/u01/app/oracle/cloud/agent/sbin/nmosudo DEFAULT_PLUGIN DEFAULT_FUNCTIONALITY DEFAULT_SUBACTION DEFAULT_ACTION /bin/sh -c id -a

ORA-4031 Troubleshooting – Issue with durations in 11gR2

I recently had to troubleshoot an ORA-4031 issue at a client site. The issue reappeared 3 times within 2 months and only after escalating the SR to SEV1 and being quite persistent for an explanation, the second engineer attempting to solve the issue finally got it right.  Being curious, I digged into the trace files again to confirm and understand the issue here. This blog post describes the method to analyze the issue and how to troubleshoot.

  • First, when the issue first appears, it creates a 4031 trace file, which does not really give enough information to solve the problem. So I asked Oracle Support for specific trace events to properly diagnose it. The correctly mentioned event 10235 level 65536 and heapdump level 536870914  but required an instance restart. After explaining that this is not really possible and researching myself, I found these two events, which do not really need an instance restart. Event 10235 was not really necessary in this case, because the description of the allocations did not matter here.
ALTER system SET events '4031 trace name HEAPDUMP level 536870914, lifetime 1; name errorstack level 3, lifetime 1';
  • Now, after the issue appeared again, we finally had the relevant diagnostic information.

 

Allocation Request Summary Informaton
=====================================
Allocation request for: kglsim object batch
...
Heap: 0x6007b278, size: 3896
******************************************************
HEAP DUMP heap name="sga heap(3,0)"  desc=0x6007b278
extent sz=0xfe0 alt=248 het=32767 rec=9 flg=-126 opc=2
...
durations enabled for this heap
reserved granules for root 0 (granule size 134217728)
Total heap size    =2684352800. (2559 MB)
Total free space   =    38360
Total reserved free space   =120837856 (115MB) 
Permanent space    =2563474760
 
==============================================
TOP 10 MEMORY USES FOR SGA HEAP SUB POOL 1
----------------------------------------------
"free memory               "       6597 MB 71%
"gcs resources             "       1080 MB 12%
"gcs shadows               "        748 MB  8%
"db_block_hash_buckets     "        200 MB  2%
"kglsim object batch       "        189 MB  2%
"kglsim heap               "        112 MB  1%
"gcs res hash bucket       "         64 MB  1%
"ges big msg buffers       "         54 MB  1%
"dbktb: trace buffer       "         51 MB  1%
"KGLH0                     "         48 MB  1%
-----------------------------------------
free memory                        6597 MB
memory alloc.                      2747 MB
Sub total                          9344 MB
==============================================
TOP 10 MEMORY USES FOR SGA HEAP SUB POOL 2
----------------------------------------------
"free memory               "       6641 MB 70%
"gcs resources             "       1080 MB 11%
"gcs shadows               "        747 MB  8%
"db_block_hash_buckets     "        512 MB  5%
"gcs res hash bucket       "         64 MB  1%
"kglsim object batch       "         59 MB  1%
"dbktb: trace buffer       "         51 MB  1%
"KGLH0                     "         44 MB  0%
"kglsim heap               "         35 MB  0%
"ges enqueues              "         22 MB  0%
-----------------------------------------
free memory                        6641 MB
memory alloc.                      2831 MB
Sub total                          9472 MB
==============================================
TOP 10 MEMORY USES FOR SGA HEAP SUB POOL 3
----------------------------------------------
"free memory               "       6488 MB 71%
"gcs resources             "       1080 MB 12%
"gcs shadows               "        747 MB  8%
"db_block_hash_buckets     "        200 MB  2%
"kglsim object batch       "        105 MB  1%
"gc name table             "         72 MB  1%
"gcs res hash bucket       "         64 MB  1%
"kglsim heap               "         63 MB  1%
"dbktb: trace buffer       "         51 MB  1%
"KGLH0                     "         46 MB  1%
-----------------------------------------
free memory                        6488 MB
memory alloc.                      2600 MB
Sub total                          9088 MB
==============================================
TOP 10 MEMORY USES FOR SGA HEAP SUB POOL 4
----------------------------------------------
"free memory               "       6663 MB 70%
"gcs resources             "       1080 MB 11%
"gcs shadows               "        749 MB  8%
"db_block_hash_buckets     "        512 MB  5%
"kglsim object batch       "         59 MB  1%
"dbktb: trace buffer       "         51 MB  1%
"KGLH0                     "         47 MB  0%
"kglsim heap               "         35 MB  0%
"FileOpenBlock             "         35 MB  0%
"ges enqueues              "         22 MB  0%
-----------------------------------------
free memory                        6663 MB
memory alloc.                      2809 MB
Sub total                          9472 MB
TOTALS ---------------------------------------
Total free memory                    26 GB
Total memory alloc.                  11 GB
Grand total                          37 GB
  • This shows that 4031 was triggered for memory allocation request in sga heap (3,0) with size of 3896. This is smaller than 4400 bytes (_shared_pool_reserved_min_alloc), so the request can not be served the the shared pool reserved area.  Next, we can see that we have 4 subpools (kghdsidx_count) and each of them has around 6 GB of free memory. So, in total, we have 37 GB shared pool with 26 GB of it free, but one mini-subpool (duration 0) of subpool 3 still has run out of free contigous memory bigger than 3792 bytes.
  • Running Tanel´s heapdump_analyzer, I could verify that the largest chunk for sga heap(3,0) was indeed 3792 bytes and that all bigger chunks were allocated by “PERM” allocations.

 

 ./heapdump_analyzer  ./PROD1_ora_27396.trc &gt; heap.txt
 
grep -e "---" -e "Total_size" -e "sga heap(3,0)"  heap.txt
 
-- Heapdump Analyzer v1.00 by Tanel Poder ( http://www.tanelpoder.com )
 
Total_size #Chunks  Chunk_size,        From_heap,       Chunk_type,  Alloc_reason
---------- ------- ------------ ----------------- ----------------- -----------------
 
114123720      17    6713160 ,    sga heap(3,0),           R-free,
28442152       1   28442152 ,    sga heap(3,0),             perm,  perm
...
7485000       1    7485000 ,    sga heap(3,0),             perm,  perm
6713080       1    6713080 ,    sga heap(3,0),           R-free,
6712104       1    6712104 ,    sga heap(3,0),           R-perm,  perm
...
3028568       1    3028568 ,    sga heap(3,0),             perm,  perm
...
 
8928       4       2232 ,    sga heap(3,0),             perm,  perm
7864       1       7864 ,    sga heap(3,0),             perm,  perm
5488       2       2744 ,    sga heap(3,0),             perm,  perm
3792       1       3792 ,    sga heap(3,0),             free,  <<<<<<<<<<<<<<
...

This shows that the first chunk of chunk_type “free” is of 3792 bytes, but request was for 3896 bytes. The
subpool (3,0) only has R-free (for shared_pool_reserved area) and PERM allocations of bigger size. The subpool (3,0) consists of 20 Extents á (128M Granule size). Each extent has

Researching the issue with “PERM” allocations brought me to:

ORA-4031: unable to allocate 4160 bytes of shared memory (“shared pool”,”unknown object”,”sga heap(4,0)”,”modification “) (Doc ID 1675470.1)

This note also confirms that the classification in 4 durations leads to a weak spot of duration 0 in any subpool for memory allocations due to high PERM utilization. The note mentions patch 8857940, which was also finally recommended by the support engineer.

HEAP DUMP heap name="sga heap(1,0)"  desc=0x600680e8
Total heap size    =2818570440
Total free space   =    28008 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 
Total reserved free space   =127549960
HEAP DUMP heap name="sga heap(1,1)"  desc=0x60069940
Total heap size    =805305840
Total free space   =632590232
Total reserved free space   = 40243520
HEAP DUMP heap name="sga heap(1,2)"  desc=0x6006b198
Total heap size    =1879046960
Total free space   =1734275840
Total reserved free space   = 93975928
HEAP DUMP heap name="sga heap(1,3)"  desc=0x6006c9f0
Total heap size    =4294964480
Total free space   =4072007248
Total reserved free space   =214821040
HEAP DUMP heap name="sga heap(2,0)"  desc=0x600719b0
Total heap size    =2952788080
Total free space   =    19144 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Total reserved free space   =147689440
HEAP DUMP heap name="sga heap(2,1)"  desc=0x60073208
Total heap size    =671088200
Total free space   =524561448
Total reserved free space   = 33565720
HEAP DUMP heap name="sga heap(2,2)"  desc=0x60074a60
Total heap size    =1879046960
Total free space   =1738701544
Total reserved free space   = 93984160
HEAP DUMP heap name="sga heap(2,3)"  desc=0x600762b8
Total heap size    =4429182120
Total free space   =4202264536
Total reserved free space   =221534200
HEAP DUMP heap name="sga heap(3,0)"  desc=0x6007b278
Total heap size    =2684352800
Total free space   =    38360 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Total reserved free space   =120837856
HEAP DUMP heap name="sga heap(3,1)"  desc=0x6007cad0
Total heap size    =671088200
Total free space   =526361840
Total reserved free space   = 33565720
HEAP DUMP heap name="sga heap(3,2)"  desc=0x6007e328
Total heap size    =1879046960
Total free space   =1736230384
Total reserved free space   = 93977784
HEAP DUMP heap name="sga heap(3,3)"  desc=0x6007fb80
Total heap size    =4294964480
Total free space   =4076323080
Total reserved free space   =214821040
HEAP DUMP heap name="sga heap(4,0)"  desc=0x60084b40
Total heap size    =2952788080
Total free space   =     8536 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Total reserved free space   =140977632
HEAP DUMP heap name="sga heap(4,2)"  desc=0x60087bf0
Total heap size    =1879046960
Total free space   =1735347144
Total reserved free space   = 93984160
HEAP DUMP heap name="sga heap(4,3)"  desc=0x60089448
Total heap size    =4429182120
Total free space   =4198496144
Total reserved free space   =221534200

For reference, these are the relevant parameters. Please note that all of these are set to default, except _kghdsidx_count which was reduced from 4 earlier and _ksmg_granule_size which was reduced to 128M.

NAME                                     VALUE
---------------------------------------- --------------------
__shared_pool_size                       38923141120
_dm_max_shared_pool_pct                  1
_enable_shared_pool_durations            TRUE
_io_shared_pool_size                     4194304
_kghdsidx_count                          4
_ksmg_granule_size                       134217728
_memory_imm_mode_without_autosga         TRUE
_shared_pool_max_size                    0
_shared_pool_minsize_on                  FALSE
_shared_pool_reserved_min_alloc          4400
_shared_pool_reserved_pct                5
shared_pool_reserved_size                1946157056
shared_pool_size                         38654705664

Thanks to Tanel, Riyaj and Hatem Mahmoud

Oracle Database Result Cache Troubleshooting

I was troubleshooting an issue where the Oracle Database Result Cache did not get used when the RESULT_CACHE hint was specified inside a subquery. As the main query has a sysdate inside, the RESULT_CACHE Hint can not be specified for the main query, but only for the subquery.

1
2
SELECT SQ.*, sysdate FROM
(SELECT * FROM tableX, ....) SQ;

When using “/*+ RESULT_CACHE */ inside of Subquery, result cache was not enabled.

1
2
3
 
SELECT SQ.*, sysdate FROM
(SELECT /*+ RESULT_CACHE */ * FROM tableX, ....) SQ;

I found out that there is a trace event which can be set for diagnostics.

1
ALTER SESSION SET EVENTS '43905 trace name context forever, level 1';

This generated a trace file containing a clue to the reason:

1
QKSRC: ViewText[ecode=942] = SELECT  /*+ RESULT_CACHE */ ......

So, it seemed that there was an internal ORA-942 somewhere that prevented result caching from being used. The solution was to qualify the objects inside the subquery. Now the result cache was used correctly:

1
2
SELECT SQ.*, sysdate FROM
(SELECT /*+ RESULT_CACHE */ * FROM ownerY.tableX, ....)

 

 

 

Oracle Linux Update from 7.2 to 7.3 crashes with uncorrectable CPU error

During the Upgrade of Oracle Linux 7.2 to 7.3 with “yum update”, when updating the microcode_ctl package, the system crashed and firmware reported an uncorrectable CPU error after reboot.

It turned out that the update of the package in combination with specific Intel CPUs causes the issue. In the meantime, there is a red hat bug and solution available.

 

References:

  • https://community.oracle.com/thread/3997160
  • https://bugzilla.redhat.com/show_bug.cgi?id=1398698
  • MOS: The system could not start since suddenly rebooted during upgrade to microcode_ctl package (Doc ID 2217876.1)

Problems with big SGAs (>200G) on Linux

I recently had an issue where a database with 240GB SGA (1 huge shared memory segment) configured with hugepages on a system with 512G RAM was suddenly becoming instable and new logons were denied with these error message:


ORA-01034: ORACLE not available
ORA-27123: unable to attach to shared memory segment
Linux-x86_64 Error: 22: Invalid argument
Additional information: 2667
Additional information: 1736718
Additional information: 215016800256

This was strange because ipcs -a showed all shared memory segments and all processes existed. It turned out that there is a known issue with the Linux feature “prelink” and very big SGAs (>200g). This MOS Note gives some details:

Connected to an idle instance, while database instance is running (Process J000 died, see its trace file kkjcre1p: unable to spawn jobq slave process ) (Doc ID 1578491.1)

Obviously, exadata customers also suffered from this issue and therefore prelink is disabled beginning with exadata version 11.2.3.3. Redhat is reporting the same issue in combination with Informix databases in Knowledgebase article: https://access.redhat.com/solutions/1186953

EM 13c: Do not change OPatch

In previous versions, it was “best practice” to always get the most current opatch (patch 6880880) from MOS. Unfortunately, with Enterprise Manager Cloud Control 13c, this is problematic at the moment. The reason is that OMS 13.1 is shipped with OPatch 13.6:

[oracle@em13c ~]$ opatch version
OPatch Version: 13.6.0.0.0
 
OPatch succeeded.

Currently, OPatch 13.6 is not available in MOS. Only OUI Nextgen 13.2 is available, but not opatch 13.6. Do not overwrite OPatch 13.6 with OUI Nextgen 13.2 because it will break it.

EM12c: opatchauto failed with error code 231

When trying to patch OMS 12.1.0.5 in a VirtualBox environment with latest OMS PSU, I came across a strange problem which took quite a while to solve. The opatchauto apply and also -analyze commands failed every time with this error after several minutes of hanging:

opatchauto failed with error code 231

Manual connect to WLS console with relevant protocol/host/port/username/password was working fine. Then I realized that there was an issue with the entropy pool on the VirtualBox VM.

I followed this Note to resolve the isse:

E1: OS: Linux Servers Hang or Have Delays on Any JAVA Process Affecting Performance (Doc ID 1525645.1)

After implementing rngd, the patching worked successfully without any hangs.

AWR Warehouse – security issue

During implementation of AWR Warehouse, I discovered that AWR warehouse is using temporary staging schemas in the AWR warehouse repository database. These schemas life approximately for the duration of a datapump import job and are then dropped again. Due to the fact that the used password is not compliant with customers password verification function, the jobs failed.

v_sql := ‘ CREATE USER ‘ || STAGING_SCHEMA || ‘ IDENTIFIED BY SYS_GUID ‘ ||
‘ DEFAULT TABLESPACE ‘ || tbsname;

The staging schemas are created with the password “SYS_GUID” in capital lettters. This looks to me like the developer tried to generate a random string as password but instead overlooked that the password is set to fixed string “SYS_GUID” instead.

Oracle support has noticed this issue and filed an enhancement request. It is a pity, that this is not filed as a bug, but an enhancement.

Well, I hope this improves in a future version together with dynamic retention and purging options as well as customizable staging directories.

Happy AWR´ing.