EM12c: opatchauto failed with error code 231

When trying to patch OMS 12.1.0.5 in a VirtualBox environment with the latest OMS PSU, I came across a strange problem that took quite a while to solve. Both the opatchauto apply and the -analyze commands hung for several minutes and then failed every time with this error:

opatchauto failed with error code 231

A manual connection to the WLS console with the relevant protocol/host/port/username/password worked fine. Then I realized that there was an issue with the entropy pool on the VirtualBox VM.

I followed this note to resolve the issue:

E1: OS: Linux Servers Hang or Have Delays on Any JAVA Process Affecting Performance (Doc ID 1525645.1)

After implementing rngd, the patching completed successfully without any hangs.

AWR Warehouse – security issue

During the implementation of AWR Warehouse, I discovered that it uses temporary staging schemas in the AWR Warehouse repository database. These schemas live for approximately the duration of a datapump import job and are then dropped again. Because the password used is not compliant with the customer's password verification function, the jobs failed.

v_sql := ' CREATE USER ' || STAGING_SCHEMA || ' IDENTIFIED BY SYS_GUID ' ||
' DEFAULT TABLESPACE ' || tbsname;

The staging schemas are created with the password “SYS_GUID” in capital letters. It looks to me as if the developer intended to generate a random string as the password but overlooked that SYS_GUID sits inside the string literal, so the password is always the fixed string “SYS_GUID”.
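To illustrate the difference, here is a minimal sketch of how a generated value could be concatenated into the statement instead of the literal. This is not the actual AWR Warehouse code: the schema name STG_DEMO, the tablespace USERS and the variable v_pwd are hypothetical placeholders, and whether a GUID-based password passes a given verification function depends entirely on that function. A GUID is also not a strong secret, so treat this purely as an illustration of the quoting mistake.

```sql
SET SERVEROUTPUT ON
DECLARE
  staging_schema VARCHAR2(30) := 'STG_DEMO';  -- hypothetical schema name
  tbsname        VARCHAR2(30) := 'USERS';     -- hypothetical tablespace
  v_pwd          VARCHAR2(30);
  v_sql          VARCHAR2(4000);
BEGIN
  -- Evaluate SYS_GUID() and use its value, instead of embedding
  -- the text SYS_GUID in the string literal.
  -- Leading letter plus 29 hex digits keeps the password at 30 characters.
  v_pwd := 'P' || SUBSTR(RAWTOHEX(SYS_GUID()), 1, 29);

  v_sql := 'CREATE USER ' || DBMS_ASSERT.SIMPLE_SQL_NAME(staging_schema)
        || ' IDENTIFIED BY "' || v_pwd || '"'
        || ' DEFAULT TABLESPACE ' || DBMS_ASSERT.SIMPLE_SQL_NAME(tbsname);

  -- Print the statement with the password masked; execution is left
  -- commented out on purpose.
  DBMS_OUTPUT.PUT_LINE(REPLACE(v_sql, v_pwd, '********'));
  -- EXECUTE IMMEDIATE v_sql;
END;
/
```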

Oracle Support has acknowledged this issue and filed an enhancement request. It is a pity that this was filed as an enhancement rather than as a bug.

Well, I hope this improves in a future version together with dynamic retention and purging options as well as customizable staging directories.

Happy AWR'ing.

OPatch bugs when applying Grid Infrastructure 12.1.0.2.5

For one of my clients, I experienced several issues when applying PSU 12.1.0.2.5 with opatch 12.1.0.1.9.

There were some code changes in opatchauto which are not yet production-ready. I am looking forward to seeing a new opatch release (maybe 12.1.0.1.10 or 12.1.0.1.11) that has these issues fixed.

  • Bug 22091017 : OPATCHAUTO -ANALYZE COMMAND SHUTS DOWN THE RDBMS HOME IN 12C
    “opatchauto apply -analyze” is normally used before patching to verify whether there are any conflicts. In our case, “opatchauto apply -analyze” shut down the database instances. Not good.
  • BUG 22202019 – OPATCHAUTO FAILS WITH “TOO MANY ARGUMENTS” FROM CHECKSPFFILE SCRIPT
    When applying the GI PSU to the RDBMS home while the database instances are already stopped, the patch apply fails. The reason is that opatchauto tries to query the database instances for information before shutting them down. If the instances are already stopped, these queries fail and opatchauto aborts.
  • OPatchauto shuts down the database instances with “srvctl stop home -stopoption TRANSACTIONAL”. I was of the opinion that this performs a “shutdown transactional”, which in turn waits until the last session has committed. This can take a long time. The support engineer explained that the wait is limited to 10 minutes, after which the instance is stopped with “shutdown abort”. This is not documented in “http://docs.oracle.com/database/121/RACAD/srvctladmin.htm#RACAD5040” and is, in my opinion, a bug. I would expect the shutdown to be “immediate”, without delaying the patching process.

Oracle releases new Standard Edition 2 for 12.1.0.2

Customers have been waiting for a Standard Edition release of version 12.1.0.2. Then, in July, a support engineer leaked that there would be a whole new edition called “Standard Edition 2”, replacing both “Standard Edition” and “Standard Edition One”. The respective MOS note disappeared shortly after the accidental publication. Today the release finally became official and can be downloaded via OTN. “SE2” is priced exactly like the previous “Standard Edition”. SE2 is limited to a single server with 2 CPU sockets or, when used with RAC, to two servers with one socket each. In addition, there seems to be a limit of 16 CPU threads (e.g. 8 cores with hyperthreading) on a single host, and of 8 CPU threads per node when using RAC.

So beginning with 12.1.0.2, there will be no more “Standard Edition (4 CPU socket limit)” or “Standard Edition One (2 socket limit)” releases. There is no way to avoid moving to “Standard Edition 2 (2 socket limit)” in the future.

For SE customers who were using 4 (occupied) CPU sockets on a single host, or a 2-node RAC cluster with 2 (occupied) CPU sockets per host, this might mean that they have to remove half of the occupied sockets from their installation. I am not sure what happens to SE licenses that have already been bought and how they can be “migrated” to SE2 licenses. I hope that this will not start a new wave of license-driven Oracle->PostgreSQL/MySQL/MS SQL migrations.

Now would probably be the right time for 11gR2 Standard Edition users to start upgrade tests for 12c to gather experience. On the other hand, waiting until the end of the year for the bugs found by “early adopters” might save you some trouble.

Oracle Disaster Recovery with Microsoft Hyper-V Replica

There is one very interesting feature of Microsoft Hyper-V called Replica, which allows replication of virtual machines without shared storage. With Hyper-V Replica, one can set up Oracle Database VMs to replicate between independent nodes. This document (in German) describes the setup in more detail:

Hyper-V Replica für Oracle Database

Enterprise User Security – Presentation Material available

On the presentations page you can find my presentation material (in German) from the DOAG regional meeting in Munich in September 2014 about “Enterprise User Security”.

11.2.0.4 DBUA silently changing NLS_TERRITORY init.ora Parameter during upgrade

In case you are planning to perform any upgrades with DBUA, double-check the init.ora parameters after the upgrade. In a production upgrade at a client site, DBUA silently changed the init.ora parameter NLS_TERRITORY from GERMANY to AMERICA. It was only noticed after the upgrade, when the decimal and grouping characters in numbers were swapped, leading to application problems.

Support had to admit that this is unpublished bug 16538186 and that there is NO documentation of it anywhere at the moment.

The lesson we learned is to compare the init.ora parameters after a DBUA run with the parameters from before the upgrade.
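One way to make this comparison routine is to snapshot the parameters into a table before the upgrade and diff them afterwards. The following is only a sketch: the table name init_params_before is a hypothetical choice, and I use v$system_parameter rather than v$parameter on the assumption that instance-level values are what matters here, since NLS settings can be overridden per session.

```sql
-- Before the upgrade: keep a copy of the instance-level parameters.
-- init_params_before is a hypothetical table name.
CREATE TABLE init_params_before AS
  SELECT name, value, isdefault
  FROM   v$system_parameter;

-- After the upgrade: list every parameter whose value changed,
-- disappeared, or was newly introduced.
SELECT COALESCE(b.name, a.name) AS name,
       b.value                  AS before_value,
       a.value                  AS after_value
FROM   init_params_before b
FULL   OUTER JOIN v$system_parameter a
       ON a.name = b.name
WHERE  DECODE(a.value, b.value, 0, 1) = 1   -- DECODE treats NULL = NULL as equal
ORDER  BY 1;
```

A new release always brings some new or changed parameters, so the output needs to be read with care; a change such as NLS_TERRITORY going from GERMANY to AMERICA would stand out in it.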

Index Clustering Factor finally more realistic

I just stumbled upon this bug reference on My Oracle Support:

Bug 13262857  Enh: provide some control over DBMS_STATS index clustering factor computation

This enhancement was long overdue. Previously, when the clustering factor was computed during statistics gathering, the value was incremented whenever a row was not found in the same block as the previous row. Now it is finally possible to specify how many blocks should be considered when computing the clustering factor. The patch delivers an improved DBMS_STATS package body that can be used to set the preference TABLE_CACHED_BLOCKS.

The flaw in the over-simplistic and pessimistic original computation is something that Jonathan Lewis documented in his book “Cost-Based Oracle Fundamentals” in 2006, along with a workaround to manually gather improved clustering factor values!
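As a minimal sketch of how the preference might be used once the patch is in place: the owner APP, the table ORDERS, the index ORDERS_PK and the value 16 are all hypothetical examples, not recommendations.

```sql
BEGIN
  -- Tell DBMS_STATS to treat the last 16 table blocks as "cached"
  -- when computing the clustering factor for indexes on this table.
  DBMS_STATS.SET_TABLE_PREFS(
    ownname => 'APP',                  -- hypothetical owner
    tabname => 'ORDERS',               -- hypothetical table
    pname   => 'TABLE_CACHED_BLOCKS',
    pvalue  => '16');                  -- example value only

  -- The preference only takes effect when statistics are gathered again.
  DBMS_STATS.GATHER_INDEX_STATS(
    ownname => 'APP',
    indname => 'ORDERS_PK');           -- hypothetical index
END;
/

-- Compare the new clustering factor with the table's block and row counts.
SELECT index_name, clustering_factor, num_rows, leaf_blocks
FROM   dba_indexes
WHERE  owner = 'APP'
AND    index_name = 'ORDERS_PK';
```

Noting the clustering factor before and after the change shows how much the original computation had overstated it for that index.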

The relevant chapter of this book is available as PDF at this URL.

 

 

Datapump Export suffering from Oracle Row Migration

Recently, I was troubleshooting a datapump export duration problem. Over a period of 18 months, the duration of a multi-table export had increased dramatically. A quick analysis showed that the export duration depended mainly on one big table. This 50 GB table with no BLOB/CLOB/LONG datatypes took more than 4 hours on a modern system, whereas it should not take more than 10 to 15 minutes. The system was performing ONLY single-block I/O requests (db file sequential read). I found this strange and started investigating.

Beginning with 11g, you can enable sql_trace (10046) for datapump with the new syntax:

ALTER system SET events 'sql_trace {process : pname = dw | pname = dm} level=12';
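Since this event is set system-wide, it should be switched off again once the export has been traced. A short sketch, assuming the same event syntax with "off" in place of the level, plus a query for the directory where the trace files are written:

```sql
-- Disable the datapump trace again after the test run.
ALTER SYSTEM SET EVENTS 'sql_trace {process : pname = dw | pname = dm} off';

-- Directory that holds the trace files of the DW/DM processes.
SELECT value
FROM   v$diag_info
WHERE  name = 'Diag Trace';
```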

The trace showed that there were dozens of consecutive single-block I/O requests against the SAME data block:

WAIT #47053877709656: nam='db file sequential read' ela= 344 file#=41 block#=2556595 blocks=1 obj#=120874 tim=1347023848770469
WAIT #47053877709656: nam='db file sequential read' ela= 6 file#=41 block#=2556595 blocks=1 obj#=120874 tim=1347023848770504
WAIT #47053877709656: nam='db file sequential read' ela= 5 file#=41 block#=2556595 blocks=1 obj#=120874 tim=1347023848770526
WAIT #47053877709656: nam='db file sequential read' ela= 5 file#=41 block#=2556595 blocks=1 obj#=120874 tim=1347023848770548
WAIT #47053877709656: nam='db file sequential read' ela= 4 file#=41 block#=2556595 blocks=1 obj#=120874 tim=1347023848770570
WAIT #47053877709656: nam='db file sequential read' ela= 5 file#=41 block#=2556595 blocks=1 obj#=120874 tim=1347023848770593
WAIT #47053877709656: nam='db file sequential read' ela= 5 file#=41 block#=2556595 blocks=1 obj#=120874 tim=1347023848770617
WAIT #47053877709656: nam='db file sequential read' ela= 4 file#=41 block#=2556595 blocks=1 obj#=120874 tim=1347023848770639

No wonder that this does not perform well, right? The next step was a block dump of this interesting block from file 41 / block 2556595.

SELECT * FROM dba_extents WHERE file_id = 41 AND 2556595 BETWEEN block_id AND block_id+blocks-1;
ALTER system dump datafile 41 block 2556595;

In the resulting trace file, I verified that I had dumped a block from the correct table segment. (Hex 0x1d82a is Dec 120874)

buffer tsn: 15 rdba: 0x0a6702b3 (41/2556595)
frmt: 0x02 chkval: 0x57f3 TYPE: 0x06=trans DATA
seg/obj: 0x1d82a csc: 0x0c.1c3abc94 itc: 20 flg: E typ: 1 - DATA
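The hex values in the block header can be cross-checked directly in SQL. This is a small sketch using TO_NUMBER with a hex format mask and the DBMS_UTILITY data block address functions; the lookup in dba_objects assumes that seg/obj in the dump corresponds to the data object id.

```sql
-- Decode the object id and the relative data block address from the dump.
-- For the values above this yields 120874, 41 and 2556595.
SELECT TO_NUMBER('1d82a', 'xxxxx') AS data_object_id,
       DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE(
         TO_NUMBER('0a6702b3', 'xxxxxxxx'))  AS file_no,
       DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK(
         TO_NUMBER('0a6702b3', 'xxxxxxxx'))  AS block_no
FROM   dual;

-- Map the data object id back to the segment it belongs to.
SELECT owner, object_name, object_type
FROM   dba_objects
WHERE  data_object_id = 120874;
```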

Then I had a closer look at the flag bytes.

The flags were:

  • H: Head Piece
  • F: First Piece
  • L: Last Piece

Normally, “--H-FL--” is shown to indicate that the whole row is stored as a single row piece in one block. In this block dump, you can see dozens of rows with “----FL--”, which means that the head row piece is located elsewhere, and you see no head-piece-only row pieces (“--H-----”). This shows that the segment has suffered from heavy row migration. It was probably caused by adding columns after the table was created.

grep "^tl: " SID_ora_999.trc
tl: 97 fb: ----FL-- lb: 0x0 cc: 20
tl: 88 fb: --H-FL-- lb: 0x0 cc: 20
tl: 97 fb: ----FL-- lb: 0x0 cc: 20
tl: 88 fb: --H-FL-- lb: 0x0 cc: 20
tl: 98 fb: ----FL-- lb: 0x2 cc: 20
tl: 89 fb: --H-FL-- lb: 0x2 cc: 20
tl: 99 fb: ----FL-- lb: 0x2 cc: 20
tl: 90 fb: --H-FL-- lb: 0x2 cc: 20
tl: 99 fb: ----FL-- lb: 0x2 cc: 20
tl: 90 fb: --H-FL-- lb: 0x2 cc: 20
tl: 99 fb: ----FL-- lb: 0x2 cc: 20
tl: 90 fb: --H-FL-- lb: 0x2 cc: 20
tl: 99 fb: ----FL-- lb: 0x2 cc: 20
tl: 90 fb: --H-FL-- lb: 0x2 cc: 20
tl: 99 fb: ----FL-- lb: 0x2 cc: 20
tl: 90 fb: --H-FL-- lb: 0x2 cc: 20
tl: 90 fb: --H-FL-- lb: 0x2 cc: 20
tl: 99 fb: ----FL-- lb: 0x2 cc: 20
tl: 90 fb: --H-FL-- lb: 0x2 cc: 20
tl: 99 fb: ----FL-- lb: 0x2 cc: 20
tl: 90 fb: --H-FL-- lb: 0x2 cc: 20
tl: 99 fb: ----FL-- lb: 0x2 cc: 20
tl: 90 fb: --H-FL-- lb: 0x2 cc: 20
tl: 99 fb: ----FL-- lb: 0x2 cc: 20
tl: 90 fb: --H-FL-- lb: 0x2 cc: 20
tl: 99 fb: ----FL-- lb: 0x2 cc: 20
tl: 90 fb: --H-FL-- lb: 0x2 cc: 20
tl: 99 fb: ----FL-- lb: 0x2 cc: 20
tl: 90 fb: --H-FL-- lb: 0x2 cc: 20
tl: 99 fb: ----FL-- lb: 0x2 cc: 20
tl: 90 fb: --H-FL-- lb: 0x2 cc: 20
tl: 99 fb: ----FL-- lb: 0x2 cc: 20
tl: 90 fb: --H-FL-- lb: 0x2 cc: 20
tl: 90 fb: --H-FL-- lb: 0x2 cc: 20
tl: 99 fb: ----FL-- lb: 0x2 cc: 20
tl: 90 fb: --H-FL-- lb: 0x2 cc: 20
tl: 99 fb: ----FL-- lb: 0x2 cc: 20
tl: 90 fb: --H-FL-- lb: 0x2 cc: 20
tl: 99 fb: ----FL-- lb: 0x2 cc: 20
tl: 89 fb: --H-FL-- lb: 0x2 cc: 20
tl: 99 fb: ----FL-- lb: 0x2 cc: 20
tl: 89 fb: --H-FL-- lb: 0x2 cc: 20
tl: 99 fb: ----FL-- lb: 0x2 cc: 20
tl: 89 fb: --H-FL-- lb: 0x2 cc: 20
tl: 99 fb: ----FL-- lb: 0x2 cc: 20
tl: 89 fb: --H-FL-- lb: 0x2 cc: 20
tl: 99 fb: ----FL-- lb: 0x2 cc: 20
tl: 89 fb: --H-FL-- lb: 0x2 cc: 20
tl: 99 fb: ----FL-- lb: 0x2 cc: 20
tl: 89 fb: --H-FL-- lb: 0x2 cc: 20
tl: 99 fb: ----FL-- lb: 0x2 cc: 20
tl: 90 fb: --H-FL-- lb: 0x2 cc: 20
tl: 90 fb: --H-FL-- lb: 0x2 cc: 20
tl: 99 fb: ----FL-- lb: 0x2 cc: 20
tl: 90 fb: --H-FL-- lb: 0x2 cc: 20
tl: 99 fb: ----FL-- lb: 0x2 cc: 20
tl: 90 fb: --H-FL-- lb: 0x2 cc: 20
tl: 99 fb: ----FL-- lb: 0x2 cc: 20
tl: 90 fb: --H-FL-- lb: 0x2 cc: 20
tl: 99 fb: ----FL-- lb: 0x2 cc: 20
tl: 90 fb: --H-FL-- lb: 0x2 cc: 20
tl: 99 fb: ----FL-- lb: 0x2 cc: 20
tl: 90 fb: --H-FL-- lb: 0x2 cc: 20
tl: 99 fb: ----FL-- lb: 0x2 cc: 20
tl: 90 fb: --H-FL-- lb: 0x2 cc: 20
tl: 99 fb: ----FL-- lb: 0x2 cc: 20
tl: 90 fb: --H-FL-- lb: 0x2 cc: 20
tl: 99 fb: ----FL-- lb: 0x2 cc: 20
tl: 90 fb: --H-FL-- lb: 0x2 cc: 20
tl: 90 fb: --H-FL-- lb: 0x2 cc: 20
tl: 99 fb: ----FL-- lb: 0x2 cc: 20
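A single block dump only shows a sample. To quantify the problem for the whole segment, the rows with migrated or chained pieces can be listed with ANALYZE. A sketch, assuming the CHAINED_ROWS table is created with the utlchain.sql script shipped with the database, and using the hypothetical names APP and BIG_TABLE for the affected table:

```sql
-- Creates the CHAINED_ROWS table in the current schema.
@?/rdbms/admin/utlchain.sql

-- Records the head rowid of every chained or migrated row.
-- On a 50 GB table this reads the whole segment and takes a while.
ANALYZE TABLE app.big_table LIST CHAINED ROWS INTO chained_rows;

-- Number of affected rows, to be compared with the table's total row count.
SELECT COUNT(*) AS affected_rows
FROM   chained_rows
WHERE  owner_name = 'APP'
AND    table_name = 'BIG_TABLE';
```

The statement does not distinguish between migrated and truly chained rows, so the flag bytes in the block dump are still needed to tell which of the two it is.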

With datapump there are two different access methods: EXTERNAL_TABLE and DIRECT_PATH. Usually, the datapump utility decides on its own which method to use. It turned out that with EXTERNAL_TABLE, the table export takes only 10 minutes and does not perform these single-block I/O requests. They only appear with DIRECT_PATH.

So the next step I recommended was to reorganize the table to get rid of the row migration and to evaluate whether PCTFREE should be increased. Afterwards, export durations were back to normal.
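For a test like this, the access method can be forced instead of leaving the choice to datapump. The following is a sketch through the DBMS_DATAPUMP API, assuming its DATA_ACCESS_METHOD job parameter; the schema APP, the table BIG_TABLE, the dump file name and the directory object DATA_PUMP_DIR are hypothetical placeholders.

```sql
DECLARE
  h NUMBER;  -- datapump job handle
BEGIN
  -- Table-mode export job
  h := DBMS_DATAPUMP.OPEN(operation => 'EXPORT', job_mode => 'TABLE');

  DBMS_DATAPUMP.ADD_FILE(
    handle    => h,
    filename  => 'big_table.dmp',      -- hypothetical dump file
    directory => 'DATA_PUMP_DIR');     -- hypothetical directory object

  -- Restrict the job to the one table under investigation.
  DBMS_DATAPUMP.METADATA_FILTER(h, 'SCHEMA_EXPR', 'IN (''APP'')');
  DBMS_DATAPUMP.METADATA_FILTER(h, 'NAME_EXPR',   'IN (''BIG_TABLE'')');

  -- Force the access method instead of letting datapump decide.
  DBMS_DATAPUMP.SET_PARAMETER(
    handle => h,
    name   => 'DATA_ACCESS_METHOD',
    value  => 'EXTERNAL_TABLE');

  DBMS_DATAPUMP.START_JOB(h);
  DBMS_DATAPUMP.DETACH(h);             -- job continues in the background
END;
/
```

Running the same job once with EXTERNAL_TABLE and once with DIRECT_PATH makes the difference in runtime and wait events directly comparable.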
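A reorganization of this kind could look like the sketch below. APP.BIG_TABLE is a hypothetical name and PCTFREE 20 is only an example value that has to be derived from how much the rows grow after insert. Note that a table move changes rowids, so the indexes become unusable and must be rebuilt.

```sql
-- Rewrite the table so that every row is stored in one piece again,
-- leaving more free space per block for later row growth.
ALTER TABLE app.big_table MOVE PCTFREE 20;

-- Generate the rebuild statements for the indexes invalidated by the move.
SELECT 'ALTER INDEX ' || owner || '.' || index_name || ' REBUILD;' AS rebuild_stmt
FROM   dba_indexes
WHERE  table_owner = 'APP'
AND    table_name  = 'BIG_TABLE'
AND    status      = 'UNUSABLE';
```

Without the ONLINE option, the move blocks DML on the table while it runs, so it needs a maintenance window.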

Bug 13397104: INSTANCE CRASH WITH ORA-600 [KJBLPKEYDRMQSCCHK:PKEY]

I would like to draw your attention to a bug in RAC 11.2.0.3. The bug leads to instance crashes during Dynamic Remastering. As of June 1st, there is no patch publicly available. The workaround seems to be to set:

_gc_read_mostly_locking=false
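In an spfile-based RAC installation, the setting could be applied as sketched below. Hidden parameters should only be changed on the advice of Oracle Support, and I am assuming here that the change is made in the spfile for all instances and picked up with the next restart.

```sql
-- Hidden (underscore) parameters must be enclosed in double quotes.
-- SID='*' applies the setting to all RAC instances.
ALTER SYSTEM SET "_gc_read_mostly_locking" = FALSE SCOPE = SPFILE SID = '*';

-- Verify what is recorded in the spfile.
SELECT sid, name, value
FROM   v$spparameter
WHERE  name = '_gc_read_mostly_locking';
```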

So, if you are thinking about upgrading your RAC to 11.2.0.3, maybe you should wait just a little bit longer… 😉

More details are in MOS: Bug 13397104: INSTANCE CRASH WITH ORA-600 [KJBLPKEYDRMQSCCHK:PKEY]

Update 12.06.2012: Patch 13397104 is now available in MOS for 11.2.0.3.