Archive for April 2009

Diagnosing “cursor: pin S wait on X”

I came across a metalink note (Note 786507.1) which describes a very short way to identify the blocker of a session waiting on mutex “cursor: pin S wait on X”. Previously, I thought it could only be analyzed with system state dumps, but the note describes that parameter P2RAW of wait event “cursor: pin S wait on X” contains the blocking session id and the refcount. For 32bit environments the field contains 4 bytes (2 bytes for session id, 2 bytes for refcount). On 64 bit environments, the field contains 8 bytes (4 vs. 4).

select p2raw from v$session where event = ‘cursor: pin S wait on X’

The first 2 or 4 bytes from p2raw have to be converted to dec and point to the blocking Session ID. (SID)



Oracle RAC 10.2.0.4 bug when adding/dropping redo log groups

On the oracle-l list, there was a post about a nasty bug in Oracle RAC 10.2.0.4 which can cause LGWR errors, instance failures and redo log corruptions.

The bug occurs if you add/remove redo log groups on a cluster database, which has some of it´s instances shut down.

More information about this Bug, which is tracked with Bug ID 6786022 can be found on MetaLink. There is a patch for 10.2.0.4 and Linux x86-64 available.



ASSM Problem with too low PCTFREE

During the Hotsos Symposium 2009 Training Day, Jonathan Lewis presented a problem which appears even on current 10g/11g databases. What is especially interesting is how this issue can be diagnosed. I reproduced the problem in 11.1.0.7 and will provide the steps you can use to verify. The problem can be demonstrated best when comparing response time of an update statement for 8k blocksize and 16k blocksize.

  • 8k Blocksize
  • SQL> DROP TABLE t1_8k purge;
     
    SQL> CREATE TABLESPACE DEMO8K datafile SIZE 128M blocksize 8192;
     
    TABLESPACE created.
     
    Elapsed: 00:00:07.35
     
    SQL> CREATE TABLE mdecker.t1_8k
    (n1 NUMBER,
     n2 NUMBER)
     TABLESPACE DEMO8K;
     
    TABLE created.
     
    Elapsed: 00:00:00.02
     
    SQL> INSERT INTO t1_8k
    SELECT TRUNC(dbms_random.VALUE(10000000,100000000)) n1,
               TO_NUMBER(NULL) AS n2
        FROM dual
    CONNECT BY LEVEL <= 500000
    /
     
     
    500000 ROWS created.
     
    Elapsed: 00:00:06.89
     
    SQL> BEGIN dbms_stats.gather_table_stats(
            ownname => 'MDECKER',
            tabname => 'T1_8K',
            estimate_percent => 100);
    END;
    /
     
    PL/SQL PROCEDURE successfully completed.
     
    Elapsed: 00:00:02.13
     
    SQL> SELECT num_rows,blocks FROM dba_tables WHERE table_name = 'T1_8K';
     
      NUM_ROWS     BLOCKS
    ---------- ----------
        500000        874
     
    Elapsed: 00:00:00.15
    SQL> UPDATE t1_8k SET n2 = n1;
     
    500000 ROWS updated.
     
    Elapsed: 00:01:09.04
    SQL> COMMIT;

  • 16k Blocksize
  • SQL> CREATE TABLESPACE DEMO16K datafile SIZE 128M blocksize 16384;
     
    TABLESPACE created.
     
    Elapsed: 00:00:14.75
     
    SQL> CREATE TABLE mdecker.t1_16k
    (n1 NUMBER,
     n2 NUMBER)
     TABLESPACE DEMO16K;
     
    TABLE created.
     
    Elapsed: 00:00:00.03
     
    SQL> INSERT INTO t1_16k
    SELECT TRUNC(dbms_random.VALUE(10000000,100000000)) n1,
               TO_NUMBER(NULL) AS n2
        FROM dual
    CONNECT BY LEVEL <= 500000
    /  
     
     
    500000 ROWS created.
     
    Elapsed: 00:00:05.51
    SQL> BEGIN dbms_stats.gather_table_stats(
            ownname => 'MDECKER',
            tabname => 'T1_16K',
            estimate_percent => 100);
    END;
    /  
     
    PL/SQL PROCEDURE successfully completed.
     
    Elapsed: 00:00:01.78
    SQL> SELECT num_rows,blocks FROM dba_tables WHERE table_name = 'T1_16K';
     
      NUM_ROWS     BLOCKS
    ---------- ----------
        500000        436
     
    Elapsed: 00:00:00.07
     
    SQL>  UPDATE t1_16k SET n2 = n1;
     
    500000 ROWS updated.
     
    Elapsed: 00:20:20.89

    As you can see, the update statement for the 8k blocksize table took around 69 seconds whereas the same update for the table in the 16k tablespace took more than 20 minutes. When executing oradebug short_stack, you can see that for the 16k update, the stacktrace is similar for many executions. So, a lot of time is spent in the kernel functions ktspfsrch() and ktspscan_bmb().


    SQL> oradebug setospid 23668
    Oracle pid: 18, Unix process pid: 23668, image: oracle@ora-vm1.intra (TNS V1-V3)
    SQL> oradebug short_stack
    .....ktspfsrch()+559<-ktspscan_bmb()+315 .....
    SQL> oradebug short_stack
    .....ktspfsrch()+559<-ktspscan_bmb()+315 .....
    SQL> oradebug short_stack
    .....ktspfsrch()+559<-ktspscan_bmb()+315 .....

    It is important to understand that the problem is not necessarily related to the blocksize, but to the PCTFREE value. More information about this topic can be found here:

    http://structureddata.org/files/jl_test_case.html
    http://structureddata.org/2008/09/08/understanding-performance/
    http://www.oraclealchemist.com/oracle/hey-guys-does-size-matter/