Archive for October 2008

AWR Analysis for IO Reporting

Oracle gathers a wealth of information in the Automatic Workload Repository (AWR). We can query this data to get very important IO access time information.

  • Create a working table because depending of the size of the AWR, the queries could take a long time and cause load on the database.

drop table mdecker.filestatistics;
create table mdecker.filestatistics as select t.snap_id, t.begin_interval_time, filename, file#, tsname, phyrds, singleblkrds, readtim, singleblkrdtim, phyblkrd from
dba_hist_filestatxs f ,
dba_hist_snapshot t
where
t.snap_id = f.snap_id and (
begin_interval_time between to_date('05.11.2007 09:00','DD.MM.YYYY HH24:MI') and
to_date('09.11.2007 17:00','DD.MM.YYYY HH24:MI')
OR
begin_interval_time between to_date('12.11.2007 09:00','DD.MM.YYYY HH24:MI') and
to_date('16.11.2007 17:00','DD.MM.YYYY HH24:MI') );

  • Create a summary table

create table mdecker.filestatistics_summary as
select snap_id, begin_interval_time, sum(phyrds) phyrds, sum(singleblkrds) singleblkrds from mdecker.filestatistics
group by snap_id, begin_interval_time
order by snap_id

  • Query the summary table

select snap_id,
y.begin_interval_time,
phyrds as prev_phyrds,
lead(phyrds) over(order by snap_id) current_physrds,
lead(phyrds) over(order by snap_id) - phyrds,
begin_interval_time as prev_interval_time,
lead(begin_interval_time) over(order by snap_id) current_time,
lead(begin_interval_time) over(order by snap_id) - begin_interval_time,
extract(second from lead(begin_interval_time) over(order by snap_id) - begin_interval_time) as sec,
extract(minute from lead(begin_interval_time) over(order by snap_id) - begin_interval_time) as min,
(extract(minute from lead(begin_interval_time) over(order by snap_id) - begin_interval_time)*60)+extract(second from lead(begin_interval_time) over(order by snap_id) - begin_interval_time),
(lead(phyrds) over(order by snap_id) - phyrds)/
((extract(minute from lead(begin_interval_time) over(order by snap_id) - begin_interval_time)*60)+extract(second from lead(begin_interval_time) over(order by snap_id) - begin_interval_time))
from filestatistics_summary y



Is your RAC 10.2.0.4 a ticking bomb?

I have come a across a very nasty bug on Oracle RAC on HP-UX Itanium after upgrading to 10.2.0.4. The problem might also occur on Solaris and Linux x86-64. On one of the RAC instances of a 2 node Cluster, the number of open file descriptors of the oracle racgimon process is increasing by 1 every 60 seconds. This means that if your ulimit of open files for a process is set high and the HP-UX Kernel parameter nfiles is also set high, it might take weeks to months until the racgimon process finally hits the limit. If that happens, it can cause instability of the node because no more filedescriptors can be opened system-wide.

How to check, whether my installation suffers from this bug?

It is very easy: do an lsof -p and look for dozens of open filedescriptors of file hc_SID.dat.

– or –

Check logfile $ORACLE_HOME/log//racg/imon_.log:

2008-10-07 13:05:50.879: [ RACG][82] [29827][82][ora.DBNAME.DBNAME1.inst]: GIMH: GIM-00104: Health
check failed to connect to instance.
GIM-00090: OS-dependent operation:mmap failed with status: 12
GIM-00091: OS failure message: Not enough space
GIM-00092: OS failure occurred at: sskgmsmr_13

Is there a patch?
Good news is, there is. The bug is tracked via BugID 6931689 and there is patch #7298531 available to fix the problem with metalink note 739557.1.