Feb 27, 2014

Troubleshoot ORA-07445: exception encountered: core dump


Scope : Oracle Database - Enterprise Edition - Version 11.2.0.3.0

Symptoms : Occasional errors found like:
ORA-07445: exception encountered: core dump
 [kglic0()+756] [SIGSEGV] [ADDR:0x430000010B8AE910] [PC:0x107AB7474] [Address not mapped to object] []

Cause : 
During SQL statistics analysis for query execution, dependency information is loaded into the Library Cache. These dependency memory pieces can be overwritten by other sessions without causing an immediate error. When the original session runs into the overwritten dependency information, it can hit this internal error, along with performance issues during subsequent queries of those Library Cache objects.

Solution:

There are many known bugs in this area in 11.2.0.3.0. If the error keeps recurring, raise an SR with Oracle Support for the recommended patch set or solution.
In the meantime, I have a workaround for this.

My database version is also 11.2.0.3.0, and I hit the same error in my production database.

I set the following parameter, bounced the database, and the issue was resolved.

SQL> ALTER SYSTEM SET "_cursor_stats_enabled"=false SCOPE=SPFILE;

Explanation: 

Setting "_cursor_stats_enabled"=false will disable information being maintained under the fixed view V$SQLSTATS. Therefore, OEM, AWR, and ADDM reports will not include information on SQL statistics.

To reset "_cursor_stats_enabled" to its default value use:

SQL> ALTER SYSTEM RESET "_cursor_stats_enabled";

Caution: Apply this change in a production database only on the recommendation of Oracle Support. If you apply it on your own, it is at your own risk.

Note: No hidden parameter should be touched without a recommendation from Oracle Support.

To see hidden parameters:

SQL> SELECT a.ksppinm  Param,  b.ksppstvl SessionVal, c.ksppstvl InstanceVal,  a.ksppdesc Descr
  FROM x$ksppi a, x$ksppcv b, x$ksppsv c
 WHERE a.indx = b.indx AND a.indx = c.indx
   AND a.ksppinm LIKE '/_%' escape '/'
 ORDER BY 1;
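
To check the current value of a single hidden parameter, for example the one set above, the same views can be filtered (a minimal variant of the query above):

SQL> SELECT a.ksppinm param, c.ksppstvl instance_val
       FROM x$ksppi a, x$ksppsv c
      WHERE a.indx = c.indx
        AND a.ksppinm = '_cursor_stats_enabled';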

Thanks.

Please feel free to post comments.

Feb 22, 2014

Oracle 11gR2 RMAN Backup Compression : with Case study

RMAN Backup Compression : Oracle 11g Release 2 

Oracle 11g Release 2 introduced compression algorithm levels which can be used for compressing table data, Data Pump exports, and RMAN backups.

The compression levels are BASIC, LOW, MEDIUM and HIGH, and each offers a trade-off between backup throughput and the degree of compression achieved.

Binary compression creates some performance overhead during backup and restore operations. It consumes CPU resources, so compressed backups should not be scheduled when CPU usage is already high. However, the following circumstances may justify paying the performance penalty:

  • You are using disk-based backups when disk space in your flash recovery area or other disk-based backup destination is limited.
  • You are performing your backups to some device over a network when reduced network bandwidth is more important than CPU usage.
  • You are using some archival backup media such as CD or DVD, where reducing backup sizes saves on media costs and archival storage.


Commands to be used:

Step-1: Set the RMAN configuration for the backup type:

RMAN> CONFIGURE DEVICE TYPE DISK BACKUP TYPE TO COMPRESSED BACKUPSET;

Step-2: Set the RMAN configuration for the compression algorithm:

RMAN> CONFIGURE COMPRESSION ALGORITHM 'HIGH';
or
RMAN> CONFIGURE COMPRESSION ALGORITHM 'MEDIUM';
or
RMAN> CONFIGURE COMPRESSION ALGORITHM 'LOW';
or
RMAN> CONFIGURE COMPRESSION ALGORITHM 'BASIC';

Step-3: Change your backup script:
RMAN> backup as compressed backupset incremental level 0 database;
OR
connect target /
run
{
allocate channel ch1 device type disk;
backup as compressed backupset incremental level 0 database;
release channel ch1;
}

Example:

RMAN> configure device type disk backup type to compressed backupset;
using target database control file instead of recovery catalog
new RMAN configuration parameters:
configure device type disk backup type to compressed backupset parallelism 1;
new RMAN configuration parameters are successfully stored

RMAN> show all;
configure device type disk backup type to compressed backupset parallelism 1;

RMAN> CONFIGURE COMPRESSION ALGORITHM 'MEDIUM';

using target database control file instead of recovery catalog
new RMAN configuration parameters:
CONFIGURE COMPRESSION ALGORITHM 'MEDIUM' AS OF RELEASE 'DEFAULT' OPTIMIZE FOR LOAD TRUE;
new RMAN configuration parameters are successfully stored

It is possible to compress archived redo log and incremental backups as follows:

RMAN> backup as compressed backupset archivelog all;
RMAN> backup as compressed backupset incremental level 0 database;
RMAN> backup as compressed backupset incremental level 1 database;
RMAN> backup as compressed backupset incremental level 1 cumulative database;

To de-configure:

Once this parameter is set, all backups will be compressed. To return to the default non-compressed configuration, clear the setting so that future backups are uncompressed:

RMAN> configure device type disk clear;
old RMAN configuration parameters:
configure device type disk backup type to compressed backupset parallelism 1;
RMAN configuration parameters are successfully reset to default value
RMAN> show all;
configure device type disk parallelism 1 backup type to backupset; # default

Tests carried out on a small 1.5 GB database returned compression ratios of about 4.7 in the best case. Note, however, that the LOW, MEDIUM and HIGH levels require the Advanced Compression license. The backupset sizes and backup durations are shown below.

Compression Level 'HIGH'

backupset size: 226.18M
time: 00:02:21

Compression Level 'MEDIUM'

backupset size: 293.80M
time: 00:00:30

Compression Level 'LOW'

backupset size: 352.59M
time: 00:00:20

Compression Level 'BASIC'

backupset size: 276.55M
time: 00:00:50

To summarize:

LOW – corresponds to LZO (11gR2) – smallest compression ratio, fastest
MEDIUM – corresponds to ZLIB (11gR1) – good compression ratio, slower than LOW
HIGH – corresponds to unmodified BZIP2 (11gR2) – highest compression ratio, slowest
BASIC (which is free) – corresponds to BZIP2 (10g style compression) – compression ratio in the range of MEDIUM, but slower

Note: Binary compression can be CPU-intensive. You can choose which compression algorithm RMAN uses for backups. By default, RMAN uses the BZIP2-based BASIC algorithm, which has a very good compression ratio. ZLIB compression (MEDIUM), which requires a COMPATIBLE setting of 11.0.0 or higher, is very fast but has a lower compression ratio than the other algorithms.
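
To see which compression algorithms your release offers and which is the default, you can query V$RMAN_COMPRESSION_ALGORITHM (a small helper query; column names as in 11gR2, so verify on your version):

SQL> SELECT algorithm_name, algorithm_description, is_default
       FROM v$rman_compression_algorithm;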

As shown, the compressed backup is smaller in size than the uncompressed backup. To identify which files were compressed, query v$backup_files as follows:

SQL> column fname format a90
SQL> set lines 180
SQL> set pages 500
SQL> select fname, compressed from v$backup_files;

FNAME                            COMPRESSED
-------------------------------- ----------
C:\uncompressed_sysaux.bkp       NO
C:\compressed_sysaux.bkp         YES
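
Backup piece sizes can also be compared directly from V$BACKUP_PIECE, which carries a COMPRESSED flag in 11g (a simple helper query, not part of the original test):

SQL> SELECT handle, compressed, ROUND(bytes/1024/1024) size_mb
       FROM v$backup_piece
      WHERE status = 'A'
      ORDER BY completion_time;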

I have tested this in one of my production database environments with the following configuration:
-----------------------------------------------------------------------------------------------
-- Before compression (when CPU% was <10%)
a) Backup size : 20.68G
b) Elapsed time : 6 min (max)

-- Applied the following commands and measured the statistics again
1) RMAN> CONFIGURE DEVICE TYPE DISK BACKUP TYPE TO COMPRESSED BACKUPSET;
2) RMAN> CONFIGURE COMPRESSION ALGORITHM 'MEDIUM';
3) RMAN> backup as compressed backupset incremental level 0 database;

-- After applying compression (when CPU% was <10%)
a) Backup size : 4.03G (around 80% reduction)
b) Elapsed time : 8 min (around 30% more time)

Based on these calculations and test scenarios, decide which compression method to follow. As a best practice in OLTP environments, schedule compressed backups for off-peak hours when CPU utilization stays very low for a long duration.

In case of doubts or for clarification, please write to me. Please feel free to post comments.

Feb 18, 2014

Troubleshoot Oracle Event : 'switch logfile command'


When a log switch takes a long time in an Oracle database, you may observe the wait event "switch logfile command", which blocks sessions for some time.
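
Before changing anything, it helps to see how often log switches actually occur; one simple way is to count switches per hour from V$LOG_HISTORY:

SQL> SELECT TO_CHAR(first_time, 'YYYY-MM-DD HH24') hour,
            COUNT(*) log_switches
       FROM v$log_history
      GROUP BY TO_CHAR(first_time, 'YYYY-MM-DD HH24')
      ORDER BY 1 DESC;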


-- Basic checks:
If the underlying disk has I/O problems, you may also see this event. Move the 'log_archive_dest' location to a disk area with higher I/O throughput and observe again. If the issue persists, run the following checks and fix the issue:

-- Check the following parameters :

SQL> show parameter disk_asynch_io;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
disk_asynch_io                       boolean     TRUE

SQL> show parameter filesystemio_options;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
filesystemio_options                 string      none


Solution :

The 'filesystemio_options' parameter is currently set to 'none'. Set it to 'asynch' in RAC environments and to 'setall' in non-ASM instances.

The FILESYSTEMIO_OPTIONS parameter controls which I/O options the database uses.

'setall' enables both asynchronous and direct I/O, and hence can lead to faster writes and therefore better performance.

After this change, you should no longer see this wait event.
If in doubt, please feel free to post queries/comments/suggestions.

Feb 7, 2014

Troubleshoot Oracle Event : Library Cache: mutex X & High SQL version counts

-- Resolving Oracle Event : Library cache: mutex X
-- Resolving high version count ( bug in Oracle 11.2.0.1 / 11.2.0.2.0 - in RAC)

-- To find if high version count observed or not
Steps:
1) Take an AWR report
2) Go to : Main report --> SQL Statistics --> SQL ordered by Version Count
3) If the counts stay within double digits, that is fine. If you observe any triple-digit value, i.e. more than 100, you are facing a high version count for those queries.

-- What is SQL version count in Oracle?
When you issue a SQL statement, the database searches the library cache for a cursor with matching SQL text. Even when the text matches, other differences can prevent reuse of the existing cursor (e.g. different optimizer settings, different NLS settings, different permissions, etc.). In such cases, a new child cursor is created. So child cursors are basically different versions of the same SQL statement.

If you have SQL statements with thousands of versions, this could mean a problem for your shared pool (child cursors taking up lots of space and causing fragmentation), as well as a potential for performance problems due to plan instability (if the same SQL text is parsed to a new plan every time, sooner or later it will be a bad plan). That's why AWR report has this list.

According to Oracle support, up to a couple of hundreds versions doesn't indicate a problem (cursor sharing mechanism isn't perfect), but when you have thousands or tens of thousands of versions, you should check your cursor sharing settings (first of all, CURSOR_SHARING parameter).
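
Outside of AWR, high-version cursors can also be spotted directly in V$SQLAREA; the threshold of 100 below simply mirrors the rule of thumb above:

SQL> SELECT sql_id, version_count, executions,
            SUBSTR(sql_text, 1, 60) sql_text
       FROM v$sqlarea
      WHERE version_count > 100
      ORDER BY version_count DESC;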


Sample AWR report snapshot when a high version count is observed:


-- Technical description

According to notes 9282521.8 and 9239863.8, which describe the patches, the enhancements should be used:
When there is true contention on a specific library cache object….

For example: a package that is so hot (heavily accessed) in the library cache will be contended for, and sessions appear to wait on Library Cache: mutex X.

There are many bugs and cases involving mutexes on Metalink; the case below is just one of them.

Disclaimer: do not test this in production.

The script below simply calls the dbms_application_info package; when executed concurrently in many sessions, it can cause contention on the library cache.

begin
  for i in 1..1000000
  loop
    execute immediate 'begin dbms_application_info.set_client_info(''mutex''); end;';
  end loop;
end;
/

While the sessions are running, generate an AWR report and you will see the wait event library cache: mutex X in the concurrency class.

This is evidence that you have a latch (mutex) issue.

How to overcome this?

Oracle gives you the ability to create multiple clones of hot objects in the library cache, so that sessions access/use them individually rather than contending for a single copy.

Please note: this is not pinning (as in pinning an object in the library cache); it marks the library cache object as hot, which allows Oracle to create multiple copies of it.

Solution 1: Prior to 11gR2

a) Parameter "_kgl_hot_object_copies" controls the maximum number of copies.

b) The complementary parameter "_kgl_debug" marks hot library cache objects as candidates for cloning.

The syntax for this parameter can be found in the MOS descriptions of bugs 9684368, 11775293 and others. One form of such marking is:

"_kgl_debug"="name='<object_name>' schema='<schema>' namespace=<namespace> debug=33554432"

With our example the syntax would be,

SQL> alter system set "_kgl_debug"="name='DBMS_APPLICATION_INFO' schema='SYS' namespace=1 debug=33554432", "name='DBMS_APPLICATION_INFO' schema='SYS' namespace=2 debug=33554432" scope=spfile;

SQL> alter system set "_kgl_hot_object_copies"= 255 scope=spfile;

Solution 2: 11gR2 onwards

dbms_shared_pool.markhot(
  schema    IN VARCHAR2,
  objname   IN VARCHAR2,
  namespace IN NUMBER DEFAULT 1,      -- library cache namespace to search
  global    IN BOOLEAN DEFAULT TRUE); -- if TRUE, mark hot on all RAC instances

or

dbms_shared_pool.markhot(
  hash      IN VARCHAR2,              -- 16-byte hash value for the object
  namespace IN NUMBER DEFAULT 1,
  global    IN BOOLEAN DEFAULT TRUE);

exec dbms_shared_pool.markhot('SYS','DBMS_APPLICATION_INFO',1);
exec dbms_shared_pool.markhot('SYS','DBMS_APPLICATION_INFO',2);
exec dbms_shared_pool.markhot(hash=>'3222383532',NAMESPACE=>0);
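
To reverse the marking, 11gR2 also provides dbms_shared_pool.unmarkhot, which takes the same schema/objname/namespace arguments (verify availability on your patch level):

exec dbms_shared_pool.unmarkhot('SYS','DBMS_APPLICATION_INFO',1);
exec dbms_shared_pool.unmarkhot('SYS','DBMS_APPLICATION_INFO',2);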

The namespace can be found with the following query (from Andrey Nikolaev's blog):

col name format a20
col cursor format a12 noprint
col type format a7
col LOCKED_TOTAL heading Locked format 99999
col PINNED_TOTAL heading Pinned format 99999999
col EXECUTIONS heading Executed format 99999999
col NAMESPACE heading Nsp format 999
set wrap on
set linesize 80
select *
  from (select case
                 when (kglhdadr = kglhdpar) then
                  'Parent'
                 else
                  'Child ' || kglobt09
               end cursor,
               kglhdadr ADDRESS,
               substr(kglnaobj, 1, 20) name,
               kglnahsh hash_value,
               kglobtyd type,
               kglobt23 LOCKED_TOTAL,
               kglobt24 PINNED_TOTAL,
               kglhdexc EXECUTIONS,
               kglhdnsp NAMESPACE
          from x$kglob
         order by kglobt24 desc)
 where rownum <= 10;

-- Find SQL with a high version count history (the version_rpt function is created by Oracle Support's high-version-count diagnostic script)

--set pages 2000 lines 100
SELECT b.*
FROM v$sqlarea a ,
TABLE(version_rpt(a.sql_id)) b
WHERE loaded_versions >=100;

-- The most effective and easiest solution
Note: this bug is fixed in Oracle 11.2.0.3.

-- Check the parameter

SQL> show parameter optimizer_secure_view_merging;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_secure_view_merging        boolean     TRUE


-- If TRUE, then change the parameter:

SQL> alter system set optimizer_secure_view_merging=FALSE;

-- Verify the change
SQL> show parameter optimizer_secure_view_merging;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_secure_view_merging        boolean     FALSE

Sample AWR report snapshot after the high version count was reduced by changing the above parameter:
When I applied this on one of our production database servers that had a high version count issue, the count dropped dramatically. You can see the difference between the two snapshots.

Your version count should now be minimized.

Feb 6, 2014

Oracle Wait Events and Solution

Different Wait Events and Solutions in Oracle
-        List of popular events
-        How to resolve them
List of some of the wait events:
-          Buffer busy wait
-          DB file sequential read
-          Enq: TX – row lock contention
-          Enq: TM – index contention
-          Row cache lock wait
-          Read by other session

1) Buffer Busy Wait:
            This wait event happens when a session tries to access a block in the buffer cache but cannot, because the buffer is busy; that is, another session is modifying the block and its contents are in flux.

Getting more information about buffer busy waits
To get more information about the SQL statement being executed and the block being waited for, trace the session or gather data from V$SESSION and V$SESSION_WAIT (or just V$SESSION in 10g and higher):

SQL>
SELECT s.sql_hash_value, sw.p1 file#, sw.p2 block#, sw.p3 reason
FROM v$session_wait sw, v$session s
WHERE sw.event = 'buffer busy waits'
AND sw.sid = s.sid;


>> P1 - file number of the data file containing the block being waited for
>> P2 - block number of the block being waited for
>> P3 - this wait is called from many different sections of Oracle code, and each uses its own reason code, which differs among versions.

To determine the object being waited for, use the P1 (file_number) and P2 (block_number) information from the above query:

SELECT  owner , segment_name , segment_type
FROM  dba_extents
WHERE  file_id = &FileNumber
AND  &BlockNumber BETWEEN block_id AND block_id + blocks -1;


Another query that can be very useful finds the objects in the entire Oracle database that are suffering from "buffer busy waits". The following query gives the top 10 segments:

SELECT * FROM (
   SELECT owner, object_name, subobject_name, object_type,
          tablespace_name, value
   FROM v$segment_statistics
   WHERE statistic_name='buffer busy waits' and owner not like '%SYS%'
   ORDER BY value DESC)
WHERE ROWNUM <=10;

Fixing buffer busy waits :

Once the database object is known, consider the following causes of contention and their solutions.

a) Undo Header - If using Automatic Undo Management (AUM), increase the size of the undo tablespace. If not using AUM, add more rollback segments.

b) Undo Block - If using AUM, increase size of the undo tablespace. If not using AUM, increase rollback segment sizes.

c) Data Block- Data blocks are the blocks that hold the row data in a table or index. The typical problem is that multiple sessions are requesting a block that is either not in cache or in an incompatible mode (read by other session in 10g and higher).

d) Tune inefficient queries that read too many blocks into the buffer cache. These queries could flush out blocks that may be useful for other sessions in the buffer cache. By tuning queries, the number of blocks that need to be read into the cache is minimized, reducing aging out of the existing "good" blocks in the cache.

e) Resolve Hot Blocks - If the queries above consistently return the same block or set of blocks, this is considered a hot block scenario. Delete some of the hot rows and insert them back into the table. Most of the time, the rows will be placed in a different block. The DBA may need to adjust PCTFREE and/or PCTUSED to ensure the rows are placed into a different block. Also talk with the developers to understand why a set of blocks are hot.

f) Place Table in Memory - Cache the table or keep the table in the KEEP pool. When multiple sessions request blocks that reside on disk, it takes too much time for a session to read them into the buffer cache, and other sessions that need the same block will register 'buffer busy wait'. If the block is already in the buffer cache, this possibility is eliminated. Another alternative is to increase the buffer cache size; a larger buffer cache means less I/O from disk, which reduces situations where one session is reading a block from the disk subsystem while other sessions wait for it. (A sketch of placing a table in the KEEP pool follows this list.)

g) Fix Low Cardinality Indexes - Look for ways to reduce the number of low cardinality indexes, i.e. an index with a low number of unique values that could result in excessive block reads. This can especially be problematic when concurrent DML operates on table with low cardinality indexes and cause contention on a few index blocks.

h) Adjust PCTFREE/PCTUSED or use ASSM - When sessions insert/delete rows into/from a block, the block must be taken out of the freelist if the PCTFREE threshold reached. When sessions delete rows from a block, the block will be put back in the freelist if PCTUSED threshold is reached. If there are a lot of blocks coming out of the freelist or going into it, all those sessions have to make that update in the freelist map in the segment header. A solution to this problem is to create multiple freelists. This will allow different insert streams to use different freelists and thus update different freelist maps. This reduces contention on the segment header block. You should also look into optimizing the PCTUSED/PCTFREE parameters so that the blocks don't go in and out of the freelists frequently. Another solution is to use ASSM which avoids the use of freelists all together.

i) Increase Extent Size - If extents are too small, Oracle must constantly allocate new extents, causing contention in the extent map.
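
As referenced in item f), one remedy is to cache the hot table in the KEEP pool. A minimal sketch, assuming a hypothetical table app_owner.hot_table and a KEEP pool that still has to be sized:

-- size the KEEP pool first (example value)
SQL> ALTER SYSTEM SET db_keep_cache_size = 200M;

-- assign the hot table (hypothetical name) to the KEEP pool
SQL> ALTER TABLE app_owner.hot_table STORAGE (BUFFER_POOL KEEP);

-- verify the assignment
SQL> SELECT owner, table_name, buffer_pool
       FROM dba_tables
      WHERE owner = 'APP_OWNER' AND table_name = 'HOT_TABLE';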

2) Db File Sequential Read :

There are two things you can do about these waits: optimize the SQL statement or reduce the average wait time.

The db file sequential read wait event has three parameters: file#, first block#, and block count. In Oracle Database 10g, this wait event falls under the User I/O wait class. Keep the following key thoughts in mind when dealing with the db file sequential read wait event.

• The Oracle process wants a block that is currently not in the SGA, and it is waiting for the database block to be read into the SGA from disk.
• The two important numbers to look for are the TIME_WAITED and AVERAGE_WAIT by individual sessions.
• Significant db file sequential read wait time is most likely an application issue.

Common Causes, Diagnosis, and Actions :

The db file sequential read wait event is initiated by SQL statements (both user and recursive) that perform single-block read operations against indexes, rollback (or undo) segments, and tables (when accessed via rowid), control files and data file headers. This wait event normally appears as one of the top five wait events, according to systemwide waits.

Physical I/O requests for these objects are perfectly normal, so the presence of the db file sequential read waits in the database does not necessarily mean that there is something wrong with the database or the application. It may not even be a bad thing if a session spends a lot of time on this event. In contrast, it is definitely bad if a session spends a lot of time on events like enqueue or latch free. This is where this single-block read subject becomes complicated. At what point does the db file sequential read event become an issue? How do you define excessive? Where do you draw the line? These are tough questions, and there is no industry standard guideline. You should establish a guideline for your environment. For example, you may consider it excessive when the db file sequential read wait represents a large portion of a process response time. Another way is to simply adopt the nonscientific hillbilly approach—that is, wait till the users start screaming.

You can easily discover which session has high TIME_WAITED on the db file sequential read wait event from the V$SESSION_EVENT view. The TIME_WAITED must be evaluated with the LOGON_TIME and compared with other nonidle events that belong to the session for a more accurate analysis. Sessions that have logged on for some time (days or weeks) may accumulate a good amount of time on the db file sequential read event. In this case, a high TIME_WAITED may not be an issue. Also, when the TIME_WAITED is put in perspective with other nonidle events, it prevents you from being blindsided. You may find another wait event which is of a greater significance. Based on the following example, SID# 192 deserves your attention and should be investigated:

SQL>
select b.sid,
       nvl(substr(a.object_name, 1, 30),
           'P1=' || b.p1 || ' P2=' || b.p2 || ' P3=' || b.p3) "Object_Name",
       a.subobject_name,
       a.object_type
  from dba_objects a, v$session_wait b, x$bh c
 where c.obj = a.object_id(+)
   and b.p1 = c.file#(+)
   and b.p2 = c.dbablk(+)
   and b.event = 'db file sequential read';
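
The per-session totals discussed above can be pulled from V$SESSION_EVENT directly; a minimal helper query:

SQL> SELECT sid, total_waits, time_waited, average_wait
       FROM v$session_event
      WHERE event = 'db file sequential read'
      ORDER BY time_waited DESC;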

3) Enq: TX - row lock contention :

A wait for the Oracle TX enqueue in mode 6 (row lock contention) is a common enqueue wait, and occurs when a transaction tries to update or delete rows that are currently locked by another transaction.

A wait for the TX enqueue in mode 6 (P1 = 1415053318, P1RAW = 54580006) is the most common enqueue wait. (In Oracle Database 10g/ 11g, the wait event name is enq: TX—row lock contention.) This indicates contention for row-level lock. This wait occurs when a transaction tries to update or delete rows that are currently locked by another transaction. This usually is an application issue. The waiting session will wait until the blocking session commits or rolls back its transaction. There is no other way to release the lock. (Killing the blocking session will cause its transaction to be rolled back.)

An example of a TX enqueue wait in mode 6 can be seen in the V$LOCK view, as the query below shows.

Whenever you see an enqueue wait event for the TX enqueue, the first step is to find out who the blocker is and if there are multiple waiters for the same resource by using the following query. If the blocking session is an ad-hoc process, then the user may be taking a break. In this case, ask the user to commit or roll back the transaction. If the blocking session is a batch or OLTP application process, then check to see if the session is still “alive.” It may be a live Oracle session, but its parent process may be dead or hung. In this case, chances are you will have to kill the session to release the locks. Be sure to confirm with the application before killing a production process.

Drilling the issue :

SQL>
select /*+ ordered */
a.sid "Blocker_SID",a.username "UserName",a.serial#,a.logon_time,
b.type,b.lmode "Mode_Held",b.ctime "Time_Held",c.sid "Waiter_SID",c.request,c.ctime "Time_Waited"
from v$lock b,v$enqueue_lock c, v$session a
where a.sid=b.sid
and b.id1=c.id1(+)
and b.id2=c.id2(+)
and c.type(+)='TX'
and b.type='TX'
and b.block=1
order by b.ctime,c.ctime;

You can discover the resource that is being competed for. The resource ID is available in the V$LOCK.ID1 column of the DML lock (TM) for that transaction. It is also available in the V$SESSION.ROW_WAIT_OBJ# of the waiting session. The following query retrieves the resource of the TX enqueue wait:


select c.sid "Waiter_SID", a.object_name, a.object_type
from dba_objects a, v$session b, v$session_wait c
where (a.object_id = b.row_wait_obj# OR a.data_object_id = b.row_wait_obj#)
and b.sid = c.sid
and chr(bitand(c.p1,-16777216)/16777215)||chr(bitand(c.p1,16711680)/65535) = 'TX'
and c.event = 'enqueue'; -- on 10g and later the event is 'enq: TX - row lock contention'

Don’t forget to extract the SQL statement that is executed by the waiting session as well as the SQL statement that is executed by the blocking session. These statements will give you an idea of what the waiting session is trying to do and what the blocking session is doing. They are also important points of reference for the application developers so that they can quickly locate the modules. (By the way, the SQL statement that is currently being executed by the blocking session is not necessarily the statement that holds the lock. The statement that holds the lock might have been run a long time ago.)
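
A minimal sketch that pulls both statements at once, assuming 10g or later where V$SESSION exposes BLOCKING_SESSION and SQL_ID (the blocker's current SQL_ID may be null if it is idle, hence the outer joins):

SQL> SELECT w.sid waiter_sid,
            w.blocking_session blocker_sid,
            wq.sql_text waiter_sql,
            bq.sql_text blocker_sql
       FROM v$session w, v$session b, v$sqlarea wq, v$sqlarea bq
      WHERE w.event LIKE 'enq: TX%'
        AND b.sid = w.blocking_session
        AND wq.sql_id (+) = w.sql_id
        AND bq.sql_id (+) = b.sql_id;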

4) Enq: TM - index contention :

Waits on enq: TM - contention in Oracle indicate there are unindexed foreign key constraints. In this article, I examine how foreign key constraints relate to this wait event and how to tune for this event.

Recently, I was assisting my company to diagnose sessions waiting on the "enq: TM - contention" event. The blocked sessions were executing simple INSERT statements similar to:

INSERT INTO supplier VALUES (:1, :2, :3);

Reviewing the schema, we found a foreign key constraint on the PRODUCT table referencing the SUPPLIER table that did not have an associated index. This was also confirmed by the Top Objects feature of Ignite for Oracle, because all the time was associated with the PRODUCT table. We added the index on the column referencing the SUPPLIER table and the problem was solved.

Finding the root cause of the enq: TM - contention wait event

Using Oracle's locking views to find the blocking sessions, we found the real culprit. Periodically, as the company reviewed its vendor list, it "cleaned up" the SUPPLIER table several times a week. As a result, rows were deleted from the SUPPLIER table. Those delete statements then cascaded to the PRODUCT table, taking out TM locks on it.

Reproducing a typical problem that leads to this wait : - A demo

This problem has a simple fix, but I wanted to understand more about why this happens. So I reproduced the issue to see what happens under the covers. I first created a subset of the tables from this customer and loaded them with sample data.

CREATE TABLE supplier
( supplier_id number(10) not null,
supplier_name varchar2(50) not null,
contact_name varchar2(50),
CONSTRAINT supplier_pk PRIMARY KEY (supplier_id)
);

INSERT INTO supplier VALUES (1, 'Supplier 1', 'Contact 1');
INSERT INTO supplier VALUES (2, 'Supplier 2', 'Contact 2');
COMMIT;

CREATE TABLE product
( product_id number(10) not null,
product_name varchar2(50) not null,
supplier_id number(10) not null,
CONSTRAINT fk_supplier
FOREIGN KEY (supplier_id)
REFERENCES supplier(supplier_id)
ON DELETE CASCADE );

INSERT INTO product VALUES (1, 'Product 1', 1);
INSERT INTO product VALUES (2, 'Product 2', 1);
INSERT INTO product VALUES (3, 'Product 3', 2);
COMMIT;

I then executed statements similar to what we found at this customer:

User 1: DELETE supplier WHERE supplier_id = 1;
User 2: DELETE supplier WHERE supplier_id = 2;
User 3: INSERT INTO supplier VALUES (5, 'Supplier 5', 'Contact 5');

Similar to the customer's experience, User 1 and User 2 hung waiting on "enq: TM - contention". Reviewing information from V$SESSION, I found the following:


SELECT l.sid, s.blocking_session blocker, s.event, l.type, l.lmode, l.request, o.object_name, o.object_type
FROM v$lock l, dba_objects o, v$session s
WHERE UPPER(s.username) = UPPER('&User')
AND l.id1 = o.object_id (+)
AND l.sid = s.sid
ORDER BY sid, type;



Following along with the solution we used for our customer, we added an index on the PRODUCT table's foreign key column referencing the SUPPLIER table:

CREATE INDEX fk_supplier ON product (supplier_id);

When we ran the test case again, everything worked fine. There were no exclusive locks acquired and hence no hanging. Oracle takes out exclusive locks on the child table (the PRODUCT table in our example) when a foreign key constraint is not indexed.

Sample query to find unindexed foreign key constraints
Now that we know unindexed foreign key constraints can cause severe problems, here is a script that I use to find them for a specific user (this can easily be tailored to search all schemas):

SELECT *
  FROM (SELECT c.table_name, cc.column_name, cc.position column_position
          FROM user_constraints c, user_cons_columns cc
         WHERE c.constraint_name = cc.constraint_name
           AND c.constraint_type = 'R'
        MINUS
        SELECT i.table_name, ic.column_name, ic.column_position
          FROM user_indexes i, user_ind_columns ic
         WHERE i.index_name = ic.index_name)
 /*where table_name not like '%$%'*/ -- use to find application objects
 ORDER BY table_name, column_position;


5) Row Cache Lock Wait :

In order for DDL (Data Definition Language) to execute, it must acquire a row cache lock to lock the Data Dictionary information. The shared pool contains a cache of rows from the data dictionary that helps reduce physical I/O to the data dictionary tables and allows locking of individual data dictionary rows.

A closer look at the row cache lock wait event:

Each row cache lock will be on a specific data dictionary object. This is called the enqueue type and can be found in the v$rowcache view. In this sample select from v$rowcache you can find the enqueue types and the type of activity being performed within the dictionary cache.

PARAMETER            COUNT         GETS  GETMISSES MODIFICATIONS
-------------------- ----- ------------ ---------- -------------
dc_free_extents          0            0          0             0
dc_used_extents          0            0          0             0
dc_segments           5927    131379921    4142831        693734
dc_tablespaces          22    188609668       2436             0
dc_tablespace_quotas    12     22779303       3843             0
dc_files                 0       165961      22493            21
dc_users                19    145681559       2078            21
dc_rollback_segments    67      3906307         66           232
dc_objects            1927     70725250    2247804         74803
dc_sequences             4       142714       1599        142714

Common row cache enqueue lock types:

The tuning of the row cache lock wait is dependent upon the activity for each of the enqueue types. Of these, the most common are:

•DC_SEQUENCES:  this row cache lock wait may occur during the use of sequences. Tune by checking sequences to see if they have the cache option specified and whether the cache value reflects the anticipated simultaneous inserts by the application (see the sketch after this list).
•DC_USED_EXTENTS and DC_FREE_EXTENTS: this row cache lock wait may occur during space management operations where tablespaces are fragmented or have inadequate extent sizes. Tune by checking whether tablespaces are fragmented, extent sizes are too small, or tablespaces are managed manually.
•DC_TABLESPACES: this row cache lock wait may occur during the allocation of new extents. If extent sizes are set too low the application may frequently request new extents which could cause contention. Tune by checking for rapidly increasing number of extents.
•DC_OBJECTS: this row cache lock wait may occur during the recompilation of objects. If object compiles are occurring this can require an exclusive lock which will block other activity. Tune by examining invalid objects and dependencies.
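
For the DC_SEQUENCES case, a minimal sketch (my_seq and app_owner are hypothetical names; DBA_SEQUENCES reports CACHE_SIZE = 0 for NOCACHE sequences):

-- find sequences created with NOCACHE
SQL> SELECT sequence_owner, sequence_name, cache_size
       FROM dba_sequences
      WHERE cache_size = 0;

-- increase the cache for a heavily used sequence (hypothetical name)
SQL> ALTER SEQUENCE app_owner.my_seq CACHE 1000;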

Tuning for the row cache lock wait event:

The row cache lock wait event is associated with a specific enqueue type on a data dictionary row. Checking activity within the V$ROWCACHE view is a good place to start for understanding this relationship, as tuning can only be accomplished with analysis of the enqueue type.

If a trace file is available you may also see the following error:

 >> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! <<

Also realize that the row cache lock wait event may appear more frequently when using RAC. This is because the library cache and the row cache are global in RAC—causing the row cache lock wait to be more pronounced.

6) Read by Other Session :

When a session waits on the "read by other session" event, it indicates a wait for another session to read the data from disk into the Oracle buffer cache. If this happens too often the performance of the query or the entire database can suffer. Typically this is caused by contention for "hot" blocks or objects so it is imperative to find out which data is being contended for. Once that is known, there are several alternative methods for solving the issue.

When information is requested from the database, Oracle will first read the data from disk into the database buffer cache. If two or more sessions request the same information, the first session will read the data into the buffer cache while other sessions wait. In previous versions this wait was classified under the "buffer busy waits" event. However, in Oracle 10.1 and higher this wait time is now broken out into the "read by other session" wait event. Excessive waits for this event are typically due to several processes repeatedly reading the same blocks, e.g. many sessions scanning the same index or performing full table scans on the same table. Tuning this issue is a matter of finding and eliminating this contention.

Finding the contentions :

When a session is waiting on the "read by other session" event, an entry will be seen in the v$session_wait system view, which will give more information on the blocks being waited for:

SELECT p1 "file#", p2 "block#", p3 "class#"
FROM v$session_wait
WHERE event = 'read by other session';

If information collected from the above query repeatedly shows that the same block (or range of blocks) is experiencing waits, this indicates a "hot" block or object. The following query will give the name and type of the object:

SELECT relative_fno, owner, segment_name, segment_type
 FROM dba_extents
 WHERE file_id = &file
 AND &block BETWEEN block_id AND block_id + blocks - 1;

Eliminating contentions:
Depending on the Oracle database environment and specific performance situation the following variety of methods can be used to eliminate contention:

Tune inefficient queries - This is one of those events you need to "catch in the act" through the v$session_wait view as prescribed above. Then, since this is a disk I/O issue at the operating system level, take the associated OS process identifier (SPID from v$process) and see what information you can obtain from the operating system.

Redistribute data from the hot blocks - Deleting and reinserting the hot rows will often move them to a new data block. This will help decrease contention for the hot block and increase performance. More information about the data residing within the hot blocks can be retrieved with queries similar to the following:

SELECT data_object_id
 FROM dba_objects
 WHERE owner='&owner' AND object_name='&object';

SELECT dbms_rowid.rowid_create(1,<data_object_id>,<relative_fno>,<block>,0) start_rowid
 FROM dual;

--rowid for the first row in the block

 SELECT dbms_rowid.rowid_create(1,<data_object_id>,<relative_fno>,<block>,500) end_rowid FROM dual;

--rowid for the 500th row in the block

 SELECT <column_list>
 FROM <owner>.<segment_name>
 WHERE rowid BETWEEN <start_rowid> AND <end_rowid>;

