May 26, 2015

Instance terminated by PMON / Serial Transaction recovery caught exception 30319: a workaround

Recently I found the following errors in the alert log:

Tue May 26 15:52:35 2015
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x10979BB8] [PC:0x101ECBAF4, opiexe()+28884] [flags: 0x0, count: 1]
Errors in file /u01/app/oracle/diag/rdbms/PROD/PROD/trace/PROD_smon_9306262.trc  (incident=112111):
ORA-07445: exception encountered: core dump [opiexe()+28884] [SIGSEGV] [ADDR:0x10979BB8] [PC:0x101ECBAF4] [Address not mapped to object] []
Incident details in: /u01/app/oracle/diag/rdbms/PROD/PROD/incident/incdir_112111/PROD_smon_9306262_i112111.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Tue May 26 15:52:38 2015
Dumping diagnostic data in directory=[cdmp_20150526155238], requested by (instance=1, osid=9306262 (SMON)), summary=[incident=112111].
Tue May 26 15:52:40 2015
Sweep [inc][112111]: completed
Sweep [inc2][112111]: completed
Tue May 26 15:52:41 2015
PMON (ospid: 16449862): terminating the instance due to error 474
System state dump requested by (instance=1, osid=16449862 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/PROD/PROD/trace/PROD_diag_10223776.trc
Dumping diagnostic data in directory=[cdmp_20150526155242], requested by (instance=1, osid=16449862 (PMON)), summary=[abnormal instance termination].
Instance terminated by PMON, pid = 16449862


In the related .trc file I found the following:

......
......
ktsmgtur(): TUR was not tuned for 3761 secs
ktsmg_advance_slot(): MMNL advances slot after 3761 seconds
Serial Transaction recovery caught exception 30319

*** 2015-05-26 12:51:32.986
Serial Transaction recovery caught exception 30319
Serial Transaction recovery caught exception 30319

*** 2015-05-26 13:03:05.122
Serial Transaction recovery caught exception 30319



Based on the above investigation, I changed the parameter shown below. Since then, no further issues have occurred.

SQL> show parameter fast_start_parallel_rollback;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
fast_start_parallel_rollback         string      LOW


SQL> ALTER SYSTEM SET fast_start_parallel_rollback='FALSE';

System altered.

Reason: the alert log showed many parallel recoveries. As this is a report-server POC, I set fast_start_parallel_rollback='FALSE'.
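A minimal sketch for checking the current value and making the change persistent. It assumes the instance was started from an spfile (SCOPE = BOTH fails on a pfile-started instance); the parameter is dynamic, so no restart is needed.

```sql
-- Current value, and whether it is still the default
SELECT name, value, isdefault
  FROM v$parameter
 WHERE name = 'fast_start_parallel_rollback';

-- Switch to serial rollback in memory and in the spfile
-- (SCOPE = BOTH assumes the instance runs from an spfile)
ALTER SYSTEM SET fast_start_parallel_rollback = 'FALSE' SCOPE = BOTH;

-- To return to the default (LOW) later
ALTER SYSTEM SET fast_start_parallel_rollback = 'LOW' SCOPE = BOTH;
```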

If I find any further issue or the root cause, I will update this post.

Note: the parameter accepts “FALSE”, “LOW” (the default) and “HIGH”. For a large, complicated transaction, especially when other transactional processes are updating the tables involved in the rollback, setting it to “FALSE” (serial rollback) is the best choice, rather than rolling back in parallel with “LOW” or “HIGH”.
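To see whether parallel recovery slaves are actually working on a rollback, V$FAST_START_SERVERS can be queried. This is a rough sketch; with the parameter at FALSE the rollback is serial, so this view should normally return no rows.

```sql
-- One row per parallel transaction-recovery slave
SELECT state, undoblocksdone, pid, xid
  FROM v$fast_start_servers;
```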

If SMON is performing transaction recovery (for example after a crash), use the query below to estimate when it will complete:

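SQL> alter session set NLS_DATE_FORMAT='DD-MON-YYYY HH24:MI:SS';

SQL> select usn,
       state,
       undoblockstotal "Total",
       undoblocksdone "Done",
       undoblockstotal - undoblocksdone "ToDo",
       -- remaining blocks / (blocks done per second) = seconds left; / 86400 converts to days
       -- guard both divisors so a freshly started recovery does not raise ORA-01476
       case
         when cputime = 0 or undoblocksdone = 0 then 'unknown'
         else to_char(sysdate + ((undoblockstotal - undoblocksdone)
                                 / (undoblocksdone / cputime)) / 86400)
       end "Estimated time to complete"
  from v$fast_start_transactions;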
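To illustrate the arithmetic behind the estimate with made-up figures (100,000 undo blocks in total, 20,000 already applied after 400 seconds): the rate is 20,000 / 400 = 50 blocks per second, so the remaining 80,000 blocks need 1,600 seconds, roughly 27 minutes. The same calculation can be run against DUAL:

```sql
-- Hypothetical numbers, not taken from the incident above
-- seconds_left returns 1600
SELECT (100000 - 20000) / (20000 / 400)                     AS seconds_left,
       sysdate + ((100000 - 20000) / (20000 / 400)) / 86400 AS eta
  FROM dual;
```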


When a large transaction is killed or terminated, SMON rolls it back; this is evident in the alert log as “smon: enable tx recovery”.
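Before killing a session that holds a large open transaction, it can help to check how much undo it has generated, since that is roughly the amount of work the rollback will have to undo. A sketch joining V$TRANSACTION to V$SESSION:

```sql
-- Undo blocks and records held by each open transaction, largest first
SELECT s.sid, s.serial#, s.username,
       t.used_ublk, t.used_urec, t.start_time
  FROM v$transaction t
  JOIN v$session s ON s.taddr = t.addr
 ORDER BY t.used_ublk DESC;
```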


