In one of my environment, I found "ORA-16857: standby disconnected from redo source for longer than specified threshold" error and for this stand by database replication / sync stopped.
During verification, I found many issues and fixed all one by one.
The issues are:
1) There redo transport user configured but found missing in primary.
2) Redo Transport service(LNS) thrown error
3) MRP process stocked with one archive log which was missing in standby archivelog dest.
So. initially when I verified using broker, below message found. It shown physical standby database is out of sync last ~3 days.
and redo transport service also stopped.
[oracle@standbyhost1a ~]$ dgmgrl
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production
Copyright (c) 2000, 2013, Oracle. All rights reserved.
Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys
Password:
Connected as SYSDG.
DGMGRL> show database hrsdrc;
Database - hrsdrc
Enterprise Manager Name: hrsdrc_1
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 2 days 17 hours 38 minutes 28 seconds (computed 86 seconds ago)
Apply Lag: 2 days 17 hours 38 minutes 28 seconds (computed 86 seconds ago)
Average Apply Rate: 494.00 KByte/s
Real Time Query: OFF
Instance(s):
hrsdrc1 (apply instance)
hrsdrc2
Database Warning(s):
ORA-16857: standby disconnected from redo source for longer than specified threshold
Database Status:
WARNING
DGMGRL>
DGMGRL> show database hrprod;
Database - hrprod
Enterprise Manager Name: hrprod1
Role: PRIMARY
Intended State: TRANSPORT-ON
Instance(s):
hrprod1
Error: ORA-16737: the redo transport service for standby database "hrsdrc" has an error
hrprod2
Error: ORA-16737: the redo transport service for standby database "hrsdrc" has an error
Database Status:
ERROR
DGMGRL>
-- Verified log sequence status received and applied in standby database
SQL> SELECT ARCH.THREAD# "Thread", ARCH.SEQUENCE# "Last Sequence Received", APPL.SEQUENCE# "Last Sequence Applied",
(ARCH.SEQUENCE# - APPL.SEQUENCE#) "Difference"
FROM
(SELECT THREAD# ,SEQUENCE# FROM V$ARCHIVED_LOG WHERE (THREAD#,FIRST_TIME ) IN (SELECT THREAD#,MAX(FIRST_TIME) FROM V$ARCHIVED_LOG GROUP BY THREAD#)) ARCH,
(SELECT THREAD# ,SEQUENCE# FROM V$LOG_HISTORY WHERE (THREAD#,FIRST_TIME ) IN (SELECT THREAD#,MAX(FIRST_TIME) FROM V$LOG_HISTORY GROUP BY THREAD#)) APPL
WHERE ARCH.THREAD# = APPL.THREAD#
ORDER BY 1;
Thread Last Sequence Received Last Sequence Applied Difference
---------- ---------------------- --------------------- ----------
1 1695 1634 61
2 1648 1581 67
-- Verified log sequence status in primary database
SQL> select THREAD# ,max(SEQUENCE#) FROM V$ARCHIVED_LOG group by THREAD#;
THREAD# MAX(SEQUENCE#)
---------- --------------
1 1728
2 1665
===============================
Root cause analysis:
===============================
-- Primary database
Issue-1:
======
Verify what is the user configured for redo transport service. Is it default 'SYS' or redo_transport_user.
-- Primary
SQL> show parameter redo;
NAME TYPE VALUE
------------------------------------ -------------------------------------------- ------------------------------
redo_transport_user string
SQL>
-- standby
SQL> show parameter redo;
NAME TYPE VALUE
------------------------------------ -------------------------------------------- ------------------------------
redo_transport_user string REDO_TRANSPORT_USER
SQL>
i.e., Here standby database is configyred with "REDO_TRANSPORT_USER" user but not in primary. Either some changed or wrongly configured.
So it needs to be fixed.
-- Fixes applied in primary
SQL> select name from v$database;
NAME
---------
HRPROD
SQL> alter system set redo_transport_user = REDO_TRANSPORT_USER scope=both sid='*';
System altered.
Issue-2:
Now we need to find what error recorded for redo transport service.
SQL> select name from v$database;
NAME
---------
HRPROD
SQL> SELECT gvi.thread#, timestamp, message
FROM gv$dataguard_status gvds, gv$instance gvi
WHERE gvds.inst_id = gvi.inst_id AND severity in ('Error','Fatal')
ORDER BY timestamp, thread#;
THREAD# TIMESTAMP MESSAGE
---------- ------------------ ------------------------------------------------------------------------------------------
1 02-JAN-20 Error 1033 received logging on to the standby
2 02-JAN-20 Error 1033 received logging on to the standby
2 02-JAN-20 PING[ARC0]: Heartbeat failed to connect to standby 'hrsdrc'. Error is 1033.
2 02-JAN-20 Error 1033 received logging on to the standby
2 02-JAN-20 Error 1033 for archive log file 22 to 'hrsdrc'
1 02-JAN-20 FAL[server, ARC3]: Error 1033 creating remote archivelog file 'hrsdrc'
1 02-JAN-20 Error 1033 received logging on to the standby
2 02-JAN-20 Error 1033 received logging on to the standby
2 02-JAN-20 FAL[server, ARC3]: Error 1033 creating remote archivelog file 'hrsdrc'
1 02-JAN-20 Error 1033 received logging on to the standby
2 02-JAN-20 Error 1033 received logging on to the standby
The above error clearly showing, primary LNS service not able to login to transport redo. So there is password file issue in standby database.
So this password issue needs to fixed. In stand-alone database environment it is easy and fast to fix. But in RAC environment you need to work little-bit more.
And also verify the error message from gv$archive_dest to find redo transport has error or 'VALID'
-- prod
SQL> select inst_id,status,error from gv$archive_dest where dest_id=2;
INST_ID STATUS ERROR
---------- -------- -----------------------------------------
2 ERROR ORA-01033: ORACLE initialization or shutdown in progress
1 ERROR ORA-01033: ORACLE initialization or shutdown in progress
Now from above message, we confirmed, redo transport stopped with error. That to because of password file issue.
Fix/ Solution:
-- Primary : Copy the password file from ASM to file system ( here my password file is in ASM)
ASMCMD [+DATA/hrprod/PASSWORD] > pwcopy +DATA/hrprod/PASSWORD/pwdhrprod.1234.1020304050 /tmp
copying +DATA/hrprod/PASSWORD/pwdhrprod.1234.1020304050 -> /tmp/pwdhrprod.1234.1020304050
ASMCMD [+DATA/hrprod/PASSWORD] > exit
[oracle@hrprodhost1a tmp]$ ls pwdhrprod.1234.1020304050
pwdhrprod.1234.1020304050
-- in standby :
pwdhrsdrc.1234.1020304050
pwcopy /tmp/pwdhrsdrc.1234.1020304050 +DATA/hrsdrc/PASSWORD/pwdhrsdrc.1234.1020304050
ASMCMD [+DATA/hrsdrc/PASSWORD] > pwcopy /tmp/pwdhrsdrc.1234.1020304050 +DATA/hrsdrc/PASSWORD/pwdhrsdrcdg
copying /tmp/pwdhrsdrc.1234.1020304050 -> +DATA/hrsdrc/PASSWORD/pwdhrsdrcdg
ASMCMD [+DATA/hrsdrc/PASSWORD] > ls
pwdhrsdrcdg
srvctl modify database -d hrsdrc -pwfile +DATA/hrsdrc/PASSWORD/pwdhrsdrcdg
sqlplus redo_transport_user/"ChooY+o6"@hrsdrc1 as sysoper
sqlplus redo_transport_user/"ChooY+o6"@hrsdrc2 as sysoper
sqlplus redo_transport_user/"ChooY+o6"@hrsdrc as sysoper
[oracle@hrprodhost1a tmp]$ cksum pwdhrprod.1234.1020304050
54382740 17920 pwdhrprod.1234.1020304050
scp -p oracle@hrprodhost1a:/tmp/pwdhrprod.1234.1020304050 /u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/orapwhrsdrc1
[oracle@standbyhost1a dbs]$ cksum orapwhrsdrc1
2303689170 17920 orapwhrsdrc1
-- Transfer the password file to standby database server and rename it.
$ scp -p oracle@hrprodhost1a:/tmp/pwdhrprod.1234.1020304050 /u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/orapwhrsdrc2
Password:
pwdhrprod.1234.1020304050 100% 18KB 36.4KB/s 00:00
-- next steps to do
ASMCMD [+DATA/HRDRC] > pwd
+DATA/hrsdrc
amcmd> mkdir PASSWORD
ASMCMD [+DATA/HRDRC/PASSWORD] > pwcopy /u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/orapwhrsdrc1 +DATA/HRDRC/PASSWORD/orapwhrsdrc
copying /u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/orapwhrsdrc1 -> +DATA/hrsdrc/PASSWORD/orapwhrsdrc
ASMCMD [+DATA/hrsdrc/PASSWORD] > ls
orapwhrsdrc
After password keeping in standby database ASM location, then configure in database configuration using 'srvctl'
-- Standby
$ srvctl config database -d hrsdrc
Database unique name: hrsdrc
Database name: hrprod
Oracle home: /u01/app/oracle/product/12.1.0.2/dbhome_1
Oracle user: oracle
Spfile:
Password file:
Domain:
Start options: mount
Stop options: immediate
Database role: PHYSICAL_STANDBY
Management policy: AUTOMATIC
Server pools:
Disk Groups: DATA
Mount point paths:
Services:
Type: RAC
Start concurrency:
Stop concurrency:
OSDBA group: dba
OSOPER group: dba
Database instances: hrsdrc1,hrsdrc2
Configured nodes: standbyhost1a,standbyhost1b
Database is administrator managed
[oracle@standbyhost1a ~]$
[oracle@standbyhost1a ~]$ srvctl modify database -d hrsdrc -pwfile +DATA/hrsdrc/PASSWORD/orapwhrsdrc
srvctl config database -d hrsdrc
[oracle@standbyhost1a ~]$ srvctl config database -d hrsdrc
Database unique name: hrsdrc
Database name: hrprod
Oracle home: /u01/app/oracle/product/12.1.0.2/dbhome_1
Oracle user: oracle
Spfile:
Password file: +DATA/hrsdrc/PASSWORD/orapwhrsdrc
Domain:
Start options: mount
Stop options: immediate
Database role: PHYSICAL_STANDBY
Management policy: AUTOMATIC
Server pools:
Disk Groups: DATA
Mount point paths:
Services:
Type: RAC
Start concurrency:
Stop concurrency:
OSDBA group: dba
OSOPER group: dba
Database instances: hrsdrc1,hrsdrc2
Configured nodes: standbyhost1a,standbyhost1b
Database is administrator managed
[oracle@standbyhost1a ~]$
Now password file issue fixed for issue-2. Here you can reload 'REDO TRANSPORT' service.
-- Do the following in Primary database
SQL> show parameter log_archive_dest_state_2;
NAME TYPE VALUE
-------------------------- ---------- --------------------------
log_archive_dest_state_2 string ENABLE
Restart redo transport on the primary:
SQL> alter system set log_archive_dest_state_2=defer sid='*';
System altered.
SQL> alter system set log_archive_dest_state_2=enable sid='*';
System altered.
SQL> alter system archive log current;
System altered.
After this, my redo transport service started.
-- verify again gv$archive_dest.
SQL> select inst_id,status,error from gv$archive_dest where dest_id=2;
INST_ID STATUS ERROR
---------- --------------------- -----------------------------------
2 VALID
1 VALID
Now no error revived. redo transport started.
You can now verify with log sequence.
-- When issue was there
Thread Last Sequence Received Last Sequence Applied Difference
---------- ---------------------- --------------------- ----------
1 1695 1634 61
2 1648 1581 67
-- After password file fix:
Thread Last Sequence Received Last Sequence Applied Difference
---------- ---------------------- --------------------- ----------
1 1733 1634 95
2 1672 1581 67
Now check what is current log sequence values in each thread:
SQL> select name from v$database;
NAME
---------
HRPROD
SQL> select THREAD# ,max(SEQUENCE#) FROM V$ARCHIVED_LOG group by THREAD#;
THREAD# MAX(SEQUENCE#)
---------- --------------
1 1734
2 1673
Here we can see now, all redo logs are shipped to standby but not applying.
Issue-3
=========
Now I verified, why MRP is not applying. Initially I tried below commands to restart MRP. But issue remains same.
SQL> alter database recover managed standby database cancel;
Database altered.
SQL> alter database recover managed standby database using current logfile disconnect from session;
Database altered.
Then I verified what is error for MRP process and why it is not applying.
-- standby
SQL> select inst_id,thread#,sequence#,status from gv$managed_standby where process='MRP0';
INST_ID THREAD# SEQUENCE# STATUS
---------- ---------- ---------- -----------------------------------
1 1 1635 WAIT_FOR_GAP
So MRP is waiting for log sequence '1635'. Then I verified archive log dest location. But I found this archive log is not tranfered or missing by any issue.
Then immediatly I transfered from primary to standby location and registered the redo using below commands.
-- primary
RMAN>copy archivelog '+DATA/hrprod/ARCHIVELOG/thread_1_seq_1635.1234.1028326277' to '/tmp/thread_1_seq_1635.1234.1028326277';
Then, transfer the archive log to standby database.
Register manually by mentioning the copied file location.
On standby,
SQL> alter database register logfile '/tmp/thread_1_seq_1635.1234.1028326277';
Post register, sync started. Now issue is resolved. You can see the final message from dataguard broker.
-- standby
DGMGRL> show database hrsdrc;
Database - grmdrc
Enterprise Manager Name: hrsdrc_1
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds (computed 0 seconds ago)
Apply Lag: 8 seconds (computed 0 seconds ago)
Average Apply Rate: 183.00 KByte/s
Real Time Query: OFF
Instance(s):
hrsdrc1 (apply instance)
hrsdrc2
Database Status:
SUCCESS
DGMGRL>
During verification, I found many issues and fixed all one by one.
The issues are:
1) There redo transport user configured but found missing in primary.
2) Redo Transport service(LNS) thrown error
3) MRP process stocked with one archive log which was missing in standby archivelog dest.
So. initially when I verified using broker, below message found. It shown physical standby database is out of sync last ~3 days.
and redo transport service also stopped.
[oracle@standbyhost1a ~]$ dgmgrl
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production
Copyright (c) 2000, 2013, Oracle. All rights reserved.
Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys
Password:
Connected as SYSDG.
DGMGRL> show database hrsdrc;
Database - hrsdrc
Enterprise Manager Name: hrsdrc_1
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 2 days 17 hours 38 minutes 28 seconds (computed 86 seconds ago)
Apply Lag: 2 days 17 hours 38 minutes 28 seconds (computed 86 seconds ago)
Average Apply Rate: 494.00 KByte/s
Real Time Query: OFF
Instance(s):
hrsdrc1 (apply instance)
hrsdrc2
Database Warning(s):
ORA-16857: standby disconnected from redo source for longer than specified threshold
Database Status:
WARNING
DGMGRL>
DGMGRL> show database hrprod;
Database - hrprod
Enterprise Manager Name: hrprod1
Role: PRIMARY
Intended State: TRANSPORT-ON
Instance(s):
hrprod1
Error: ORA-16737: the redo transport service for standby database "hrsdrc" has an error
hrprod2
Error: ORA-16737: the redo transport service for standby database "hrsdrc" has an error
Database Status:
ERROR
DGMGRL>
-- Verified log sequence status received and applied in standby database
SQL> SELECT ARCH.THREAD# "Thread", ARCH.SEQUENCE# "Last Sequence Received", APPL.SEQUENCE# "Last Sequence Applied",
(ARCH.SEQUENCE# - APPL.SEQUENCE#) "Difference"
FROM
(SELECT THREAD# ,SEQUENCE# FROM V$ARCHIVED_LOG WHERE (THREAD#,FIRST_TIME ) IN (SELECT THREAD#,MAX(FIRST_TIME) FROM V$ARCHIVED_LOG GROUP BY THREAD#)) ARCH,
(SELECT THREAD# ,SEQUENCE# FROM V$LOG_HISTORY WHERE (THREAD#,FIRST_TIME ) IN (SELECT THREAD#,MAX(FIRST_TIME) FROM V$LOG_HISTORY GROUP BY THREAD#)) APPL
WHERE ARCH.THREAD# = APPL.THREAD#
ORDER BY 1;
Thread Last Sequence Received Last Sequence Applied Difference
---------- ---------------------- --------------------- ----------
1 1695 1634 61
2 1648 1581 67
-- Verified log sequence status in primary database
SQL> select THREAD# ,max(SEQUENCE#) FROM V$ARCHIVED_LOG group by THREAD#;
THREAD# MAX(SEQUENCE#)
---------- --------------
1 1728
2 1665
===============================
Root cause analysis:
===============================
-- Primary database
Issue-1:
======
Verify what is the user configured for redo transport service. Is it default 'SYS' or redo_transport_user.
-- Primary
SQL> show parameter redo;
NAME TYPE VALUE
------------------------------------ -------------------------------------------- ------------------------------
redo_transport_user string
SQL>
-- standby
SQL> show parameter redo;
NAME TYPE VALUE
------------------------------------ -------------------------------------------- ------------------------------
redo_transport_user string REDO_TRANSPORT_USER
SQL>
i.e., Here standby database is configyred with "REDO_TRANSPORT_USER" user but not in primary. Either some changed or wrongly configured.
So it needs to be fixed.
-- Fixes applied in primary
SQL> select name from v$database;
NAME
---------
HRPROD
SQL> alter system set redo_transport_user = REDO_TRANSPORT_USER scope=both sid='*';
System altered.
Issue-2:
Now we need to find what error recorded for redo transport service.
SQL> select name from v$database;
NAME
---------
HRPROD
SQL> SELECT gvi.thread#, timestamp, message
FROM gv$dataguard_status gvds, gv$instance gvi
WHERE gvds.inst_id = gvi.inst_id AND severity in ('Error','Fatal')
ORDER BY timestamp, thread#;
THREAD# TIMESTAMP MESSAGE
---------- ------------------ ------------------------------------------------------------------------------------------
1 02-JAN-20 Error 1033 received logging on to the standby
2 02-JAN-20 Error 1033 received logging on to the standby
2 02-JAN-20 PING[ARC0]: Heartbeat failed to connect to standby 'hrsdrc'. Error is 1033.
2 02-JAN-20 Error 1033 received logging on to the standby
2 02-JAN-20 Error 1033 for archive log file 22 to 'hrsdrc'
1 02-JAN-20 FAL[server, ARC3]: Error 1033 creating remote archivelog file 'hrsdrc'
1 02-JAN-20 Error 1033 received logging on to the standby
2 02-JAN-20 Error 1033 received logging on to the standby
2 02-JAN-20 FAL[server, ARC3]: Error 1033 creating remote archivelog file 'hrsdrc'
1 02-JAN-20 Error 1033 received logging on to the standby
2 02-JAN-20 Error 1033 received logging on to the standby
The above error clearly showing, primary LNS service not able to login to transport redo. So there is password file issue in standby database.
So this password issue needs to fixed. In stand-alone database environment it is easy and fast to fix. But in RAC environment you need to work little-bit more.
And also verify the error message from gv$archive_dest to find redo transport has error or 'VALID'
-- prod
SQL> select inst_id,status,error from gv$archive_dest where dest_id=2;
INST_ID STATUS ERROR
---------- -------- -----------------------------------------
2 ERROR ORA-01033: ORACLE initialization or shutdown in progress
1 ERROR ORA-01033: ORACLE initialization or shutdown in progress
Now from above message, we confirmed, redo transport stopped with error. That to because of password file issue.
Fix/ Solution:
-- Primary : Copy the password file from ASM to file system ( here my password file is in ASM)
ASMCMD [+DATA/hrprod/PASSWORD] > pwcopy +DATA/hrprod/PASSWORD/pwdhrprod.1234.1020304050 /tmp
copying +DATA/hrprod/PASSWORD/pwdhrprod.1234.1020304050 -> /tmp/pwdhrprod.1234.1020304050
ASMCMD [+DATA/hrprod/PASSWORD] > exit
[oracle@hrprodhost1a tmp]$ ls pwdhrprod.1234.1020304050
pwdhrprod.1234.1020304050
-- in standby :
pwdhrsdrc.1234.1020304050
pwcopy /tmp/pwdhrsdrc.1234.1020304050 +DATA/hrsdrc/PASSWORD/pwdhrsdrc.1234.1020304050
ASMCMD [+DATA/hrsdrc/PASSWORD] > pwcopy /tmp/pwdhrsdrc.1234.1020304050 +DATA/hrsdrc/PASSWORD/pwdhrsdrcdg
copying /tmp/pwdhrsdrc.1234.1020304050 -> +DATA/hrsdrc/PASSWORD/pwdhrsdrcdg
ASMCMD [+DATA/hrsdrc/PASSWORD] > ls
pwdhrsdrcdg
srvctl modify database -d hrsdrc -pwfile +DATA/hrsdrc/PASSWORD/pwdhrsdrcdg
sqlplus redo_transport_user/"ChooY+o6"@hrsdrc1 as sysoper
sqlplus redo_transport_user/"ChooY+o6"@hrsdrc2 as sysoper
sqlplus redo_transport_user/"ChooY+o6"@hrsdrc as sysoper
[oracle@hrprodhost1a tmp]$ cksum pwdhrprod.1234.1020304050
54382740 17920 pwdhrprod.1234.1020304050
scp -p oracle@hrprodhost1a:/tmp/pwdhrprod.1234.1020304050 /u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/orapwhrsdrc1
[oracle@standbyhost1a dbs]$ cksum orapwhrsdrc1
2303689170 17920 orapwhrsdrc1
-- Transfer the password file to standby database server and rename it.
$ scp -p oracle@hrprodhost1a:/tmp/pwdhrprod.1234.1020304050 /u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/orapwhrsdrc2
Password:
pwdhrprod.1234.1020304050 100% 18KB 36.4KB/s 00:00
-- next steps to do
ASMCMD [+DATA/HRDRC] > pwd
+DATA/hrsdrc
amcmd> mkdir PASSWORD
ASMCMD [+DATA/HRDRC/PASSWORD] > pwcopy /u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/orapwhrsdrc1 +DATA/HRDRC/PASSWORD/orapwhrsdrc
copying /u01/app/oracle/product/12.1.0.2/dbhome_1/dbs/orapwhrsdrc1 -> +DATA/hrsdrc/PASSWORD/orapwhrsdrc
ASMCMD [+DATA/hrsdrc/PASSWORD] > ls
orapwhrsdrc
After password keeping in standby database ASM location, then configure in database configuration using 'srvctl'
-- Standby
$ srvctl config database -d hrsdrc
Database unique name: hrsdrc
Database name: hrprod
Oracle home: /u01/app/oracle/product/12.1.0.2/dbhome_1
Oracle user: oracle
Spfile:
Password file:
Domain:
Start options: mount
Stop options: immediate
Database role: PHYSICAL_STANDBY
Management policy: AUTOMATIC
Server pools:
Disk Groups: DATA
Mount point paths:
Services:
Type: RAC
Start concurrency:
Stop concurrency:
OSDBA group: dba
OSOPER group: dba
Database instances: hrsdrc1,hrsdrc2
Configured nodes: standbyhost1a,standbyhost1b
Database is administrator managed
[oracle@standbyhost1a ~]$
[oracle@standbyhost1a ~]$ srvctl modify database -d hrsdrc -pwfile +DATA/hrsdrc/PASSWORD/orapwhrsdrc
srvctl config database -d hrsdrc
[oracle@standbyhost1a ~]$ srvctl config database -d hrsdrc
Database unique name: hrsdrc
Database name: hrprod
Oracle home: /u01/app/oracle/product/12.1.0.2/dbhome_1
Oracle user: oracle
Spfile:
Password file: +DATA/hrsdrc/PASSWORD/orapwhrsdrc
Domain:
Start options: mount
Stop options: immediate
Database role: PHYSICAL_STANDBY
Management policy: AUTOMATIC
Server pools:
Disk Groups: DATA
Mount point paths:
Services:
Type: RAC
Start concurrency:
Stop concurrency:
OSDBA group: dba
OSOPER group: dba
Database instances: hrsdrc1,hrsdrc2
Configured nodes: standbyhost1a,standbyhost1b
Database is administrator managed
[oracle@standbyhost1a ~]$
Now password file issue fixed for issue-2. Here you can reload 'REDO TRANSPORT' service.
-- Do the following in Primary database
SQL> show parameter log_archive_dest_state_2;
NAME TYPE VALUE
-------------------------- ---------- --------------------------
log_archive_dest_state_2 string ENABLE
Restart redo transport on the primary:
SQL> alter system set log_archive_dest_state_2=defer sid='*';
System altered.
SQL> alter system set log_archive_dest_state_2=enable sid='*';
System altered.
SQL> alter system archive log current;
System altered.
After this, my redo transport service started.
-- verify again gv$archive_dest.
SQL> select inst_id,status,error from gv$archive_dest where dest_id=2;
INST_ID STATUS ERROR
---------- --------------------- -----------------------------------
2 VALID
1 VALID
Now no error revived. redo transport started.
You can now verify with log sequence.
-- When issue was there
Thread Last Sequence Received Last Sequence Applied Difference
---------- ---------------------- --------------------- ----------
1 1695 1634 61
2 1648 1581 67
-- After password file fix:
Thread Last Sequence Received Last Sequence Applied Difference
---------- ---------------------- --------------------- ----------
1 1733 1634 95
2 1672 1581 67
Now check what is current log sequence values in each thread:
SQL> select name from v$database;
NAME
---------
HRPROD
SQL> select THREAD# ,max(SEQUENCE#) FROM V$ARCHIVED_LOG group by THREAD#;
THREAD# MAX(SEQUENCE#)
---------- --------------
1 1734
2 1673
Here we can see now, all redo logs are shipped to standby but not applying.
Issue-3
=========
Now I verified, why MRP is not applying. Initially I tried below commands to restart MRP. But issue remains same.
SQL> alter database recover managed standby database cancel;
Database altered.
SQL> alter database recover managed standby database using current logfile disconnect from session;
Database altered.
Then I verified what is error for MRP process and why it is not applying.
-- standby
SQL> select inst_id,thread#,sequence#,status from gv$managed_standby where process='MRP0';
INST_ID THREAD# SEQUENCE# STATUS
---------- ---------- ---------- -----------------------------------
1 1 1635 WAIT_FOR_GAP
So MRP is waiting for log sequence '1635'. Then I verified archive log dest location. But I found this archive log is not tranfered or missing by any issue.
Then immediatly I transfered from primary to standby location and registered the redo using below commands.
-- primary
RMAN>copy archivelog '+DATA/hrprod/ARCHIVELOG/thread_1_seq_1635.1234.1028326277' to '/tmp/thread_1_seq_1635.1234.1028326277';
Then, transfer the archive log to standby database.
Register manually by mentioning the copied file location.
On standby,
SQL> alter database register logfile '/tmp/thread_1_seq_1635.1234.1028326277';
Post register, sync started. Now issue is resolved. You can see the final message from dataguard broker.
-- standby
DGMGRL> show database hrsdrc;
Database - grmdrc
Enterprise Manager Name: hrsdrc_1
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds (computed 0 seconds ago)
Apply Lag: 8 seconds (computed 0 seconds ago)
Average Apply Rate: 183.00 KByte/s
Real Time Query: OFF
Instance(s):
hrsdrc1 (apply instance)
hrsdrc2
Database Status:
SUCCESS
DGMGRL>
Thanks Gouranga Sir for the post.Very Helpful.
ReplyDeleteBut not sure how the password file got effected all of sudden, when system is working find before. Any points on this.