Aug 14, 2015

Oracle RAC service - create | manage | TAF | PRCR-1013 : Failed to start resource - a fix


Create and Manage custom Service in Oracle 11gR2 RAC database database:

We will discuss below points how we will configure and manage custom RAC services to connect database for various purposes:
  • create service
  • start service
  • stop service
  • config service
  • modify service
Note: I assume that you have all basic ideas about RAC services. Let us see what is the output when we have a default database service. Assume I have a database with name PROD. i.e., It has two RAC instances ( for two node RAC)

Along with we will discuss a workaround on below issues at last:

Fix : PRCR-1013 : Failed to start resource - while start service in RAC database

Out put for configuration for default service for my SID :
$ srvctl config database -d PROD

Database unique name: PROD
Database name: PROD
Oracle home: /oracle/oracle_base/product/dbhome_1
Oracle user: oracle
Spfile: +DATA/PROD/spfilePROD.ora
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: PROD
Database instances: PROD1,PROD2
Disk Groups: DATA,FRA
Mount point paths:
Services:
Type: RAC
Database is administrator managed

Deatils of rac setup:
$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS      
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       rac1                                       
               ONLINE  ONLINE       rac2                                       
ora.FRA.dg
               ONLINE  ONLINE       rac1                                       
               ONLINE  ONLINE       rac2                                       
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac1                                       
               ONLINE  ONLINE       rac2                                       
ora.LISTENER_PROD.lsnr
               ONLINE  ONLINE       rac1                                       
               ONLINE  ONLINE       rac2                                       
ora.QUORUM.dg
               ONLINE  ONLINE       rac1                                       
               ONLINE  ONLINE       rac2                                       
ora.asm
               ONLINE  ONLINE       rac1                                       
               ONLINE  ONLINE       rac2                                       
ora.gsd
               OFFLINE OFFLINE      rac1                                       
               OFFLINE OFFLINE      rac2                                       
ora.net1.network
               ONLINE  ONLINE       rac1                                       
               ONLINE  ONLINE       rac2                                       
ora.ons
               ONLINE  ONLINE       rac1                                       
               ONLINE  ONLINE       rac2                                       
ora.registry.acfs
               ONLINE  ONLINE       rac1                                       
               ONLINE  ONLINE       rac2                                       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac2                                       
ora.PROD.db
      1        ONLINE  ONLINE       rac1                    Open               
      2        ONLINE  ONLINE       rac2                    Open               
ora.cvu
      1        ONLINE  ONLINE       rac1                                       
ora.rac1.vip
      1        ONLINE  ONLINE       rac1                                       
ora.rac2.vip
      1        ONLINE  ONLINE       rac2                                       
ora.oc4j
      1        ONLINE  ONLINE       rac1                                       
ora.scan1.vip
      1        ONLINE  ONLINE       rac2                                       

$

1) Create policy managed services:
syntax:
Adds services to a database and assigns them to instances. If you have multiple instances of a cluster database on the same node, then always use only one instance on that node for all of the services that node manages.
Note:
The srvctl add service command does not accept placement options for Oracle RAC One Node databases.
Syntax and Options
Use the srvctl add service command to create a service, using the first of the following syntax models, or to update an existing service, using the second syntax model:

srvctl add service -d db_unique_name -s service_name {-r "preferred_list"
   [-a "available_list"] [-P {BASIC | NONE | PRECONNECT}] | -g server_pool
   [-c {UNIFORM | SINGLETON]} [-k network_number]
   [-l [PRIMARY | PHYSICAL_STANDBY | LOGICAL_STANDBY | SNAPSHOT_STANDBY]
   [-y {AUTOMATIC | MANUAL}] [-q {TRUE | FALSE}] [-x {TRUE | FALSE}]
   [-j {SHORT | LONG}][-B {NONE | SERVICE_TIME | THROUGHPUT}]
   [-e {NONE | SESSION | SELECT}] [-m {NONE | BASIC}] [-z failover_retries]
   [-w failover_delay]
srvctl add service -d db_unique_name -s service_name
   -u {-r preferred_list | -a available_list} [-f]

Service Management Policy:
When you use Oracle Clusterware to manage your database, you can configure startup options for each individual database service when you add the service using the srvctl add service command with the -y option. If you set the management policy for a service to AUTOMATIC (the default), then the service starts automatically when you start the database with SRVCTL. If you set the management policy to MANUAL, then the service does not automatically start, and you must manually start it with SRVCTL. A MANUAL setting does not prevent Oracle Clusterware from monitoring the service when it is running and restarting it if a failure occurs. Prior to Oracle RAC 11g release 2 (11.2), all services worked as though they were defined with a MANUAL management policy.

Note:
When you use automatic services in an administrator-managed database, during planned database startup, services may start on the first instances to start rather than their preferred instances.

Example from Oracle Doc:
Use this example syntax to add the gl.example.com service to the my_rac database with AQ HA notifications enabled, a failover method of BASIC, a Connection Load Balancing Goal of LONG, a failover type of SELECT, and 180 failover retries with a delay of 5:

$ srvctl add service -d my_rac -s gl.example.com -m BASIC -e SELECT -z 180 -w 5 -j LONG

Use this example syntax to add a named service to a database with preferred instances in list one and available instances in list two, using preconnect failover for the available instances:
srvctl add service -d crm -s sales -r crm01,crm02 -a crm03 -P PRECONNECT

Example:
$ srvctl add service -d db_unique_name -s service_name -r preferred_list [-a available_list] [-P TAF_policy]
$ srvctl add service -d PROD -s hrprod -r PROD1,PROD2 -a PROD2 -m BASIC -e SELECT -z 180 -w 5 -j LONG

Some issues I faced during service creation. See my errors also:

TRY-1:
$ srvctl add service -d PROD -s hrprod -r PROD1,PROD2 -a PROD2 -m BASIC -e SELECT -z 180 -w 5 -j LONG
PRKO-2102 : Failed to create server pool PROD_hrprod: PRCS-1009 : Failed to create server pool ora.PROD_hrprod
PRCR-1071 : Failed to register or update server pool ora.PROD_hrprod
CRS-2714: 'rac2' is specified multiple times in 'SERVER_NAMES' of 'ora.PROD_hrprod'

Try-2:
srvctl add service -d PROD -s hrprod -m BASIC -e SELECT -z 180 -w 5 -j LONG

$ srvctl add service -d PROD -s hrprod -m BASIC -e SELECT -z 180 -w 5 -j LONG
PRKO-3116 : '-g' or '-r' option should be provided

Correct One:
$ srvctl add service -d PROD -s hrprod -r PROD1,PROD2 -m BASIC -e SELECT -z 180 -w 5 -j LONG

outpu: 
crsctl stat res -t

--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac2                                       
ora.PROD.hrprod.svc
      1        OFFLINE OFFLINE                                                  
      2        OFFLINE OFFLINE                                                  
ora.PROD.db
      1        ONLINE  ONLINE       rac1                    Open               
      2        ONLINE  ONLINE       rac2                    Open  

2) Start the services:

$ srvctl start service

Starts a service or multiple services on the specified instance. The srvctl start service command will fail if you attempt to start a service on an instance if that service is already running on its maximum number of instances, that is, its number of preferred instances. You may move a service or change the status of a service on an instance with the srvctl modify service and srvctl relocate service commands described later in this appendix.

Note:All manual service startup must specify the name of the service to be started by the user.

Syntax and Options:
Use the srvctl start ser
vice command with the following syntax:

srvctl start service -d db_unique_name
            [-s "service_name_list" [-n node_name | -i instance_name]]
            [-o start_options]

Example:
a) Start service in one node:

$ srvctl start service -d PROD -s hrprod -n rac1 -o open

when service started in one node, then see the output of 'crsctl stat res -t':

--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac2                                       
ora.PROD.hrprod.svc
      1        ONLINE  ONLINE       rac1                                       
      2        OFFLINE OFFLINE                                                  
ora.PROD.db
      1        ONLINE  ONLINE       rac1                    Open               
      2        ONLINE  ONLINE       rac2                    Open

Now try to start service in other node:
$ srvctl start service -d PROD -s hrprod -n rac2 -o open

b) Start service in all nodes at a time:

$ srvctl start service -d PROD -s hrprod -o open

see the output of 'crsctl stat res -t':
--------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac2                                       
ora.PROD.hrprod.svc
      1        ONLINE  ONLINE       rac1                                       
      2        ONLINE  ONLINE       rac2                                       
ora.PROD.db
      1        ONLINE  ONLINE       rac1                    Open               
      2        ONLINE  ONLINE       rac2                    Open  

3) Stop service:

srvctl stop service
Stops one or more services globally across the cluster database, or on the specified instance.
Syntax and Options
Use the srvctl stop service command with the following syntax:
srvctl stop service -d db_unique_name [-s "service_name_list"
     [-n node_name | -i instance_name] [-f]

a) Stop service in one node/ instance:

$ srvctl stop service -d PROD -s hrprod -n rac1
$ srvctl stop service -d PROD -s hrprod -n rac2

b) Stop service in all nodes/ instances:

$ srvctl stop service -d PROD -s hrprod

4) Config service:

$ srvctl config service -d PROD -s hrprod

$ srvctl config service -d PROD -s hrprod
Service name: hrprod
Service is enabled
Server pool: PROD_hrprod
Cardinality: 2
Disconnect: false
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Failover type: SELECT
Failover method: BASIC
TAF failover retries: 180
TAF failover delay: 5
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: NONE
Edition:
Preferred instances: PROD1,PROD2
Available instances:
$

When I created in one my other test database:

$ srvctl config service -d proddb -s finprod
Service name: finprod
Service is enabled
Server pool: proddb_finprod
Cardinality: 2
Disconnect: false
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Failover type: NONE
Failover method: NONE
TAF failover retries: 0
TAF failover delay: 0
Connection Load Balancing Goal: SHORT
Runtime Load Balancing Goal: SERVICE_TIME
TAF policy specification: NONE
Edition:
Preferred instances: proddb1,proddb2
Available instances:

Best practice: As per oracle suggestion, if no standby, then below configuration is good. As per one of production setting, I used below commands to create service:

$ srvctl add service -d PROD -s hrprod -r PROD1,PROD2 -B SERVICE_TIME -j SHORT
$ srvctl start service -d PROD -s hrprod

$ srvctl config service -d PROD -s hrprod
Service name: hrprod
Service is enabled
Server pool: PROD_hrprod
Cardinality: 2
Disconnect: false
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Failover type: NONE
Failover method: NONE
TAF failover retries: 0
TAF failover delay: 0
Connection Load Balancing Goal: SHORT
Runtime Load Balancing Goal: SERVICE_TIME
TAF policy specification: NONE
Edition:
Preferred instances: PROD1,PROD2
Available instances:

5) Remove service:

srvctl remove service  --Removes the configuration for a service.

Syntax and Options
Use the srvctl remove service command as follows:
srvctl remove service -d db_unique_name -s service_name [-i instance_name] [-f]

-f    :  Removes the service resource even though the service is running   

Best practice: First stop the service, then drop it.

$ srvctl remove service -d crm -s sales

The following example removes the services from specific instances:
$ srvctl remove service -d crm -s sales -i crm01,crm02

Example:
$ srvctl remove service -d PROD -s hrprod -f

6) Re-create the services with different TAF policy:

a) We have created earlier like below:

$ srvctl add service -d PROD -s hrprod -r PROD1,PROD2 -m BASIC -e SELECT -z 180 -w 5 -j LONG

b) In below example we are taking care entire SESSION failover instead of SELECT sessions.

srvctl add service -d PROD -s hrprod -r PROD1,PROD1 -e SESSION -m BASIC -w 10 -z 150

c) If culster is primary cluster and you have standby cluster also, then follow below guide lines:

1. On the primary and standby hosts create the service (oltpworkload) that the application will use to connect to the database. The service should be created such that it is associated with and runs on the database when it is in the ‘PRIMARY’ database role:

Primary cluster:

srvctl add service -d Austin -s oltpworkload -r ssa1,ssa2,ssa3,ssa4 -l PRIMARY -q TRUE -e SESSION -m BASIC -w 10 -z 150

Standby cluster:

srvctl add service -d Houston -s oltpworkload -r ssb1,ssb2,ssb3,ssb4 -l PRIMARY -q TRUE -e SESSION -m BASIC -w 10 -z 150

2. If the standby is also going to support read-only reporting applications, then create a service specific for this workload (reports) that will start when the database is in PHYSICAL_STANDBY role.

Primary cluster:

srvctl add service -d Austin -s reports -r ssa1,ssa2,ssa3,ssa4 -l PHYSICAL_STANDBY -q TRUE -e SESSION -m BASIC -w 10 -z 150

Standby cluster: 

srvctl add service -d Houston -s reports -r ssb1,ssb2,ssb3,ssb4 -l PHYSICAL_STANDBY -q TRUE -e SESSION -m BASIC -w 10 -z 150

In addition to creating the database service “reports” on both clusters, the following SQL statement must also be run on the primary database so that the service definition is transmitted via the redo stream and applied to the physical standby database:

SQL run at the Primary database:

EXECUTE DBMS_SERVICE.CREATE_SERVICE('reports', 'reports', NULL, NULL,TRUE, 'BASIC', 'SESSION', 150, 10, NULL);

7) Configure an Oracle Net : primary & standby cluster

Configure an Oracle Net alias that the OCI application will use to connect to the database. The Oracle Net alias should specify both the primary and standby SCAN hostnames. For best performance while creating new connections the Oracle Net alias should have LOAD_BALANCE=OFF for the DESCRIPTION_LIST so that DESCRIPTIONs are tried in an ordered list, top to bottom. With this configuration the second DESCRIPTION is only attempted if all connection attempts to the first DESCRIPTION have failed.

SALES=
(DESCRIPTION_LIST=
(LOAD_BALANCE=off)
(FAILOVER=on)
(DESCRIPTION=
(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)
  (ADDRESS_LIST=
    (LOAD_BALANCE=on)
    (ADDRESS=(PROTOCOL=TCP)(HOST=Austin-scan)(PORT=1521)))
(CONNECT_DATA=(SERVICE_NAME=oltpworkload)))
(DESCRIPTION=
(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3)
  (ADDRESS_LIST=
    (LOAD_BALANCE=on)
    (ADDRESS=(PROTOCOL=TCP)(HOST= Houston-scan)(PORT=1521)))
(CONNECT_DATA=(SERVICE_NAME=oltpworkload))))

When a new connection is made using the above Oracle Net alias the following logic is used: 
a) Oracle Net contacts DNS and resolves Austin-scan to a total of 3 IP addresses.
b) Oracle Net randomly picks one of the 3 IP address and attempts to make a connection. If the connection attempt to the IP address does not respond in 3 seconds (TRANSPORT_CONNECT_TIMEOUT) the next IP address is attempted. All 3 IP addresses will be tried a total of four times (initial attempt plus RETRY_COUNT in the above example).
c) If the connection to primary site is unsuccessful, it then contacts DNS and resolves Houston-scan to 3 addresses.
d) The same sequence is performed for the standby Houston-scan as it was for the Austin-scan.

When a new connection is made using the above Oracle Net alias the following logic is used: 
a) Oracle Net contacts DNS and resolves Austin-scan to a total of 3 IP addresses.
b) Oracle Net randomly picks one of the 3 IP address and attempts to make a connection. If the connection attempt to the IP address does not respond in 3 seconds (TRANSPORT_CONNECT_TIMEOUT) the next IP address is attempted. All 3 IP addresses will be tried a total of four times (initial attempt plus RETRY_COUNT in the above example).
c) If the connection to primary site is unsuccessful, it then contacts DNS and resolves Houston-scan to 3 addresses.
d) The same sequence is performed for the standby Houston-scan as it was for the Austin-scan.

Additional information on the Oracle Net parameters used in the above alias:

LOAD_BALANCE is ON by default for DESCRIPTION_LIST only. This parameter by default is OFF for an address list within a DESCRIPTION. Setting this ON for a SCAN-based address implies that new connections will be randomly assigned to one of the 3 SCAN-based IP addresses resolved by DNS.
In certain situations, round-robin address assignment by DNS may not be possible - see the Oracle Database 11.2.0.2 Readme. The best practice to ensure connect-time client load balancing across the 3 SCAN IP addresses is to explicitly specify LOAD_BALANCE=on. Note that this behavior is independent of server-side load balancing which will occur subsequently, after the initial SCAN listener receives the connection request.
The default value for the FAILOVER parameter is ON for an address list within a DESCRIPTION. This impacts the 3 SCAN IP addresses the same way as if those 3 IP addresses were listed explicitly in the connect descriptor. This means that if the initial connection requests to the first randomly-assigned SCAN IP address fails, the connection will failover to another SCAN IP address, and will continue to do so, till it iterates the complete address list. Note that this parameter is relevant only to new connections. Failover of existing connections is handled by TAF, which is controlled by the separate FAILOVER_MODE parameter.
The CONNECT_TIMEOUT parameter is the time to connect to the database instance providing the requested service, and includes the time to establish a TCP connection to the listener. The TCP duration is controlled by TRANSPORT_CONNECT_TIMEOUT, which has a default value of 60 seconds. If both timeouts are specified, it is recommended that CONNECT_TIMEOUT be set to a value slightly greater than TRANSPORT_CONNECT_TIMEOUT. The timeout interval is applicable for each ADDRESS in an ADDRESS_LIST, and each IP address to which a host name is mapped. Set the CONNECT_TIMEOUT parameter to the maximum amount of time (in seconds) to wait for a response from an address before skipping to the next address. A setting of three seconds is recommended and is acceptable in most cases. Do not set this parameter to fewer than three seconds.

The equivalent global parameter in sqlnet.ora is SQLNET.OUTBOUND_CONNECT_TIMEOUT. If the same timeout value is sufficient for all connect strings, it would be simpler to set the global parameter. Otherwise, a separate setting can be done for each connect string.

The equivalent global parameter for TRANSPORT_CONNECT_TIMEOUT is TCP.CONNECT_TIMEOUT. Both these parameters are applicable only when the protocol is TCP.
The RETRY_COUNT parameter specifies the number of times an address list is traversed before the new connection attempt is terminated. The default value is 0. With respect to SCAN, with FAILOVER = on, setting this RETRY_COUNT parameter to a value of 2 (for example), means the 3 SCAN IP addresses are traversed thrice (i.e. 3*3=9 connect attempts), before the connection is terminated:

    > When the connection request initially comes in, the first randomly assigned IP address tries to service that request, followed by the two remaining IP addresses (this behavior is controlled by the FAILOVER parameter);

    > The retries then kick in and the list of 3 IP addresses is tried two more times. RETRY_COUNT is only supported at DESCRIPTION level in connect string, but not at global (i.e. sqlnet.ora) level.

Incident/ Issues to start a service:- Test case scenario

When unable to start a service:

Fix : PRCR-1013 : Failed to start resource - while start service in RAC database

Suddenly I found one of the service which is created for load-balancing TAF purpose, is down. When I tried to start it, then below error came.

$ srvctl start service -d proddb -s finprod -n racdb2 -o open
PRCR-1013 : Failed to start resource ora.proddb.finprod.svc
PRCR-1064 : Failed to start resource ora.proddb.finprod.svc on node racdb2
CRS-5017: The resource action "ora.proddb.finprod.svc start" encountered the following error:
ORA-06550: line 1, column 34:
PLS-00553: character set name is not recognized
ORA-06550: line 0, column 0:
PL/SQL: Compilation unit analysis terminated
. For details refer to "(:CLSN00107:)" in "/u01/app/grid4/log/racdb2/agent/crsd/oraagent_oracle/oraagent_oracle.log".

CRS-2674: Start of 'ora.proddb.finprod.svc' on 'racdb2' failed
CRS-5017: The resource action "ora.proddb.finprod.svc clean" encountered the following error:
ORA-06550: line 1, column 33:
PLS-00553: character set name is not recognized
ORA-06550: line 0, column 0:
PL/SQL: Compilation unit analysis terminated
. For details refer to "(:CLSN00106:)" in "/u01/app/grid4/log/racdb2/agent/crsd/oraagent_oracle/oraagent_oracle.log".

Now read the error written in oraagent_oracle.log file. Best way is raise SR and follow it immediately. 

Collect details:

script /tmp/out.1
echo "Took on server `hostname` at `date` as `id`"
crsctl stat res -t
srvctl stop service -d proddb -s finprod
crsctl stat res -t

### make sure the service is stopped at this point
export SRVM_TRACE=true
srvctl start service -d proddb -s finprod
date
crsctl stat res -t
date

#### service should be still down since its not working this way.
srvctl start service -d proddb -s finprod -i proddb2
crsctl stat res -t
date
exit

Workaround:

I did below activities and resolved the issues:

Before any activity, output of "crsctl stat res -t":
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER              
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       racdb1                                       
               ONLINE  ONLINE       racdb2                                       
ora.FRA.dg
               ONLINE  ONLINE       racdb1                                       
               ONLINE  ONLINE       racdb2                                       
ora.LISTENER.lsnr
               ONLINE  ONLINE       racdb1                                       
               ONLINE  ONLINE       racdb2                                       
ora.asm
               ONLINE  ONLINE       racdb1                    Started            
               ONLINE  ONLINE       racdb2                    Started            
ora.gsd
               OFFLINE OFFLINE      racdb1                                       
               OFFLINE OFFLINE      racdb2                                       
ora.net1.network
               ONLINE  ONLINE       racdb1                                       
               ONLINE  ONLINE       racdb2                                       
ora.ons
               ONLINE  ONLINE       racdb1                                       
               ONLINE  ONLINE       racdb2                                       
ora.registry.acfs
               ONLINE  ONLINE       racdb1                                       
               ONLINE  ONLINE       racdb2                                       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       racdb2                                       
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       racdb1                                       
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       racdb1                                       
ora.cvu
      1        ONLINE  ONLINE       racdb1                                       
ora.racdb1.vip
      1        ONLINE  ONLINE       racdb1                                       
ora.racdb2.vip
      1        ONLINE  ONLINE       racdb2                                       
ora.proddb.db
      1        ONLINE  ONLINE       racdb1                    Open               
      2        ONLINE  ONLINE       racdb2                    Open               
ora.proddb.finprod.svc
      1        ONLINE  ONLINE       racdb1                                       
      2        ONLINE  OFFLINE                                                  
ora.oc4j
      1        ONLINE  ONLINE       racdb1                                       
ora.scan1.vip
      1        ONLINE  ONLINE       racdb2                                       
ora.scan2.vip
      1        ONLINE  ONLINE       racdb1                                       
ora.scan3.vip
      1        ONLINE  ONLINE       racdb1  


Here service is not running in node2. In very first step stop entire service in all nodes and start it again. If same issue is comming you can follow below practice.

1. Ensure that the database is properly started up, that is, mounted and opened.
2. Find PID and kill the process:

cd $GRID_HOME/log/nodename/agent/crsd/oraagent_oracle
cat oraagent_oracle.pid
You should see a pid number in the file.
ps -ef |grep <pid> -----> verify if this pid is the oraagent.bin process.
if it is the oraagent.bin process kill it.
kill -9 <pid>

3. If not ps -ef|grep d.bin. You should find two oraagent.bin processes kill these.

Killing the process should not have any effect and they should respawn again automatically.
Confirm that the processes have started.

4. Then restart the service using srvctl.

srvctl start service -d <database_name> -s <service_name> -i <instance_name>

Activity done in my test env.:

cd /u01/app/grid4/log/racdb2/agent/crsd/oraagent_oracle
$ cat oraagent_oracle.pid
7798818

$ ps -ef|grep 7798818

$ ps -ef|grep 7798818
  oracle  7798818        1   2   Jul 22      - 43:22 /u01/app/grid4/bin/oraagent.bin
  oracle 16974300 36307328   0 17:35:08  pts/4  0:00 grep 7798818

kill -9 7798818

 ps -ef|grep d.bin
  oracle  5767238        1   0   Jul 22      -  7:16 /u01/app/grid4/bin/evmd.bin
  oracle  5832844        1   0   Jul 22      -  1:45 /u01/app/grid4/bin/gpnpd.bin
    root  5963902        1   0   Jul 22      - 15:07 /u01/app/grid4/bin/ohasd.bin reboot
  oracle  8192028        1   0   Jul 22      -  0:05 /u01/app/grid4/bin/mdnsd.bin
  oracle  6291842  8585614   3   Jul 22      - 70:27 /u01/app/grid4/bin/ocssd.bin
    root  7668170        1   0   Jul 22      -  7:40 /u01/app/grid4/bin/octssd.bin reboot
    root 26411340        1   3   Jul 23      - 41:36 /u01/app/grid4/bin/osysmond.bin
  oracle 28049762 36307328   0 17:36:52  pts/4  0:00 grep d.bin
    root 45875706        1   0   Jul 22      - 18:22 /u01/app/grid4/bin/crsd.bin reboot
  oracle 51052946        1   0   Jul 22      - 10:53 /u01/app/grid4/bin/gipcd.bin

Note : There is no pid value as we killed.

Now start the service again:

$ srvctl start service -d proddb -s finprod -n racdb2 -o open

Yah. It is started. Now see the "crsctl stat res -t" output:

$ crsctl stat res -t
--------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS      
--------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       racdb1                                       
               ONLINE  ONLINE       racdb2                                       
ora.FRA.dg
               ONLINE  ONLINE       racdb1                                       
               ONLINE  ONLINE       racdb2                                       
ora.LISTENER.lsnr
               ONLINE  ONLINE       racdb1                                       
               ONLINE  ONLINE       racdb2                                       
ora.asm
               ONLINE  ONLINE       racdb1                    Started            
               ONLINE  ONLINE       racdb2                    Started            
ora.gsd
               OFFLINE OFFLINE      racdb1                                       
               OFFLINE OFFLINE      racdb2                                       
ora.net1.network
               ONLINE  ONLINE       racdb1                                       
               ONLINE  ONLINE       racdb2                                       
ora.ons
               ONLINE  ONLINE       racdb1                                       
               ONLINE  ONLINE       racdb2                                       
ora.registry.acfs
               ONLINE  ONLINE       racdb1                                       
               ONLINE  ONLINE       racdb2                                       
--------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       racdb2                                       
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       racdb1                                       
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       racdb1                                       
ora.cvu
      1        ONLINE  ONLINE       racdb1                                       
ora.racdb1.vip
      1        ONLINE  ONLINE       racdb1                                       
ora.racdb2.vip
      1        ONLINE  ONLINE       racdb2                                       
ora.proddb.db
      1        ONLINE  ONLINE       racdb1                    Open               
      2        ONLINE  ONLINE       racdb2                    Open               
ora.proddb.finprod.svc
      1        ONLINE  ONLINE       racdb1                                       
      2        ONLINE  ONLINE       racdb2                                       
ora.oc4j
      1        ONLINE  ONLINE       racdb1                                       
ora.scan1.vip
      1        ONLINE  ONLINE       racdb2                                       
ora.scan2.vip
      1        ONLINE  ONLINE       racdb1                                       
ora.scan3.vip
      1        ONLINE  ONLINE       racdb1


Service Name Usage more points:


If you have a RAC database, you may have introduced a few features to support some of the advanced functionalities of RAC, e.g., load balancing across nodes and making sessions fail over to a surviving node when a node dies. These things will occur only if users are employing the proper connect string, utilizing SERVICE_NAME and not SID. Here is one way to use connect strings, utilizing service name "finprod" connecting to the RAC database running on three nodes – rac1, rac2 and rac3.


PROD =
 (DESCRIPTION =
 (LOAD_BALANCE = on)
 (FAILOVER = on)
 (enable = broken)
 (ADDRESS = (PROTOCOL = TCP)(HOST = racdb1)(PORT = 1521))
 (ADDRESS = (PROTOCOL = TCP)(HOST = racdb2)(PORT = 1521))
 (ADDRESS = (PROTOCOL = TCP)(HOST = racdb3)(PORT = 1521))
 (CONNECT_DATA =
 (SERVER = DEDICATED)
 (SERVICE_NAME = finprod)
 (FAILOVER_MODE =
 (TYPE = SELECT)
 (METHOD = BASIC)
 (RETRIES = 120)
 (DELAY = 2)
 )
 )
 ) 

Key Note on database service:

Service manage in Oracle RAC database- Command List
(Keys and description)
-d Database Name
-i Instance Name
-s Service Name
-n Node Name
-r Preferred list
-a Available list
-v Verbose
-p policy {AUTOMATIC | MANUAL}
-t TAF
-f  Force stop or relocate service operations

-- Check status of database and list services:
$ srvctl status database -d db_unique_name -v
Or using View :
SQL> select * from gv$active_services

-- View configuration of services
$ srvctl config service -d DB2 –a

-- Add service
$ srvctl add service -d PROD -s OLTP_SVC -r PROD1,PROD2 -a PROD3
$ srvctl add service -d PROD -s BATCH_SVC -r PROD3 -a PROD1,PROD2

-- Modify service
$ srvctl modify service -s service_name -d db_unique_name -n -i pref_inst_list [ -a avail_inst_list] [-f]
e.g., $ srvctl modify service -s OLTP_SVC -d PROD -n -i PROD3 -a PROD1,PROD2

-- Moving a service member from one instance to another (move and change configuration):
$ srvctl modify service -s PROD_UK_SEARCH  -d PROD -i PROD1 -t PROD2

-- Relocate service (not changing configuration):
$ srvctl relocate service -d PROD -s PROD_OPERATIONS_SERVICE -i PROD1 -t PROD3

-- Enable|disable|start|stop service
$ srvctl enable|disable|start|stop service  -s service_name -d db_unique_name

-- Remove Service:
$ srvctl remove service -d PROD -s PROD_UK_SEARCH -i PROD5

-- Create Service and TAF:
$ srvctl add service –d PROD –s crmprd  -r  node1, node2  -B SERVICE TIME –j  SHORT
$ srvctl start service –d PROD –s crmprd  [- w – failover delay, -z -> failover retries
$ srvctl add service -d PROD -s hrprod -r PROD1,PROD2 -m BASIC -e SELECT -z 180 -w 5 -j LONG


Reference :
http://docs.oracle.com/cd/E11882_01/rac.112/e41960/srvctladmin.htm

4 comments:

  1. This comment has been removed by a blog administrator.

    ReplyDelete
  2. Never seen any docs like this. Appreciate your efforts.

    ReplyDelete
  3. This comment has been removed by a blog administrator.

    ReplyDelete

Translate >>