Dec 29, 2015

Improve performance for materialized view refreshes

In most cases DBAs face I/O issues and sometimes poor performance with mviews and mview refreshes. If creating an index on the mview resolves the query-side issue, then we can focus on how fast we can refresh the mviews.

Materialized Views are a wonderful tool for reducing repetitive I/O and they are a true silver bullet under certain circumstances.  The problem is keeping the materialized view refreshed, and refreshing materialized views has always been resource-intensive and problematic.

The problem with materialized view for pre-joined tables is keeping them current with the refresh mechanism.  Because the materialized view is built from many tables, and changes to the base tables require an update to the materialized view (via a snapshot refresh or full refresh).

Boost up materialized view refreshes:

Oracle "fast refresh" mechanism is already optimized by Oracle.  So, you can speed-up a materialized view refresh on a high DML system.  There are several options:

- Partition the base tables - Partition the remote/base tables and use an efficient partitioning scheme. See my notes on partitioning in this blog.

- Use parallel DML - Refresh materialized views in parallel; it improves performance considerably (see the sketch after this list). Rather than having one load which took 2 hours, I run 4 parallel loads, one for each partition. The length of time for the whole process is now determined by how long the biggest partition takes to load. In my case, this is 40 minutes, with two 30-minute loads and one 20-minute load. Overall I am saving 1 hour 20 minutes. I can now add further partitions and do the same thing. My only limitation is the number of parallel loads, because I don't have unlimited processing power.

- Use super-fast solid-state disks - The easiest and most reliable way to speed up a materialized view refresh is to move the target tables and MVs to SSD. SSD runs several hundred times faster than platter disk, and it plops right in, just a few hours to install.
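A minimal sketch of a parallel refresh call, as referenced above (the MV name SALES_MV and the degree of 4 are assumptions for illustration):

ALTER SESSION ENABLE PARALLEL DML;

BEGIN
  DBMS_MVIEW.REFRESH(
    list           => 'SALES_MV',   -- hypothetical MV name
    method         => '?',          -- force: fast refresh if possible, otherwise complete
    parallelism    => 4,            -- requested degree of parallelism
    atomic_refresh => FALSE);       -- non-atomic complete refresh can use truncate + direct-path insert
END;
/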

Other Tips:

Use materialized views only for enabling query rewrite (which means creating them on a prebuilt summary table; unless you are loading the summary by partition exchange, you have to drop the MV, refresh the summary table, and recreate the MV). In 10g it is much easier to use the DBMS_ADVANCED_REWRITE package instead of MVs.
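A hedged sketch of declaring a rewrite equivalence in 10g and later (the sales table and sales_summary names are hypothetical; the caller needs execute privilege on DBMS_ADVANCED_REWRITE):

BEGIN
  SYS.DBMS_ADVANCED_REWRITE.DECLARE_REWRITE_EQUIVALENCE(
    name             => 'SALES_SUM_EQ',
    source_stmt      => 'SELECT prod_id, SUM(amount) FROM sales GROUP BY prod_id',
    destination_stmt => 'SELECT prod_id, total_amount FROM sales_summary',
    validate         => FALSE,
    rewrite_mode     => 'TEXT_MATCH');
END;
/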

Write your own refresh code, based on the usual principles of writing good SQL. If you don't need a merge then don't use it. If you don't need to join to dimension tables to get higher attributes then don't do it.

Leverage different levels of aggregation to help produce higher levels. For a series of hierarchical summaries, multi-level aggregations can be extremely beneficial.

Consider storing the refresh and MV definition SQL in CLOB columns of a summary management table, so they can be tuned and edited without needing to open up package code to do so.

Consider using a complete refresh, either through MV's or manually, for higher aggregation levels, particularly when you can reference another summary table to do so.


Dec 27, 2015

Complete recovery of CDB, PDB and root in Oracle 12c

Here we will discuss how to restore the root, the CDB (container database) and a PDB (pluggable database) in Oracle 12c. This includes some new concepts such as the SYSBACKUP privilege. The points below are completely new in Oracle 12c. Recovery Manager continues to enhance and extend the reliability, efficiency, and availability of Oracle Database backup and recovery.

The following topics are covered in this article:

1) SYSBACKUP privilege
2) Support for multitenant container and pluggable databases
3) Complete recovery of CDB, PDB and root
4) Backup of Archived redo logs
5) DUPLICATE enhancements

1) SYSBACKUP Privilege

Prior to 12c, users needed SYSDBA privilege to backup the database. The new SYSBACKUP privilege allows the user the permissions to perform only backup operations.
The SYSBACKUP privilege allows the DBA to perform RMAN backup commands without additional privileges. Using this new role in 12c, you can segregate Administration and Backup operations.
With RMAN you have the same authentication options that are available with SQL*Plus: operating system authentication and password file authentication.
To connect to RMAN using operating system authentication with the SYSBACKUP privilege, use:
$ rman target ' "/ as sysbackup" '
To authenticate with the SYSDBA privilege, use:
$ rman target ' "/ as sysdba" '
You can also connect implicitly using the command below:
$ rman target /
To connect to RMAN using password file authentication with the SYSBACKUP privilege, use:
$ rman target ' "bkpadm@DB1 as sysbackup" '
Where bkpadm is a user that has been granted the SYSBACKUP privilege.
To authenticate with the SYSDBA privilege, use:
$ rman target ' "sysadm@DB1 as sysdba" '
You can also connect implicitly using the command below, where sysadm is a user with the SYSDBA privilege:
$ rman target sysadm@DB1
Note that SYSBACKUP does not include data access privileges such as SELECT ANY TABLE. When you don't specify the role explicitly, the default used is AS SYSDBA.

2) Support for multitenant container and pluggable databases

The multitenant container database (CDB) and pluggable databases (PDBs) are introduced in Oracle 12c, and RMAN provides full support for their backup and recovery. Using RMAN you can back up an entire container database or individual pluggable databases, and you can also perform point-in-time recovery. It is recommended that you turn on control file autobackup; otherwise point-in-time recovery for pluggable databases may not work efficiently when you need to undo data file additions or deletions.
The multitenant architecture manages many databases as one while retaining the isolation and resource control of each database. This helps to manage both infrastructure and human resources effectively.
Backing up a container database is similar to backing up a non-container database. When you back up a container database, RMAN backs up the root, pluggable databases in the container, and archive logs.  When you need to restore you can choose the whole container, one or more pluggable databases or the root only.

Backing up the CDB, PDB, and root

You should have SYSBACKUP or SYSDBA privilege to backup any of the databases.

You can back up the container database (CDB) in the same way as a non-container database, using the command below:

RMAN> BACKUP DATABASE plus ARCHIVELOG;
You can backup the Pluggable Database (PDB) using below command:
RMAN> BACKUP PLUGGABLE DATABASE PDB1, PDB2;
Or connect to pluggable Database in RMAN :
$ rman target sys@PDB1
RMAN> BACKUP DATABASE;
You can backup the root using below command:
RMAN> BACKUP DATABASE ROOT;

3) Complete recovery of CDB, PDB and root

You should have SYSBACKUP or SYSDBA privilege to restore any of the databases.
Restoring Container Database (CDB) is similar to non-container database.
You can restore the whole CDB using below script:
RMAN> RUN { 
     STARTUP MOUNT;
     RESTORE DATABASE;
     RECOVER DATABASE;
     ALTER DATABASE OPEN;
}
Note that restoring CDB database will restore all the pluggable databases.
You can restore only ROOT Database using below script:
RMAN> RUN {
     STARTUP MOUNT;
     RESTORE DATABASE ROOT;
     RECOVER DATABASE ROOT;
     ALTER DATABASE OPEN;
}
You can restore pluggable databases in two ways: either restore from the root container, or connect directly to the PDB and restore.
Use the script below to restore from the root. Using this approach you are able to restore and recover multiple PDBs with a single command.
RMAN > RUN {
     RESTORE PLUGGABLE DATABASE PDB1, PDB2;
     RECOVER PLUGGABLE DATABASE PDB1, PDB2;
     ALTER PLUGGABLE DATABASE PDB1, PDB2 OPEN;
}
Use the script below to connect to the PDB and restore and recover it. Using this approach you can restore and recover only one PDB.
$ rman target=bkpadm@PDB1
     RMAN> run{
     RESTORE DATABASE;
     RECOVER DATABASE;
}
The steps for performing a point-in-time recovery of the CDB or a PDB are the same as for a normal database. But note that when you perform point-in-time recovery on the CDB, it affects all the PDBs as well.
When you perform point-in-time recovery on a PDB, it will affect that single database.
The command to perform a point-in-time recovery is:

SET UNTIL TIME "TO_DATE('01-Jan-2014 01:00:00','DD-MON-YYYY HH24:MI:SS')";
SET UNTIL SCN 1999945; # alternatively, specify SCN
SET UNTIL SEQUENCE 100; # alternatively, specify log seq
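For example, a complete point-in-time restore of the whole CDB could look like the sketch below (the timestamp is only an illustration; after an incomplete recovery the database is opened with RESETLOGS):

RMAN> RUN {
     STARTUP MOUNT;
     SET UNTIL TIME "TO_DATE('01-Jan-2014 01:00:00','DD-MON-YYYY HH24:MI:SS')";
     RESTORE DATABASE;
     RECOVER DATABASE;
     ALTER DATABASE OPEN RESETLOGS;
}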
Below are a few examples of ALTER PLUGGABLE DATABASE.
  • Use this command to open all PDBs in one command:
    ALTER PLUGGABLE DATABASE ALL OPEN;
  • Use this command to open all PDBs except PDB3:
    ALTER PLUGGABLE DATABASE ALL EXCEPT PDB3 OPEN;
  • Use this command to open PDB4,PDB5 in read only mode:
    ALTER PLUGGABLE DATABASE PDB4, PDB5 OPEN READ ONLY;
  • Use below command to shut down all PDBs in single command:
    ALTER PLUGGABLE DATABASE ALL CLOSE IMMEDIATE;

4) Backup of Archived redo logs

You can back up archived logs when you connect to the root as a common user with the SYSDBA or SYSBACKUP privilege, but you cannot back up or delete archived logs when you connect to a PDB as a local user with the SYSDBA or SYSBACKUP privilege.
You can only switch the archived logs when you are connected to the root of a CDB; you cannot switch archived redo logs when connected to a PDB.
If you have more than one archived log destination, RMAN backs up only one copy of the archived redo logs. RMAN does not include more than one copy because multiple destinations hold the same log sequence numbers.
You can use any of the below commands to backup the archived redo logs
The command below backs up the database and all the archived redo logs:
RMAN > BACKUP DATABASE PLUS ARCHIVELOG;
The command below backs up only one copy of each log sequence number for all archived redo logs.
RMAN> BACKUP ARCHIVELOG ALL;

5) DUPLICATE enhancements:

When you duplicate a database using RMAN DUPLICATE, the duplicate database is created and opened with RESETLOGS. With Oracle Database 12c, you can specify with the NOOPEN clause that the database must not be opened.
The NOOPEN clause is useful in the following situations:
  • If you need to make changes to initialization parameters such as block change tracking or flashback database settings
  • If opening the duplicate database would conflict with the source database
  • If you plan to create the database for upgrade and want to open it in upgrade mode
The command below creates a duplicate database but does not open it.

RMAN> DUPLICATE TARGET DATABASE TO DB1 FROM ACTIVE DATABASE NOOPEN;

Click here to read more on 12c RMAN enhancements

I hope this helps as a fundamental document.

Dec 21, 2015

When should an index be rebuilt?

 Concept:

An Oracle server index is a schema object that can speed up the retrieval of rows by using a pointer.

You can create indexes on one or more columns of a table to speed SQL statement execution on that table. If you do not have an index on the column, then a full table scan occurs.

An index provides a fast access path to locate data quickly and reduce disk I/O. By default, Oracle creates B-tree indexes.

After a table experiences a large number of inserts, updates, and deletes, the index can become unbalanced and fragmented and can hinder query performance.

How to determine whether an index needs to be rebuilt?

We must first get an idea of the current state of the index by using the ANALYZE INDEX VALIDATE STRUCTURE command.

The VALIDATE STRUCTURE command can be safely executed without affecting the optimizer. 

The VALIDATE STRUCTURE command populates the SYS.INDEX_STATS table only. The SYS.INDEX_STATS table can be accessed with the public synonym INDEX_STATS. The INDEX_STATS table will only hold validation information for one index at a time. You will need to query this table before validating the structure of the next index.

Below is a sample output from INDEX_STATS Table.

SQL> ANALYZE INDEX IDX_EMP_ACCT VALIDATE STRUCTURE;

Statement processed.

SQL> SELECT name, height,lf_rows,lf_blks,del_lf_rows FROM INDEX_STATS;

NAME                           HEIGHT    LF_ROWS    LF_BLKS   DEL_LF_ROWS
------------------------------ --------- ---------- --------- -----------
IDX_EMP_ACCT                           2          1         3           6

1 row selected.

There are two rules of thumb to help determine whether the index needs to be rebuilt.
1)     If the index has a height greater than four, rebuild the index.
2)     If the deleted leaf rows exceed 20% of the total leaf rows, rebuild the index. A query such as the one below can flag both conditions.
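A minimal sketch (the 4-level and 20% thresholds are just the rules of thumb above, not hard limits), run immediately after ANALYZE ... VALIDATE STRUCTURE:

SELECT name, height, lf_rows, del_lf_rows,
       ROUND(del_lf_rows / NULLIF(lf_rows, 0) * 100, 2) AS del_pct
  FROM index_stats
 WHERE height > 4
    OR del_lf_rows / NULLIF(lf_rows, 0) > 0.20;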

If it is determined that the index needs to be rebuilt, this can easily be accomplished with the ALTER INDEX <INDEX_NAME> REBUILD or ALTER INDEX <INDEX_NAME> REBUILD ONLINE command. It is not recommended to execute this command during normal operating hours. The alternative is to drop and re-create the index. Creating an index uses the base table as its data source and needs to place a lock on the table; the index is also unavailable during creation.

In this example, the HEIGHT column clearly shows the value 2, so this is not a good candidate for rebuilding. For most indexes the height will be quite low, i.e. one or two. I have seen an index on a 2 million-row table with a height of two or three. An index with a height greater than four may need to be rebuilt, as this might indicate a skewed tree structure, which can lead to unnecessary database block reads of the index. Let's take another example.

SQL> ANALYZE INDEX IDX_EMP_FID VALIDATE STRUCTURE;

Statement processed.

SQL> SELECT name, height, lf_rows, del_lf_rows, (del_lf_rows/lf_rows)
*100 as ratio FROM INDEX_STATS;

NAME                           HEIGHT     LF_ROWS    DEL_LF_ROW RATIO    
------------------------------ ---------- ---------- ---------- -------
IDX_EMP_FID                                  1          189         62        32.80

1 row selected.

In this example, the ratio of deleted leaf rows to total leaf rows is clearly above 20%, so this is a good candidate for rebuilding. Let's rebuild the index and examine the results.

SQL> ALTER INDEX IDX_EMP_FID REBUILD;

Statement processed.

SQL> ANALYZE INDEX IDX_EMP_FID VALIDATE STRUCTURE;

Statement processed.

SQL> SELECT name, height, lf_rows, del_lf_rows, (del_lf_rows/lf_rows)*
100 as ratio FROM INDEX_STATS;

NAME                           HEIGHT     LF_ROWS    DEL_LF_ROW RATIO    
------------------------------ ---------- ---------- ---------- -------
IDX_EMP_FID                                  1          127         0        0

1 row selected.

Examining the INDEX_STATS table shows that the 62 deleted leaf rows were dropped from the index. Notice that the total number of leaf rows went from 189 to 127, which is a difference of 62 leaf rows (189-127). This index should provide better performance for the application.


Script to rebuild indexes:

It is very difficult to write a script that will identify indexes that will benefit from rebuilding, because it depends on how the indexes are used. For example, indexes that are always accessed via an index unique scan will never need rebuilding, because the "dead space" does not interfere with the index access.
Only indexes that have a high number of deleted leaf blocks and are accessed in these ways will benefit from rebuilding:
  • index fast full scan
  • index full scan
  • index range scan
Getting statistically valid proof from a volatile production system would be a phenomenal challenge. In a large production system, it would be a massive effort to trace LIO from specific queries to specific indexes before and after the rebuild.

Still, you can use the script below to generate rebuild statements after all verification:

Select 'alter index ' || owner || '.' || index_name || ' rebuild online;'
  from all_indexes
 where owner='XXX'
 and index_type not in ('DOMAIN', 'BITMAP','FUNCTION-BASED NORMAL','IOT - TOP')
 order by owner, index_name;

Note: As a general rule, only rebuild B-tree indexes.

Are deleted leaf blocks reused?

Yes, but it depends on how soon data is reinserted; as the B-tree rebalances itself, the emptied blocks are reused.

Sample Test:
SQL> create table test_empty_block (id number, value varchar2(10));
Table created.
SQL> begin
2 for i in 1..10000 loop
3 insert into test_empty_block values (i, 'Bowie');
4 end loop;
5 commit;
6 end;
7 /
PL/SQL procedure successfully completed.
SQL> create index test_empty_block_idx on test_empty_block (id);
Index created.



SQL> delete test_empty_block where id between 1 and 9990;
9990 rows deleted.
SQL> commit;
Commit complete.
SQL> analyze index test_empty_block_idx validate structure;
Index analyzed.
SQL> select lf_blks, del_lf_rows from index_stats;
LF_BLKS DEL_LF_ROWS
---------- -----------
21             9990



Now reinsert a similar volume of rows, but with values after the current maximum:
SQL> begin
2 for i in 20000..30000 loop
3 insert into test_empty_block values (i, 'Bowie');
4 end loop;
5 commit;
6 end;
7 /
PL/SQL procedure successfully completed.
SQL> analyze index test_empty_block_idx validate structure;
Index analyzed.
SQL> select lf_blks, del_lf_rows from index_stats;
LF_BLKS DEL_LF_ROWS
---------- -----------
21             0
Note that all empty blocks have been reused and the deleted rows have been cleaned out.
The following select statement was executed after the 9,990 deletions in the previous example (before the reinsert):

SQL> select /*+ index(test_empty_block test_empty_block_idx) */ * from test_empty_block
where id between 1 and 100000;
10 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_EMPTY_BLOCK'
2 1 INDEX (RANGE SCAN) OF 'TEST_EMPTY_BLOCK_IDX' (NON-UNIQUE)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
28 consistent gets
0 physical reads
0 redo size
549 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
10 rows processed

See more from Oracle Doc ID 1373415.1

Dec 2, 2015

Oracle PL/SQL programming common mistakes

Points covered:

1) Formatting data in views
2) Hardcoding local Varchar2 variable size
3) Ignoring exceptions
4) Not using bound variables for changing parameters
5) Storing empty LOBs
6) Too many levels of views
7) Transactional control in non-autonomous procedures
8) Using Sequence nextval without curval
9) Using bound variables for constants
10) Using derived column values for existence checks
11) Using non-dedicated packages for continuous jobs
12) Wrapping everything into stored procedures
13) Use of Truncate for Global Temporary Table(GTT)
14) Using non-deterministic functions directly in conditions
15) Catch-all error handling

I have observed many common mistakes in PL/SQL code written for various environments. If the code is written for OLTP applications, these kinds of mistakes must be avoided. Here are my descriptions of all the points above, with root-cause analysis and solutions.

1) Formatting data in views

Severity: Makes system harder to use

Symptoms:
• to_char used on date or numeric values in views
• strings concatenated in view code to form pretty printed values

Why is this bad?
• data format cannot be easily modified on the front-end, since some information may be lost
• values cannot be easily modified (i.e. applying time zone shifting becomes much harder)
• filtering based on underlying date or string values becomes much more processor-heavy and requires full table scans and/or substring matching/comparisons.
• internationalisation becomes much harder — instead of translating elements and then combining them, translation engines must analyse and translate/reformat formatted data

Solutions:
• Format data on the front-end, not in the database (see the sketch below).
• Perform formatting in queries coming from the front end, specifying exactly what the front end needs; database views themselves should not assume any specific data format.
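A small illustration (the orders table and its columns are hypothetical): expose the raw DATE and let the client format it.

-- Problematic: the view bakes in a display format
CREATE OR REPLACE VIEW orders_v AS
  SELECT order_id, TO_CHAR(order_date, 'DD-MON-YYYY') AS order_date
    FROM orders;

-- Preferred: keep the raw value; filtering on order_date can still use an index
CREATE OR REPLACE VIEW orders_v AS
  SELECT order_id, order_date
    FROM orders;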


2) Hardcoding local Varchar2 variable size

Severity: Makes system harder to use

Symptoms
• PL/SQL function or procedure declares local Varchar2 variables for temporary storage of table values, with hard-coded length
• Views declared with Varchar2 types with hard-coded length

Why is this bad?
• Code is error prone, because hard-coded values may not allow for enough space to store the entire value coming from a database table.
• Even if the size is correct, if the underlying type ever changes, errors such as ORA-06502 ‘Character string buffer too small’ may start appearing in procedures.

Solution:

Use %TYPE to reference the underlying column type instead of hard-coding the type and size for local variables, as in the sketch below.
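A minimal sketch (employees.last_name and the lookup value are assumptions for illustration):

DECLARE
  -- l_name VARCHAR2(30);              -- fragile: breaks if the column is ever widened
  l_name employees.last_name%TYPE;     -- follows the column definition automatically
BEGIN
  SELECT last_name INTO l_name FROM employees WHERE employee_id = 100;
END;
/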

Exceptions
• variables and fields not related to underlying table data
• fields or variables that combine several table fields

3) Ignoring exceptions

Severity: Risk of data corruption

Symptom

This is a typical example:

begin
...
Exception When others then
 NULL;
end;

This kind of code is written when errors such as attempts to insert a duplicate record or to modify a nonexistent row should not affect the transaction. It is also common in triggers that must be allowed to fail without affecting the operation that caused them (best-effort synchronisation with an external resource). Less frequently, this code is written by junior developers who do not know what to do in case of an error, so they just disregard exceptions.

Why is this bad?

Serious errors such as storage problems or table mutations might be hidden from the calling code

Solutions
• If you do not want any errors to affect the current transaction, execute the code in an autonomous transaction and log errors to an error/log table (see the sketch below). For critical functions, implement some sort of administrative notification for those errors. For low-priority functions, check the log table periodically for errors.
• If you want to ignore certain exceptions because they can be solved by re-processing, handle only those specific exceptions.
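A minimal sketch, assuming an error_log table and a log_error procedure (both hypothetical):

CREATE TABLE error_log (
  logged_at TIMESTAMP DEFAULT SYSTIMESTAMP,
  err_code  NUMBER,
  err_msg   VARCHAR2(4000)
);

CREATE OR REPLACE PROCEDURE log_error (p_code NUMBER, p_msg VARCHAR2) IS
  PRAGMA AUTONOMOUS_TRANSACTION;  -- logging commits independently of the caller's transaction
BEGIN
  INSERT INTO error_log (err_code, err_msg) VALUES (p_code, p_msg);
  COMMIT;
END;
/

BEGIN
  -- ... business logic ...
  NULL;
EXCEPTION
  WHEN DUP_VAL_ON_INDEX THEN      -- handle only the exception you expect
    log_error(SQLCODE, SQLERRM);
  -- everything else propagates to the caller
END;
/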

Exception

low-risk functions where any errors can safely be ignored

4) Not using bound variables for changing parameters

Severity: Reduced performance

Symptom
Frequently executing the same query with different parameter values, but specifying parameter values literally, without using bound variables.

Why is this bad?
• Database engine will have to compile the query every time and will not be able to cache the statement.
• If care is not taken to prevent SQL injection, may open a security hole in the system.

Solution
For all parameters that are genuinely changing, use a bound variable instead of specifying the value literally, as in the sketch below.
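A minimal sketch using native dynamic SQL (the emp table and deptno value are assumptions):

DECLARE
  l_count NUMBER;
  l_dept  NUMBER := 10;
BEGIN
  -- Bad: concatenating the literal forces a hard parse per value and risks SQL injection
  -- EXECUTE IMMEDIATE 'SELECT COUNT(*) FROM emp WHERE deptno = ' || l_dept INTO l_count;

  -- Good: a bind variable lets all values share one cached cursor
  EXECUTE IMMEDIATE 'SELECT COUNT(*) FROM emp WHERE deptno = :dept'
    INTO l_count USING l_dept;
END;
/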

Exceptions
• Ad-hoc queries that are run only once or infrequently
• Parameters where statement caching is pointless for different values

5) Storing empty LOBs

Severity: Reduced performance

Symptom
Empty CLOB values used instead of NULL for CLOB fields that do not hold a value

Why is this bad?
Oracle allocates space for empty CLOBs. In tables with a large number of empty CLOB values, this can take up significant storage space.

Solution
Use NULL instead of an empty CLOB, as in the one-line sketch below.
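A one-line illustration (the documents table is hypothetical):

-- INSERT INTO documents (id, body) VALUES (1, EMPTY_CLOB());  -- allocates LOB storage for nothing
INSERT INTO documents (id, body) VALUES (1, NULL);             -- consumes no LOB space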

6) Too many levels of views

Severity: Reduced performance

Symptom
A large hierarchy of views containing sub-views or subqueries is in place. Such hierarchy is usually established as several layers of abstraction over abstraction, typically when adding new core features to underlying models, but keeping client API for backward compatibility using a set of higher-level views.

Why is this bad?
After typically 8 or 9 levels of nesting, the optimiser will give up and run full table scans even if indexes could be used.

Solutions
• Flatten the structure so that it has less than 8 levels. Use joins instead of subqueries where possible
• Use materialised views to cut off a part of the hierarchy
• If materialised views cannot be used for performance reasons, use an aggregated table maintained by triggers to do the same.

7) Transactional control in non-autonomous procedures

Severity: Risk of data corruption

Symptom
Commit or rollback statement in a stored procedure or function without PRAGMA AUTONOMOUS TRANSACTION

Why is this bad?
Effectively prevents stored procedures from being used in a wider context — rolling back or committing inside a stored procedure will delete/permanently write data that was used in a wider transaction, in the middle of that transaction.

May cause issues that are very hard to trace/debug - you will not be able to check if data was processed correctly when the procedure rolled back. audit logs will contain references to records which, from the logical point of view, never existed.

May cause inconsistent data: since rolling back/committing will split a wider logical transaction into two (one which rolled back and another which is still running), relational constraints might fail in the second transaction. Even worse, if the relational constraint checks were not enforced, inconsistent data might be written permanently.

Solutions
• Throw exceptions in case of errors and let the caller decide what to do (see the sketch below). Do nothing when the operation succeeds; let the caller decide whether the entire wider transaction is correct or not.
• Add PRAGMA AUTONOMOUS_TRANSACTION; to the procedure header to make it run as an autonomous transaction
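A minimal sketch (the accounts table and error code are hypothetical): raise on failure and leave transaction control to the caller.

CREATE OR REPLACE PROCEDURE debit_account (p_acct NUMBER, p_amount NUMBER) IS
BEGIN
  UPDATE accounts SET balance = balance - p_amount WHERE account_id = p_acct;
  IF SQL%ROWCOUNT = 0 THEN
    RAISE_APPLICATION_ERROR(-20001, 'Account ' || p_acct || ' not found');
  END IF;
  -- no COMMIT or ROLLBACK here; the caller owns the wider transaction
END;
/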

Exceptions
• Long-running worker procedures such as batch updates (may include suboperations and save-points to store partial results. should be marked as autonomous transaction).
• Auditing (should be done with autonomous transactions)

8) Using Sequence nextval without curval

Severity: Risk of data corruption

Symptoms
• Sequence currval method used in a procedure or trigger without calling nextval first — typically in a trigger that updates a log record, or a procedure that partially processes data
• sequence currval used to calculate the next value which will be read by nextval

Why is this bad?
• Calling the method will cause an exception if the sequence is not initialised — so the method/trigger
depends on the caller to initialise the sequence first
• Procedures relying on someone else to initialise the sequence must be called in a specific context, which limits their reuse
• Triggers may or may not work depending on the order of execution
• If procedure/trigger uses the current sequence value to update the relevant record, calling it in a different context/order of execution may update the wrong record
• Sequences are not transactional — they can be cached or changed in another session between calls to currval and nextval. currval+1 is not guaranteed to be the next value; using that to predict IDs is very dangerous, as it can lead to wrong records being deleted or updated.

Solutions
• Do not use currval to read contextual information. Pass IDs explicitly to procedures or use sys_context to store contextual information
• Use nextval instead of currval where appropriate (if you just need a unique number), as in the sketch below
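A minimal sketch (the sequence and table names are hypothetical): capture the value once with NEXTVAL and reuse it, instead of re-reading CURRVAL elsewhere. Direct PL/SQL assignment of sequence values works in 11g and later; in earlier releases use SELECT ... INTO ... FROM dual.

DECLARE
  l_id NUMBER;
BEGIN
  l_id := orders_seq.NEXTVAL;
  INSERT INTO orders      (order_id, created) VALUES (l_id, SYSDATE);
  INSERT INTO order_audit (order_id, action)  VALUES (l_id, 'CREATED');  -- reuse the captured value
END;
/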

9) Using bound variables for constants

Severity: Reduced performance

Symptoms
• Bound variables used in queries for values that are never changing (often when client developers bind all variables in a query).
• Bound variables used for parameters where the actual value can significantly affect the optimisation plan

Why is this bad?
• The optimiser will not use the most efficient plan to execute the query
• If bind variable peeking is turned on, it might actually choose a completely wrong execution plan for subsequent runs

Solution
For values that are not changing in a query, or where statement caching does not make sense, use a literal and not a bound variable (this does not imply that other genuine parameters in the same query should not be bound).

10) Using derived column values for existence checks

Severity: Reduced performance

Symptom
count, sum or some other aggregate function (e.g. a custom-made CSV aggregate) executed in a subquery, and then compared to 0, 1 or NULL in the WHERE clause of the outer query in order to check whether at least one (or exactly one) related element in the substructure exists; the results of the aggregate function are then discarded (not passed on through the outer view).

The subquery was typically developed first, possibly as a full-fledged view, and then the outer query/view, which filters the results of the first view, was written. Less frequently they are in the same view, the first one used as a subquery.

CREATE OR REPLACE VIEW instrument_v AS
SELECT instr.idinstr, instr.name, count(opt.type) AS activeoptiontypes
FROM instrument instr, instrumentoption opt WHERE instr.idinstr = opt.idinstr (+)
GROUP BY instr.idinstr, instr.name;
...
SELECT idinstr, name FROM instrument_v WHERE activeoptiontypes >0

Why is this bad?

Instead of just checking indexes, potentially a large number of records will have to be read and analysed.

Solution
Instead of comparing the results of an aggregate function, create a different view that checks directly for the existence of the related rows (see the sketch below); the optimiser will be able to execute the plan using indexes and will not have to calculate aggregate functions for irrelevant rows.
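A hedged rewrite of the earlier example (same hypothetical instrument/instrumentoption tables), checking existence directly:

SELECT instr.idinstr, instr.name
  FROM instrument instr
 WHERE EXISTS (SELECT 1
                 FROM instrumentoption opt
                WHERE opt.idinstr = instr.idinstr);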


11) Using non-dedicated packages for continuous jobs

Severity: Makes system harder to use

Symptoms
• A continuous or frequent database job executes a long-running procedure from a pl/sql package. That same package is also used for client access or by other schemas.
• The body for a continuous or frequent database job is a procedure from such a package.

Why is this bad?
The package will be locked while the procedure is running. With a continuous or frequently running job, this may require interrupting the job or causing down-time even for the smallest changes like granting execution to other users/schemas to the package.

Solution
Extract procedures for job bodies into dedicated pl/sql packages or into standalone procedures.

12) Wrapping everything into stored procedures

Severity: Makes system harder to use

Symptoms

• Client access layer consists exclusively or mostly of stored procedures - both for writing and for reading. Ref Cursor output parameters of stored procedures used to read data.
This is typical for SQLServer programmers moving to Oracle, due to earlier SQL Server limitations in terms of access privileges and functionality. It is also often done under a false pretext that using stored procedures is more secure or brings better performance. In fact, the same security restrictions can be applied to views as to stored procedures in Oracle. If bound variables and statement caching are used, query access to views also brings pre-compilation benefits, so there is no performance gain.

• Stored procedures are used to fill in additional details when inserting or updating data (automatically calculated columns, timestamps and similar).

Why is this bad?

• Procedural access limits operations to a choice of parameters - i.e. deleting records with specific name. You end up by providing a stored procedure for every possible combination of parameters, or by providing “generic” stored procedures that are no better than allowing direct table access.

• Procedural access is typically done on row-level (procedures work with a single row), which has significant performance penalties over tabular access for operations that work on multiple rows.
• Output Ref Cursor parameters do not allow clients to apply further filters or conditions
• Output Ref Cursor cannot be easily joined with additional data
• Output Ref Cursors data cannot be easily paged on demand
• Packaging data-manipulation steps in a stored procedure does not prevent someone from using the
table differently and inserting or modifying data directly. Your rules will only be applied if the clients
are forced to use your stored procedures. As the code base grows, and different programmers join and
leave the team, this will be harder and harder to enforce.

Solutions

• Use views for reading data, do not use stored procedures; instead of passing filter parameters to stored procedures, expose those values as view columns and allow clients to filter using Where. Apply appropriate security restrictions to views.

• Use views for inserting and updating rows if the operation requires dynamic parameters.
• If all required columns in views are not updatable, consider using instead-of triggers to provide the
functionality
• Use triggers to perform action such as populating missing columns on data modification on tables. This ensures that the data will be populated regardless of how the table is used.
• Use stored procedures to encapsulate business rules and procedural processing logic; If front-end requires data that’s calculated from database column values, rather than stored in them, encapsulate that logic in functions and include functions in views. Procedures/functions should be used to extract common procedural steps from triggers and to simplify triggers and jobs.

Exceptions

• Output REF CURSOR objects should be used when filtering view columns for read-only access severely affects performance — for example when subqueries have aggregate functions and filtering the outer view would cause the query to be executed on all data, and then filtered. In this case, inserting specific values into the subquery could significantly reduce execution time.
• This technique can also be used to provide a uniform API for related reports, where the query depends on parameter values or supplied parameters

13) Use of Truncate for Global Temporary Table(GTT).

Severity: Reduced performance

Symptom

The database is sometimes very slow or appears to hang.

Why is this bad?

As a bad practice, developers used to write a "delete" statement without any WHERE clause. This reduces performance and has many other side effects. To overcome this, developers started writing "TRUNCATE". That is a better choice, but heavy use of TRUNCATE in an application also hurts performance. Depending on the scope, it is better to define the table as a GTT and use it. But developers still tend to write TRUNCATE for GTTs, even though GTT rows are cleared automatically at commit or when the session ends (depending on the ON COMMIT clause).

Solution

When you write TRUNCATE for a regular table, use the syntax below:

TRUNCATE TABLE [schema_name.]table_name
  [ PRESERVE MATERIALIZED VIEW LOG | PURGE MATERIALIZED VIEW LOG ]
  [ DROP STORAGE | REUSE STORAGE ] ;

e.g.,  TRUNCATE TABLE scott.emp REUSE STORAGE;


When you are using a GTT, don't use TRUNCATE; define the table so its rows are cleared automatically, as in the sketch below.
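A minimal sketch (the staging table is hypothetical):

CREATE GLOBAL TEMPORARY TABLE stage_orders (
  order_id NUMBER,
  amount   NUMBER
) ON COMMIT DELETE ROWS;   -- rows vanish at commit; use ON COMMIT PRESERVE ROWS to keep them for the session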


Note :
DROP STORAGE Specify DROP STORAGE to deallocate all space from the deleted rows from the table or cluster except the space allocated by the MINEXTENTS parameter of the table or cluster. This space can subsequently be used by other objects in the tablespace. Oracle Database also sets the NEXT storage parameter to the size of the last extent removed from the segment in the truncation process. This is the default.
REUSE STORAGE Specify REUSE STORAGE to retain the space from the deleted rows allocated to the table or cluster. Storage values are not reset to the values when the table or cluster was created. This space can subsequently be used only by new data in the table or cluster resulting from insert or update operations. This clause leaves storage parameters at their current settings.
If you have specified more than one free list for the object you are truncating, then the REUSE STORAGE clause also removes any mapping of free lists to instances and resets the high-water mark to the beginning of the first extent.

14) Using non-deterministic functions directly in conditions

Severity: Reduced performance

Symptom
Non-deterministic function that does not depend on current row values (or depends on a subset of row values) is used in Where clause. Examples might be functions that fetch data from a referential table depending on current database context, or perform some calculations on derived data; Functions are not enclosed in select from dual or a subquery.

Why is this bad?
Functions may be executed much more times than required. It might be sufficient to execute the function just once per query, but Oracle might execute the function for each row of the result (or even worse, for each row of the source).

Solutions
• Turn the function call into (select function() from dual) if it does not depend on any row values (see the sketch after this list)
• Move the function into a similar subquery if it depends on referential data, and join the subquery in the main query
• Mark function as deterministic if it is by nature deterministic (i.e. always returns same result for same parameter values)
• Use Oracle 11 result caching if possible
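A minimal sketch (get_fx_rate and the trades table are hypothetical) showing the rewrite:

-- Bad: get_fx_rate('USD') may be called for every row of trades
SELECT t.trade_id, t.amount * get_fx_rate('USD') AS usd_amount
  FROM trades t
 WHERE t.amount * get_fx_rate('USD') > 1000;

-- Better: evaluate it once in a dual-based subquery and join it in
SELECT t.trade_id, t.amount * r.rate AS usd_amount
  FROM trades t,
       (SELECT get_fx_rate('USD') AS rate FROM dual) r
 WHERE t.amount * r.rate > 1000;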

15) Catch-all error handling

Severity: Risk of data corruption

Symptom
A catch-all exception block (WHEN OTHERS THEN) used to process an error (or a group of errors) in a PL/SQL procedure. This is done typically either to ignore an expected error which should not affect a transaction (for example, when duplicate key in index should be disregarded, or if a problem with inserting should be quietly logged). Catch-all error handling might also be written when a single block of code is expected to throw several types of exceptions (no data found, duplicate value on index, validation…) and the developer wants to handle them all at once.

Another example is using a WHEN OTHERS block to catch domain exceptions thrown by RAISE_APPLICATION_ERROR because they cannot be individually checked.

Why is this bad?
• Catch-all block will prevent other unexpected errors from propagating. For example, a mutating table error caused by a trigger may be hidden with a catch-all block, completing the transaction successfully when it did not really do everything it needed.
• There is often an assumption about the error that can occur, which may be incorrect when a different type of exception is thrown. This may lead to inconsistent data. For example, if a catch-all block is used to insert a record in case of a missing data error, then that record may be inserted on any other error as well.

Solutions
• Do not use WHEN OTHERS to check for a particular exception such as NO_DATA_FOUND. Check directly for a particular type of exception.
• To handle domain-specific errors, declare your domain exceptions so that you can check for a particular exception later in the code (see the sketch after this list).
• Do not assume that you know all errors that may occur at a given place in the code. Storage problems, mutating tables and similar issues should surface to the client, and not be handled and discarded in an unrelated piece of the code.
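A minimal sketch (the -20010 error code, process_order and queue_for_manual_review are hypothetical) of catching one declared domain exception instead of WHEN OTHERS:

DECLARE
  e_credit_exceeded EXCEPTION;
  PRAGMA EXCEPTION_INIT(e_credit_exceeded, -20010);  -- matches RAISE_APPLICATION_ERROR(-20010, ...)
BEGIN
  process_order(p_order_id => 42);
EXCEPTION
  WHEN e_credit_exceeded THEN
    queue_for_manual_review(42);
  -- all other errors propagate to the caller
END;
/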

Exception
Shielding parts of the system from errors in other parts

To be continued in the next post...

Nov 29, 2015

Direct and Asynchronous I/O setup in Oracle database

Direct and Asynchronous I/O:

Most performance tuning issues in any database can be related to I/O. Oracle provides two main parameters to control I/O behaviour: filesystemio_options and disk_asynch_io.

There is much discussion of Oracle performance on various platforms based on typical storage. I/O operations on UNIX and Linux systems typically go through the file system cache. Although this doesn't represent a problem in itself, this extra processing does require resources. Bypassing the file system cache reduces CPU requirements and frees up the file system cache for other non-database file operations. Operations against raw devices automatically bypass the file system cache.


When a synchronous I/O request is submitted to the operating system, the writing process blocks until the write is complete before continuing processing. With asynchronous I/O, processing continues while the I/O request is submitted and processed. This allows asynchronous I/O to bypass
some of the performance bottlenecks associated with I/O operations.

Oracle can take advantage of direct I/O and asynchronous I/O on supported platforms using the FILESYSTEMIO_OPTIONS parameter, whose possible values are listed below.

ASYNCH - Enables asynchronous I/O where possible.
DIRECTIO - Enables direct I/O where possible.
SETALL - Enables both direct I/O and asynchronous I/O where possible.
NONE - Disables both direct I/O and asynchronous I/O.

The following example shows how the parameter is set.

SQL> SHOW PARAMETER FILESYSTEMIO_OPTIONS

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
filesystemio_options                 string      none
SQL> ALTER SYSTEM SET FILESYSTEMIO_OPTIONS=SETALL SCOPE=SPFILE;

System altered.

SQL> SHUTDOWN IMMEDIATE

SQL> STARTUP

SQL> SHOW PARAMETER FILESYSTEMIO_OPTIONS

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
filesystemio_options                 string      SETALL
SQL>


FAQ:

1) Is FILESYSTEMIO_OPTIONS=setall always safe?

Oracle uses the operating system's asynchronous I/O library for database data and log writes. Oracle performs writes in batches or lists, which are submitted to the operating system using the listio() functionality. The listio() asynchronous I/O facility was added specifically to optimize database performance.

In general, you should enable FILESYSTEMIO_OPTIONS=setall for system performance.

2) Why "setall" ?

For optimal disk performance, Oracle should always use direct I/O to its data files, bypassing any caching at the OS layer.  Direct I/O must be enabled both in Oracle and in the operating system.

Oracle controls direct I/O with a parameter named filesystemio_options. According to the Oracle documentation, the filesystemio_options parameter must be set to "setall" (the preferred method) or "directio" in order for Oracle to read data blocks directly from disk.

Using direct I/O allows you to enhance I/O by bypassing the redundant OS block buffers, reading the data block directly into the Oracle SGA. Using direct I/O also allows you to create multiple-blocksize tablespaces to improve I/O performance.

Overall, Oracle recommends setting filesystemio_options to 'setall', but it is not always good practice, especially when the SGA is small. Setting it to setall lets your Oracle database perform I/O operations without going through the file system cache, saving the overhead of double caching. But if the SGA is small and the database host has a large amount of free memory, it is not good to set this parameter to setall. In that case you should increase DB_CACHE_SIZE first, and only then set filesystemio_options to setall.


3) What will be best practice for IBM - AIX?

IBM has published an excellent case study with Oracle Database on AIX, illustrating how to optimize AIX direct I/O for Oracle 10g and 11g databases:

Click here to see the Document

4) What other parameters look into?

Another parameter that affects writes (as well as reads) is db_writer_processes. When asynchronous I/O operations are slower in the operating system than synchronous I/O, turn off asynchronous I/O by setting disk_asynch_io to false and configure multiple DB writer processes by increasing db_writer_processes from 1 to 2, 3 or 4, whichever value suits your system (see the sketch below). An alternative is to increase dbwr_io_slaves from 0 to 2, 3 or 4 as appropriate.
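A minimal sketch (the values are examples only; both parameters are static, so an instance restart is needed):

SQL> ALTER SYSTEM SET disk_asynch_io = FALSE SCOPE=SPFILE;
SQL> ALTER SYSTEM SET db_writer_processes = 4 SCOPE=SPFILE;
SQL> SHUTDOWN IMMEDIATE
SQL> STARTUP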

You would be keen to disable asynchronous I/O when you see a high average_wait on the 'db file parallel write' event. Another reason for turning it off is that synchronous I/O is more reliable.

SQL> select event,average_wait from v$system_event where event like 'db file parallel write';
EVENT                                                            AVERAGE_WAIT
---------------------------------------------------------------- ------------
db file parallel write                                                    28.2  [ centi seconds]

An average wait this high is not good for asynchronous I/O; try synchronous I/O.

Note 1: Asynchronous I/O operations are more prone to block corruption than synchronous operations, so many DBAs disable it and follow the practice mentioned in the paragraph above. If you do not have a standby database and Oracle 11g (which can automatically recover a corrupted block on the primary), you may not want asynchronous I/O.

Note 2: In 11gR2, for tuning purposes, the "db file async I/O submit" event should be treated as "db file parallel write" was in previous releases.

5) What will be for Linux?

As per the Burleson site, the following setup may be useful.

Linux - Linux systems support direct I/O on a per-filehandle basis (which is much more flexible). With 10g and beyond this feature already works, which means it does not require any patch. For Oracle over NFS, the DBA will need to download the fix with abstract: DIRECT IO SUPPORT OVER NFS. To enable direct I/O support, check these settings:

   - Set the filesystemio_options parameter in the parameter file to DIRECTIO (filesystemio_options = DIRECTIO)

   - If the asynchronous I/O option is in use, the filesystemio_options parameter in the parameter file should be set to SETALL.


Oct 29, 2015

How to Check Clusterware Version and Name?


To check the software version on a single node: the software version is the latest version of the software installed on a node in the cluster. You would use the commands below when you do a rolling upgrade.

$hostname
EHDB01
$
$ crsctl query crs softwareversion EHDB01
Oracle Clusterware version on node [EHDB01] is [11.2.0.4.0]

The active version is the lowest version anywhere in the cluster. This is the command you would normally need to use:

$crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [11.2.0.4.0]

The version of Oracle Clusterware must always be greater than or equal to that of the Oracle products installed in the cluster.
Permanently operating Oracle Clusterware with the software version different from the active version is not supported.

To check the cluster name use,

$ cd $CRS_HOME/bin
$ pwd
 /u01/app/11.2.0/grid/bin
$
$ cemutlo -n
ehdbprdscan

You can also see it in /etc/hosts.

Cluster related process in 11g RAC R2 Environment

CLUSTERWARE PROCESSES in 11g RAC R2 Environment 

In any RAC environment, the cluster daemons are the main agents that communicate between instances. The command below shows at a glance which cluster daemons are running:

$ ps -ef|grep d.bin
    root  4456622        1   0   Oct 23      - 15:24 /u01/app/11.2.0/grid/bin/ohasd.bin reboot
    root  4849726        1   3   Oct 23      - 158:25 /u01/app/11.2.0/grid/bin/orarootagent.bin
    root  4915400        1   0   Oct 23      -  3:51 /u01/app/11.2.0/grid/bin/cssdagent
    root  5898304        1   0   Oct 23      - 18:57 /u01/app/11.2.0/grid/bin/orarootagent.bin
    grid  9634040        1   0   Oct 23      -  0:10 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER_SCAN2 -inherit
    grid 11337848        1   0   Oct 23      -  0:07 /u01/app/11.2.0/grid/bin/mdnsd.bin
    grid 11862222        1   0   Oct 23      -  0:10 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER_SCAN3 -inherit
    grid  2294256        1   0   Oct 23      -  6:45 /u01/app/11.2.0/grid/bin/evmd.bin
    grid  2621900        1   0   Oct 23      -  1:21 /u01/app/11.2.0/grid/bin/scriptagent.bin
    grid  2818414  2949552   0   Oct 23      - 16:59 /u01/app/11.2.0/grid/bin/ocssd.bin
    root  2949552        1   0   Oct 23      -  0:00 /bin/sh /u01/app/11.2.0/grid/bin/ocssd
    root  3867104        1   0   Oct 23      - 19:40 /u01/app/11.2.0/grid/bin/osysmond.bin
    root  3997956        1   0   Oct 23      -  3:35 /u01/app/11.2.0/grid/bin/cssdmonitor
    grid  4456732  2294256   0   Oct 23      -  0:07 /u01/app/11.2.0/grid/bin/evmlogger.bin -o /u01/app/11.2.0/grid/evm/log/evmlogger.info -l /u01/app/11.2.0/grid/evm/log/evmlogger.log
    grid  4719074        1   0   Oct 23      - 17:19 /u01/app/11.2.0/grid/bin/oraagent.bin
    grid  4784526        1   0   Oct 23      - 12:56 /u01/app/11.2.0/grid/bin/gipcd.bin
    root  5046556        1   1   Oct 23      - 53:09 /u01/app/11.2.0/grid/bin/ologgerd -m ehdb02 -r -d /u01/app/11.2.0/grid/crf/db/ehdb01
    root  5112254        1   0   Oct 23      - 19:11 /u01/app/11.2.0/grid/bin/crsd.bin reboot
    root  5439972        1   0   Oct 23      - 11:01 /u01/app/11.2.0/grid/bin/octssd.bin reboot
    grid  5505490        1   0   Oct 23      -  2:06 /u01/app/11.2.0/grid/bin/gpnpd.bin
    grid  7930344        1   0   Oct 23      -  0:08 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER -inherit
  oracle  9371946        1   0   Oct 23      - 25:28 /u01/app/11.2.0/grid/bin/oraagent.bin
  oracle 11141592 11599954   0 17:00:11  pts/0  0:00 grep d.bin
    grid 11796808        1   0   Oct 23      - 16:56 /u01/app/11.2.0/grid/bin/oraagent.bin

i).Cluster Ready Services (CRS)

$ ps -ef | grep crs | grep -v grep
    root  5112254        1   0   Oct 23      - 19:09 /u01/app/11.2.0/grid/bin/crsd.bin reboot

crsd.bin => This process is responsible for the start, stop, monitoring and failover of resources. It maintains the OCR and also restarts resources when a failure occurs.

This is applicable for RAC systems. For Oracle Restart and ASM ohasd is used.

ii).Cluster Synchronization Service (CSS)

$ ps -ef | grep -v grep | grep css
    root  4915400        1   0   Oct 23      -  3:50 /u01/app/11.2.0/grid/bin/cssdagent
    grid  2818414  2949552   0   Oct 23      - 16:57 /u01/app/11.2.0/grid/bin/ocssd.bin
    root  2949552        1   0   Oct 23      -  0:00 /bin/sh /u01/app/11.2.0/grid/bin/ocssd
    root  3997956        1   0   Oct 23      -  3:33 /u01/app/11.2.0/grid/bin/cssdmonitor

cssdmonitor => Monitors node hangs (via oprocd functionality), monitors OCSSD process hangs (via oclsomon functionality) and monitors vendor clusterware (via vmon functionality). This is a multi-threaded process that runs with elevated priority.

Startup sequence: INIT --> init.ohasd --> ohasd --> ohasd.bin --> cssdmonitor

cssdagent => Spawned by the OHASD process. Previously (10g) oprocd, responsible for I/O fencing. Killing this process would cause a node reboot. Stops, starts and checks the status of the ocssd.bin daemon.

Startup sequence: INIT --> init.ohasd --> ohasd --> ohasd.bin --> cssdagent

ocssd.bin => Manages cluster node membership and runs as the oragrid user. Failure of this process results in a node restart.

Startup sequence: INIT --> init.ohasd --> ohasd --> ohasd.bin --> cssdagent --> ocssd --> ocssd.bin

iii) Event Management (EVM)

$ ps -ef | grep evm | grep -v grep
    grid  2294256        1   1   Oct 23      -  6:45 /u01/app/11.2.0/grid/bin/evmd.bin
    grid  4456732  2294256   0   Oct 23      -  0:07 /u01/app/11.2.0/grid/bin/evmlogger.bin -o /u01/app/11.2.0/grid/evm/log/evmlogger.info -l /u01/app/11.2.0/grid/evm/log/evmlogger.log

evmd.bin => Distributes and communicates some cluster events to all of the cluster members so that they are aware of the cluster changes.

evmlogger.bin => Started by evmd.bin; reads the configuration files, determines which events to subscribe to from EVMD, and runs user-defined actions for those events.

iv).Oracle Root Agent

$ ps -ef | grep -v grep | grep orarootagent
    root  4849726        1   0   Oct 23      - 158:11 /u01/app/11.2.0/grid/bin/orarootagent.bin
    root  5898304        1   0   Oct 23      - 18:54 /u01/app/11.2.0/grid/bin/orarootagent.bin

orarootagent.bin => A specialized oraagent process that helps CRSD manage resources owned by root, such as the network and the Grid virtual IP address.

The above two processes are actually threads that look like processes. This is Linux-specific.

v).Cluster Time Synchronization Service (CTSS)

ps -ef | grep ctss | grep -v grep
    root  5439972        1   0   Oct 23      - 10:59 /u01/app/11.2.0/grid/bin/octssd.bin reboot

octssd.bin => Provides Time Management in a cluster for Oracle Clusterware

vi) Oracle Agent

$ ps -ef | grep -v grep | grep oraagent
    grid  4719074        1   0   Oct 23      - 17:18 /u01/app/11.2.0/grid/bin/oraagent.bin
  oracle  9371946        1   0   Oct 23      - 25:26 /u01/app/11.2.0/grid/bin/oraagent.bin
    grid 11796808        1   0   Oct 23      - 16:55 /u01/app/11.2.0/grid/bin/oraagent.bin

oraagent.bin => Extends clusterware to support Oracle-specific requirements and complex resources. This process runs server callout scripts when FAN events occur. This process was known as RACG in Oracle Clusterware 11g Release 1 (11.1).


ORACLE HIGH AVAILABILITY SERVICES STACK

i) Cluster Logger Service

$ ps -ef | grep -v grep | grep ologgerd
    root  5046556        1   0   Oct 23      - 53:05 /u01/app/11.2.0/grid/bin/ologgerd -m ehdb02 -r -d /u01/app/11.2.0/grid/crf/db/ehdb01

ologgerd => Receives information from all the nodes in the cluster and persists in a CHM repository-based database. This service runs on only two nodes in a cluster

ii).System Monitor Service (osysmond)

$ ps -ef | grep -v grep | grep osysmond
    root  3867104        1   0   Oct 23      - 19:38 /u01/app/11.2.0/grid/bin/osysmond.bin

osysmond => The monitoring and operating system metric collection service that sends the data to the cluster logger service. This service runs on every node in a cluster

iii). Grid Plug and Play (GPNPD):

$ ps -ef | grep gpn
  oracle  9306330 11599954   0 16:49:41  pts/0  0:00 grep gpn
    grid  5505490        1   0   Oct 23      -  2:05 /u01/app/11.2.0/grid/bin/gpnpd.bin

gpnpd.bin => Provides access to the Grid Plug and Play profile, and coordinates updates to the profile among the nodes of the cluster to ensure that all of the nodes have the most recent profile.

iv).Grid Interprocess Communication (GIPC):

$ ps -ef | grep -v grep | grep gipc
    grid  4784526        1   0   Oct 23      - 12:55 /u01/app/11.2.0/grid/bin/gipcd.bin

gipcd.bin => A support daemon that enables Redundant Interconnect Usage.

v). Multicast Domain Name Service (mDNS):

$ ps -ef | grep -v grep | grep dns
    grid 11337848        1   0   Oct 23      -  0:07 /u01/app/11.2.0/grid/bin/mdnsd.bin

mdnsd.bin => Used by Grid Plug and Play to locate profiles in the cluster, as well as by GNS to perform name resolution. The mDNS process is a background process on Linux and UNIX and on Windows.

vi).Oracle Grid Naming Service (GNS)

$ ps -ef | grep -v grep | grep gns

gnsd.bin => Handles requests sent by external DNS servers, performing name resolution for names defined by the cluster.
Note: No output will come if gns is not configured.

I hope this doc will help you.

Oct 16, 2015

Best Practices of Backup and Recovery

My Best Practices of Backup and Recovery :

This document assumes that you have the backup and recovery basics in place. For a basic production setup, the following are required:

- Running in Archivelog mode
- Multiplexing the controlfile
- Taking regular backups
- Periodically doing a complete restore to test your procedures.
- Note that restore and recover validate will not uncover nologging issues.
- Consider turning on force logging if you need all transactions to be recoverable and want to avoid nologging problems
 ( ALTER DATABASE FORCE LOGGING; )


1. Turn on block checking.

The aim is to detect, very early the presence of corrupt blocks in the database. This has a slight performance overhead, but will allow Oracle to detect early corruption caused by underlying disk, storage system, or I/O system problems.

SQL> alter system set db_block_checking = true scope=both;

2. Turn on Block Change Tracking when using RMAN incremental backups (10g and higher)

The Change Tracking File contains information that allows the RMAN incremental backup process to avoid reading data that has not been modified since the last backup. When Block Change Tracking is not used, all blocks must be read to determine if they have been modified since the last backup.

SQL> alter database enable block change tracking using file '/FRA/oradata/prod/change_tracking.f';


3. Duplex redo log groups and members and have more than one archive log destination.

If an archivelog is corrupted or lost, by having multiple copies in multiple locations, the other logs will still be available and could be used.

If an online log is deleted or becomes corrupt, you will have another member that can be used to recover if required.

SQL> alter system set log_archive_dest_2='location=/bkp/prod/archive2' scope=both;

SQL> alter database add logfile member '/u03/prod/redo21.log' to group 1;

Note: Sending archived logs to multiple destinations adds I/O; even on high-IOPS disks it may degrade performance slightly.

4. When backing up the database with RMAN use the CHECK LOGICAL option.

This will cause RMAN to check for logical corruption within a block, in addition to the normal checksum verification. This is the best way to ensure that you will get a good backup.

Click here to enable DB_BLOCK_CHECKSUM parameter.

RMAN> backup check logical database plus archivelog delete input;
OR
RMAN> backup as compressed backupset incremental level 0 check logical database plus archivelog delete input;

and the best one is:

RMAN> backup as compressed backupset incremental level 0 check logical database filesperset 1 plus archivelog;

Sample script:

run
{
allocate channel ch1 device type disk;
allocate channel ch2 device type disk;
backup as compressed backupset incremental level 0 check logical database plus archivelog;
release channel ch1;
release channel ch2;
}


Note: You may avoid "as compressed backupset" if you don't want to take compressed backup. Avoid "delete input" if you don't want to delete archivelogs after backup automatically.

5. Test your backups.

This will do everything except actually restore the database. This is the best method to determine if your backup is good and usable before being in a situation where it is critical and issues exist.

If using RMAN this can be done with:

RMAN> restore validate database;

6. When using RMAN have each datafile in a single backup piece

When doing a partial restore, RMAN must read through the entire backup piece to get the datafile/archivelog requested. The smaller the backup piece, the quicker the restore can complete. This is especially relevant with tape backups of large databases, or where the restore is only of an individual file or a few files.

However, very small values for filesperset will also cause larger numbers of backup pieces to be created, which can reduce backup performance and increase processing time for maintenance operations. So those factors must be weighed against the desired restore performance.

RMAN> backup database filesperset 1 plus archivelog delete input;

7. Maintain your RMAN catalog/controlfile

Choose your retention policy carefully. Make sure that it complements your tape subsystem retention policy and the requirements of your backup and recovery strategy. If not using a catalog, ensure that your CONTROL_FILE_RECORD_KEEP_TIME parameter matches your retention policy.

SQL> alter system set control_file_record_keep_time=21 scope=both;

This will keep 21 days of backup records in the control file.

Follow Oracle Note 461125.1 - How to ensure that backup metadata is retained in the controlfile when setting a retention policy and an RMAN catalog is NOT used.

Run regular catalog maintenance.
REASON: Delete obsolete will remove backups that are outside your retention policy.
If obsolete backups are not deleted, the catalog will continue to grow until performance
becomes an issue.

RMAN> delete obsolete;

REASON: crosschecking will check that the catalog/controlfile matches the physical backups.
If a backup is missing, it will set the piece to 'EXPIRED' so when a restore is started,
that it will not be eligible, and an earlier backup will be used. To remove the expired
backups from the catalog/controlfile use the delete expired command.

RMAN> crosscheck backup;
RMAN> delete expired backup;

8. Prepare for loss of controlfiles.

This will ensure that you always have an up-to-date controlfile available that has been taken at the end of the current backup, rather than during the backup itself.

RMAN> configure controlfile autobackup on;

Also, keep your backup logs.

REASON: The backup log contains parameters for your tape access and the locations of controlfile backups
that can be utilised if a complete loss occurs.

9. Test your recovery:

REASON: During a recovery situation this will let you know how the recovery will go without
actually doing it, and can avoid having to restore source datafiles again.

SQL> recover database test;

Note: As per your recovery process, you proceed.

10. In RMAN backups do not specify 'delete all input' when backing up archivelogs:

REASON: 'Delete all input' will back up from one destination and then delete both copies of the
archivelog, whereas 'delete input' will back up from one location and then delete only what has
been backed up. The next backup will back up those from location 2 as well as new logs
from location 1, then delete all that are backed up. This means that you will have the
archivelogs since the last backup available on disk in location 2 (as well as backed up
once) and two copies backed up prior to the previous backup.

Note : Follow Oracle Doc ID 443814.1 to Manage multiple archive log destinations with RMAN

I hope this small document helps as a best-practice guide for backup and recovery with RMAN.

.....
