Jun 6, 2017

Use of "METHOD_OPT" in gather statistics - Best Practices

How should the METHOD_OPT parameter be used when gathering statistics? What performance benefits does it offer? When should this option be used?

During training sessions on performance tuning, I have faced these questions from many DBAs.

Release 11g introduced some enhancements to statistics gathering. By far the most controversial parameter in the DBMS_STATS.GATHER_*_STATS procedures is the METHOD_OPT parameter.

The METHOD_OPT parameter controls the creation of histograms during statistics collection. Histograms are a special type of column statistic created to provide more detailed information on the data distribution in a table column.
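A quick way to see which columns currently have histograms is to query the data dictionary (the HR.EMPLOYEES table is used here purely for illustration):

select column_name, histogram, num_buckets
from   dba_tab_col_statistics
where  owner = 'HR'
and    table_name = 'EMPLOYEES';

The HISTOGRAM column shows NONE, FREQUENCY, or HEIGHT BALANCED for each column.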

So why are histograms such a controversial issue?
Histogram creation does extend the elapsed time and the system resources needed for statistics collection, but the far bigger concerns people have with histograms come from their interaction with the bind peeking feature and how their presence affects the cardinality estimates for nearly popular values.

The purpose of method_opt: 

The method_opt argument within dbms_stats controls the following:

The generation of histograms
The creation of extended statistics  (Oracle 11g)
The collection of "base" column statistics

Let's look at some of the many permutations of the method_opt clause.

The recommendation from Oracle Database 11g onwards is to let METHOD_OPT default and to take advantage of Adaptive Cursor Sharing.

If you plan to manually set the METHOD_OPT parameter to a non-default value, ensure you specify only the columns that really need a histogram. Setting METHOD_OPT to FOR ALL COLUMNS SIZE 254 will cause Oracle to gather a histogram on every column. This will unnecessarily extend the elapsed time and the system resources needed for statistics gathering, as well as increasing the amount of space required to store the statistics.
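If you do set METHOD_OPT manually, the syntax below gathers basic statistics on all columns but builds a histogram on a single column only (HR.EMPLOYEES and its JOB_ID column are used purely for illustration):

begin
   dbms_stats.gather_table_stats(
      ownname    => 'HR',
      tabname    => 'EMPLOYEES',
      method_opt => 'for all columns size 1 for columns size 254 job_id'
   );
end;
/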

You should also refrain from setting METHOD_OPT to FOR ALL INDEX COLUMNS SIZE 254 as this will cause Oracle to gather histograms on every column used in an index, which again could waste system resources. This setting also has a nasty side effect of preventing Oracle from collecting basic column statistics for non-index columns.

The default method_opt "For all columns size auto":

Remember, it is wasteful to create a histogram on a column that is never referenced by a SQL statement.  It is the SQL workload that determines if a column needs histograms.

The default value of  "for all columns size auto" is the Oracle default and this option will analyze histograms on what the optimizer considers to be "important columns".  The optimizer does this by examining your current workload when you run the dbms_stats command, interrogating all current SQL in the library cache to see which columns might benefit from a histogram.
e.g.,
begin
   dbms_stats.gather_schema_stats(
      ownname          => 'HR',
      estimate_percent => dbms_stats.auto_sample_size,
      method_opt       => 'for all columns size auto',
      degree           => 4 -- optional
   );
end;
/

The method "for all indexed columns":

The method "for all indexed columns" limits base column gathering to only those columns that are included in an index. This value is not recommended, as it is highly unlikely that only indexed columns will be used in the select lists, where clause predicates, and group by clauses of all the SQL statements executed in the environment.

The method_opt='SKEWONLY'  dbms_stats Option:

The 'skewonly' option is very time-intensive because it examines the distribution of values for every column within every index.

If dbms_stats discovers an index whose columns are unevenly distributed, it will create histograms for those columns to aid the cost-based SQL optimizer in making a decision about index vs. full-table scan access.

For example, if one value appears in 50% of a column's rows, a full-table scan is faster than an index scan to retrieve these rows.

Histograms are also used with SQL that has bind variables and SQL with cursor_sharing enabled.  In these cases, the CBO determines whether the column value could affect the execution plan, and if so, replaces the bind variable with a literal and performs a hard parse.

e.g.,
begin
   dbms_stats.gather_schema_stats(
      ownname          => 'HR',
      estimate_percent => dbms_stats.auto_sample_size,
      method_opt       => 'for all columns size skewonly',
      degree           => 7
   );
end;
/

The method_opt='REPEAT'  dbms_stats Option:

Following the one-time detailed analysis, the re-analyze task will be less resource-intensive with the REPEAT option.  Using the repeat option will only re-create histograms on columns that already have them, and will not search for other histogram opportunities.  This is the way you will re-analyze your statistics on a regular basis.

e.g.,

begin
   dbms_stats.gather_schema_stats(
      ownname          => 'HR',
      estimate_percent => dbms_stats.auto_sample_size,
      method_opt       => 'for all columns size repeat',
      degree           => 7
   );
end;
/

The method_opt='AUTO' dbms_stats Option:

The auto option is used when monitoring is implemented, and it creates histograms based upon data distribution and the manner in which the column is accessed by the application (e.g. the workload on the column as determined by monitoring, especially foreign keys, to determine the cardinality of table-join result sets).  Using method_opt=>'for all columns size auto' is similar to using gather auto in the options parameter of dbms_stats.

begin
   dbms_stats.gather_schema_stats(
      ownname          => 'SCOTT',
      estimate_percent => dbms_stats.auto_sample_size,
      method_opt       => 'for all columns size auto',
      degree           => 7
   );
end;
/

Important:  Whenever almost all of your SQL uses bind variables, you will want to delete your existing table statistics and then change METHOD_OPT to prevent future histogram creation.  To delete the histograms that were created with method_opt, remove the statistics with dbms_stats.delete_table_stats and re-analyze the tables.
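For example, to drop the existing statistics and re-gather them without histograms (the HR.EMP table is used purely for illustration):

exec dbms_stats.delete_table_stats(ownname => 'HR', tabname => 'EMP');
exec dbms_stats.gather_table_stats(ownname => 'HR', tabname => 'EMP', method_opt => 'for all columns size 1');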

To prevent future generation of histograms, invoke dbms_stats.set_param as follows (from 11g onwards, dbms_stats.set_global_prefs is the preferred equivalent):

SQL> exec dbms_stats.set_param(pname=>'METHOD_OPT', pval=>'FOR ALL COLUMNS SIZE 1');

Remember, analyzing for histograms is time-consuming, and histograms are used under two conditions:

Table join order : The CBO must know the size of the intermediate result sets (cardinality) to properly determine the correct join order for multi-table joins.

Table access method : The CBO needs to know about columns in SQL where clauses where the column values are skewed such that a full-table scan might be faster than an index range scan. Oracle uses this skew information in conjunction with the clustering_factor column of the dba_indexes view.
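The clustering_factor that the optimizer weighs alongside histograms can be inspected directly (the HR.EMPLOYEES table is used for illustration):

select index_name, clustering_factor
from   dba_indexes
where  owner = 'HR'
and    table_name = 'EMPLOYEES';

A clustering_factor close to the number of table blocks favours index range scans; one close to the number of rows favours full-table scans.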

Hence, this is the proper order for using the dbms_stats package to locate proper columns for histograms:

1. Skewonly option - You want to use skewonly to do histograms for skewed columns, for cases where the value will make a difference between a full-table scan and an index scan.

2. Monitor - Next, turn on monitoring. Issue 'alter table xxx monitoring' and 'alter index yyy monitoring' commands for all segments in your schema. This will monitor the workload.

3. Auto option - Once monitoring is in-place, you need to re-analyze with the "auto" option to create histograms for join columns within tables. This is critical for the CBO to determine the proper join order for finding the driving table in multi-table joins.

4. Repeat option - Finally, use the "repeat" option to re-analyze only the existing histograms.
Periodically you will want to re-run the skewonly and auto options to identify any new columns that require histograms. Once located, the repeat option will ensure that they are refreshed with current values.


Best Practices:

1) Always gather stats with AUTO_SAMPLE_SIZE so that the new statistics-gathering algorithm of Oracle 11g can kick in.
2) If you use partitioned tables, use synopses so the global stats can be derived much faster, in an incremental fashion.

Synopses are created when you enable incremental stats at the table or system level:

Table level: 
SQL> exec DBMS_STATS.SET_TABLE_PREFS('HR','EMP','INCREMENTAL','TRUE');

System level: 
SQL> exec DBMS_STATS.SET_GLOBAL_PREFS('INCREMENTAL','TRUE');

This will avoid re-scanning partitions that haven’t changed since the last stats generation, making it faster.
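You can verify that the preference took effect with dbms_stats.get_prefs (the HR.EMP table is used for illustration):

select dbms_stats.get_prefs('INCREMENTAL', 'HR', 'EMP') from dual;

This returns TRUE when incremental statistics are enabled for the table.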

Note:
To gather statistics for a big schema, use parallelism as below to reduce elapsed time (note that estimate_percent is left to AUTO_SAMPLE_SIZE, consistent with best practice 1 above):
e.g.,
exec dbms_stats.gather_schema_stats(ownname => 'HR', estimate_percent => dbms_stats.auto_sample_size, degree => 5);
