Histograms tell the Optimizer about the distribution of data within a column. Without a histogram, the Optimizer assumes a uniform distribution of rows across the distinct values in a column.
A histogram is a special type of column statistic that provides more detailed information about the data distribution in a table column. A histogram sorts values into "buckets," as you might sort coins into buckets. Based on the NDV and the distribution of the data, the database chooses the type of histogram to create.
Example:
Assume there is a table named JET_ENGINE having 10,000 records and it has Column names SALES_COUNTRY. This column stores information about how many JET_ENGINEs are sold per country. Assume that JET_ENGINEs are sold only to three countries: India, US and UK.
Assume that 9000 JET_ENGINEs were sold to India, 900 to US and 100 to UK.
Column Name: SALES_COUNTRY
You ran a SQL query to find out how much JET_ENGINEs were sold to UK.
WITHOUT HISTOGRAM
Optimizer will calculates the cardinality (number of rows returned) by dividing the total number of rows in the table by the number of distinct values in the column.
Cardinality= Total_records/number_of_distinct _values= 10000/3= 3333
As the data distribution in SALES_COUNTRY column is not uniform so cardinality estimate is showing up as 3333 while for UK we have only 100 records. Keeping such high cardinality in mind, Oracle may try to use the full scan here and query cost will be high.
WITH HISTOGRAM
The presence of a histogram changes the formula used by the Optimizer to estimate the cardinality, and allows it to generate a more accurate execution plan. So mostly in above case Oracle will look at the histogram and choose a low-cost index range scan (if available).
TYPES OF HISTOGRAMS:
1. Height – balanced Histograms : A height-based histogram is used when there are more distinct values than the number of buckets. In this type, the column values are divided into bands so that each band contains approximately the same number of rows. In this histogram type values will be evenly spread through the buckets.
2. Frequency Histograms : This type of Histograms are more precise and are more widely used by Oracle. Each value of the column corresponds to a single bucket of the histogram. This is also called value based histogram. Each bucket contains the number of occurrences of that single value only. Frequency histograms are automatically created when the number of distinct values is less than or equal to the number of histogram buckets specified.
Oracle 12c also introduced hybrid histogram which combines characteristics of both height-based histograms and frequency histograms and also Top-frequency histograms which are a variant on the frequency histogram, where the histogram only focuses on the popular values, ignoring the less popular values as statistically insignificant.
HOW TO VERIFY IF YOUR TABLE IS USING HISTOGRAM
You can query DBA_TAB_COL_STATISTICS or USER_TAB_COL_STATISTICS view to get this information easily.
e.g.,
from DBA_TAB_COL_STATISTICS
where owner='TRANSACTION' and table_name='ACCOUNTS';
KEY DATABASE PARAMETERS RELATED TO HISTOGRAMS
METHOD_OPT: This is the parameter which tells Oracle Engine to create histogram while collecting the statistics.
The default value of METHOD_OPT from 10g onwards is ‘FOR ALL COLUMNS SIZE AUTO’.
Below are possible values for the METHOD
AUTO: Oracle determines the columns to collect histograms based on data distribution and the workload of the columns.
REPEAT: Collects histograms only on the columns that already have histograms.
SKEWONLY : Oracle determines the columns to collect histograms based on the data distribution of the columns.
INTEGER VALUE: Number of histogram buckets. Any value between 1 -254. Note that setting 1 will disable the Histogram collection
This basically implies that Oracle will automatically create histograms on those columns which have skewed data distribution and there are SQL statements referencing those columns.
TO VERIFY YOUR CURRENT SETTING
DBMS_STATS.GET_PREFS('METHOD_OPT')
---------------------------------------------------------------------
FOR ALL COLUMNS SIZE AUTO
TO CHANGE YOUR SETTING TO “REPEAT”
SQL> exec dbms_stats.set_global_prefs ('METHOD_OPT','FOR ALL COLUMNS SIZE REPEAT');
TO DISABLE THE HISTOGRAM COLLECTION
SQL> exec dbms_stats.set_global_prefs ('METHOD_OPT','FOR ALL COLUMNS SIZE 1');
HOW TO CREATE / DELETE HISTOGRAMS
If DBMS_STATS gathers statistics is used for a table, and if the queries have referenced the columns in this table, then Oracle Database creates histograms automatically as needed. Oracle engine updates the data dictionary table SYS.COL_USAGE$ regularly based on user queries running in the database.
You can simply let Oracle collect all the histograms it wants and then drop any of histograms that cause you performance problem and then set the preference to tell the optimizer not to gather it again.
The following command drops the histogram on the EMP_ID column:
dbms_stats.Delete_column_stats(ownname => 'SCOTT', tabname => 'EMP',
colname => 'EMP_ID', col_stat_type => 'HISTOGRAM');
END;
/
The following command prevents the Optimizer from creating a histogram further on the EMP_ID column in the EMP table:
dbms_stats.Set_table_prefs('SCOTT', 'EMP',
'METHOD_OPT'=>'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 1 EMP_ID');
END;
/
WHEN TO CREATE HISTOGRAMS
Create Histograms in below situation:
2) AND also when the column is highly skewed.
Note that Histograms are not just for indexed column but also for the non-indexed columns
Do not create Histograms in below situation:
2) Column is not used in a where clause.
3) Do not create on every column of the table as it will increase the stats collection time, parsing time and also can cause optimize to generate bad plans.
4) Do not create Histograms on primary key.
5) Do not create histogram when the column is unique and is used only with equality predicates
TIPS FOR HISTOGRAMS
2) Histograms are stored in the dictionary and maintenance and space cost is there. So create histograms only when required.
3) Histograms are not gathered on columns for which there is no predicate information in the col_usage$ table.
4) col_usage$ is only populated when columns referenced in predicates and hard parsing occurred.
No comments:
Post a Comment