Optimize queries where column contains NULL values-
Can rewriting of query resolve issue for NULL value when used in where condition?
Query Performance Issue due to NVL(:b1,column_name) predicates in the WHERE clause. Due to these predicates, the Optimizer computed IncorrectCardinality and came out with a Sub-Optimal Plan.Issue can be solved by way of a workaround. Since, this query is a seeded query, the permanent fix (suggested in this blog) is expected by way of an Application Patch.
I will demonstrate this on my SCOTT Schema and a Query on EMP table. This will be easier for me to explain as well. Based on the EMP table, the requirement is to write a report that takes 2 Inputs. These are EMPNO and JOB. The users can run this report for any of the following conditions :
1.EMPNO and JOB are NOT NULL
2.EMPNO IS NULL and JOB IS NOT NULL
3.EMPNO IS NOT NULL AND JOB IS NULL
The way Original Query is written, I assumed the Developers had above 4 requirements in mind. However, at the production site, the customerconfirmed that only the 1st two conditions are applicable. Out of the total execution, 1st condition contributes to around 70% and 2ndcontributes to 30%.Back to our example on EMP table. With the 4 conditions in mind, any Developer would write a query as mentioned below.
select empno, ename, job, hiredate,deptno
from emp
where (empno=:b1 or :b1 is NULL)
and (JOB=:b2 OR :b2 is null);
## Execution of this query for each of the combination
## Both are NOT NULL (For a JOB & for an Employee)
select empno, ename, job, hiredate,deptno
from emp
where empno=1265 and JOB='MANAGER';
EMPNO ENAME JOB HIREDATE DEPTNO
--------- ---------- --------- ----------- ------
1265 GOURANGA MANAGER 07-Dec-2010 10
## EMPNO IS NULL (For a JOB and all Employees)
exec :JOB:=MANAGER; :empno:=null;
EMPNO ENAME JOB HIREDATE DEPTNO
--------- ---------- --------- ----------- ------
1265 GOURANGA MANAGER 07-Dec-2010 10
7566 JONES MANAGER 02-Apr-1981 20
7698 BLAKE MANAGER 01-May-1981 30
7782 CLARK MANAGER 09-Jun-1981
## JOB IS NULL (For an Employee)
exec :JOB:=null; :empno:=1265;
EMPNO ENAME JOB HIREDATE DEPTNO
--------- ---------- --------- ----------- ------
1265 GOURANGA MANAGER 07-Dec-2010 10
## Both are NULL (for all JOB and all Employees)
exec :JOB:=null; :empno:=null;
EMPNO ENAME JOB HIREDATE DEPTNO
----- ---------- --------- ----------- ------
1265 GOURANGA MANAGER 07-Dec-2010 10
7839 KING PRESIDENT 17-Nov-1981 10
7698 BLAKE MANAGER 01-May-1981 30
7782 CLARK MANAGER 09-Jun-1981 10
7566 JONES MANAGER 02-Apr-1981 20
7788 SCOTT ANALYST 19-Apr-1987 20
7902 FORD ANALYST 03-Dec-1981 20
7369 SMITH CLERK 17-Dec-1980 20
7499 ALLEN SALESMAN 20-Feb-1981 30
7521 WARD SALESMAN 22-Feb-1981 30
7654 MARTIN SALESMAN 28-Sep-1981 30
7844 TURNER SALESMAN 08-Sep-1981 30
7876 ADAMS CLERK 23-May-1987 20
7900 JAMES CLERK 03-Dec-1981 30
7934 MILLER CLERK 23-Jan-1982 10
A single query meets the requirement for all the 4 combinations. The Developer, in this case, has done his job. However, they have not considered the fact that 2 out of 4 of the above combinations would end up doing a Full Table Scan of EMP table. In case of the customer case, since only the 1st two combinations are applicable and with 30% of the executions on combination 2, 30% of the time, the Optimizer would opt for a Full Table Scan. Before, we get into the Original case, let us check the runtime execution plan for the Query on EMP Table.
SQL> connect scott
Enter password:
Connected.
SQL> explain plan for
2 select empno, ename, job, hiredate,deptno
from emp
where (empno=:b1 or :b1 is NULL)
and (JOB=:b2 OR :b2 is null);
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3956160932
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 29 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| EMP | 1 | 29 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
1 - filter(("JOB"=:B2 OR :B2 IS NULL) AND (:B1 IS NULL OR
"EMPNO"=TO_NUMBER(:B1)))
14 rows selected.
SQL>
From the predicate information, the Optimizer choice becomes very clear, which is “IF :EMPNO is NULL then FTS of EMP and IF :EMPNO is NOT NULL then Table Access is also going for FTS. This means, for optimizer there is no way to take any predicate as option to use index.
Then, I re-written the query like below and most of cases optimizer used index. Have a look on execution plan:
SQL> set lines 120;
SQL> explain plan for
2 select empno, ename, job, hiredate,deptno
from emp
where empno=:b1 and (JOB=:b2 OR :b2 is null)
union
select empno, ename, job, hiredate,deptno
from emp
where JOB=:b2
and (empno=:b1 OR :b1 is null)
union
select empno, ename, job, hiredate,deptno
from emp
where JOB=:b2 and empno=:b1;
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------
Plan hash value: 4286588931
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 87 | 7 (86)| 00:00:01 |
| 1 | SORT UNIQUE | | 3 | 87 | 7 (86)| 00:00:01 |
| 2 | UNION-ALL | | | | | |
|* 3 | TABLE ACCESS BY INDEX ROWID| EMP | 1 | 29 | 1 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | IDX_EMPNO | 1 | | 0 (0)| 00:00:01 |
|* 5 | TABLE ACCESS BY INDEX ROWID| EMP | 1 | 29 | 2 (0)| 00:00:01 |
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------
|* 6 | INDEX RANGE SCAN | IDX_JOB | 11 | | 1 (0)| 00:00:01 |
|* 7 | TABLE ACCESS BY INDEX ROWID| EMP | 1 | 29 | 1 (0)| 00:00:01 |
|* 8 | INDEX UNIQUE SCAN | IDX_EMPNO | 1 | | 0 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("JOB"=:B2 OR :B2 IS NULL)
4 - access("EMPNO"=TO_NUMBER(:B1))
5 - filter(:B1 IS NULL OR "EMPNO"=TO_NUMBER(:B1))
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------
6 - access("JOB"=:B2)
7 - filter("JOB"=:B2)
8 - access("EMPNO"=TO_NUMBER(:B1))
25 rows selected.
SQL>
In this case if user inputs :EMPNO, then optimzer use the index IDX_EMPNO. If user inputs :JOB, then optimizer use IDX_JOB. If both are not null then optimizer will cheaper plan and will use unique constraint based IDX_EMPNO index. Rather than these condition and rare cases optimize use FTS when both are NULL.
Here developer may think why more UNION, it may affect the performance or it may create other issue or side effect for the whole application.
Actually when we see the above execution plan ( Plan hash value: 4286588931), there is no issue with optimer and database. Query may looks bit complex but no other issues.
In real-time scenario, you may apply this and have patience while write big-big queries.
Another simpler way you can write in pl-sql sub-programs with implementing if-else structure.
Thanks.
Can rewriting of query resolve issue for NULL value when used in where condition?
Query Performance Issue due to NVL(:b1,column_name) predicates in the WHERE clause. Due to these predicates, the Optimizer computed IncorrectCardinality and came out with a Sub-Optimal Plan.Issue can be solved by way of a workaround. Since, this query is a seeded query, the permanent fix (suggested in this blog) is expected by way of an Application Patch.
I will demonstrate this on my SCOTT Schema and a Query on EMP table. This will be easier for me to explain as well. Based on the EMP table, the requirement is to write a report that takes 2 Inputs. These are EMPNO and JOB. The users can run this report for any of the following conditions :
1.EMPNO and JOB are NOT NULL
2.EMPNO IS NULL and JOB IS NOT NULL
3.EMPNO IS NOT NULL AND JOB IS NULL
The way Original Query is written, I assumed the Developers had above 4 requirements in mind. However, at the production site, the customerconfirmed that only the 1st two conditions are applicable. Out of the total execution, 1st condition contributes to around 70% and 2ndcontributes to 30%.Back to our example on EMP table. With the 4 conditions in mind, any Developer would write a query as mentioned below.
select empno, ename, job, hiredate,deptno
from emp
where (empno=:b1 or :b1 is NULL)
and (JOB=:b2 OR :b2 is null);
## Execution of this query for each of the combination
## Both are NOT NULL (For a JOB & for an Employee)
select empno, ename, job, hiredate,deptno
from emp
where empno=1265 and JOB='MANAGER';
EMPNO ENAME JOB HIREDATE DEPTNO
--------- ---------- --------- ----------- ------
1265 GOURANGA MANAGER 07-Dec-2010 10
## EMPNO IS NULL (For a JOB and all Employees)
exec :JOB:=MANAGER; :empno:=null;
EMPNO ENAME JOB HIREDATE DEPTNO
--------- ---------- --------- ----------- ------
1265 GOURANGA MANAGER 07-Dec-2010 10
7566 JONES MANAGER 02-Apr-1981 20
7698 BLAKE MANAGER 01-May-1981 30
7782 CLARK MANAGER 09-Jun-1981
## JOB IS NULL (For an Employee)
exec :JOB:=null; :empno:=1265;
EMPNO ENAME JOB HIREDATE DEPTNO
--------- ---------- --------- ----------- ------
1265 GOURANGA MANAGER 07-Dec-2010 10
## Both are NULL (for all JOB and all Employees)
exec :JOB:=null; :empno:=null;
EMPNO ENAME JOB HIREDATE DEPTNO
----- ---------- --------- ----------- ------
1265 GOURANGA MANAGER 07-Dec-2010 10
7839 KING PRESIDENT 17-Nov-1981 10
7698 BLAKE MANAGER 01-May-1981 30
7782 CLARK MANAGER 09-Jun-1981 10
7566 JONES MANAGER 02-Apr-1981 20
7788 SCOTT ANALYST 19-Apr-1987 20
7902 FORD ANALYST 03-Dec-1981 20
7369 SMITH CLERK 17-Dec-1980 20
7499 ALLEN SALESMAN 20-Feb-1981 30
7521 WARD SALESMAN 22-Feb-1981 30
7654 MARTIN SALESMAN 28-Sep-1981 30
7844 TURNER SALESMAN 08-Sep-1981 30
7876 ADAMS CLERK 23-May-1987 20
7900 JAMES CLERK 03-Dec-1981 30
7934 MILLER CLERK 23-Jan-1982 10
A single query meets the requirement for all the 4 combinations. The Developer, in this case, has done his job. However, they have not considered the fact that 2 out of 4 of the above combinations would end up doing a Full Table Scan of EMP table. In case of the customer case, since only the 1st two combinations are applicable and with 30% of the executions on combination 2, 30% of the time, the Optimizer would opt for a Full Table Scan. Before, we get into the Original case, let us check the runtime execution plan for the Query on EMP Table.
SQL> connect scott
Enter password:
Connected.
SQL> explain plan for
2 select empno, ename, job, hiredate,deptno
from emp
where (empno=:b1 or :b1 is NULL)
and (JOB=:b2 OR :b2 is null);
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3956160932
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 29 | 3 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| EMP | 1 | 29 | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
1 - filter(("JOB"=:B2 OR :B2 IS NULL) AND (:B1 IS NULL OR
"EMPNO"=TO_NUMBER(:B1)))
14 rows selected.
SQL>
From the predicate information, the Optimizer choice becomes very clear, which is “IF :EMPNO is NULL then FTS of EMP and IF :EMPNO is NOT NULL then Table Access is also going for FTS. This means, for optimizer there is no way to take any predicate as option to use index.
Then, I re-written the query like below and most of cases optimizer used index. Have a look on execution plan:
SQL> set lines 120;
SQL> explain plan for
2 select empno, ename, job, hiredate,deptno
from emp
where empno=:b1 and (JOB=:b2 OR :b2 is null)
union
select empno, ename, job, hiredate,deptno
from emp
where JOB=:b2
and (empno=:b1 OR :b1 is null)
union
select empno, ename, job, hiredate,deptno
from emp
where JOB=:b2 and empno=:b1;
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------
Plan hash value: 4286588931
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 87 | 7 (86)| 00:00:01 |
| 1 | SORT UNIQUE | | 3 | 87 | 7 (86)| 00:00:01 |
| 2 | UNION-ALL | | | | | |
|* 3 | TABLE ACCESS BY INDEX ROWID| EMP | 1 | 29 | 1 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | IDX_EMPNO | 1 | | 0 (0)| 00:00:01 |
|* 5 | TABLE ACCESS BY INDEX ROWID| EMP | 1 | 29 | 2 (0)| 00:00:01 |
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------
|* 6 | INDEX RANGE SCAN | IDX_JOB | 11 | | 1 (0)| 00:00:01 |
|* 7 | TABLE ACCESS BY INDEX ROWID| EMP | 1 | 29 | 1 (0)| 00:00:01 |
|* 8 | INDEX UNIQUE SCAN | IDX_EMPNO | 1 | | 0 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("JOB"=:B2 OR :B2 IS NULL)
4 - access("EMPNO"=TO_NUMBER(:B1))
5 - filter(:B1 IS NULL OR "EMPNO"=TO_NUMBER(:B1))
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------
6 - access("JOB"=:B2)
7 - filter("JOB"=:B2)
8 - access("EMPNO"=TO_NUMBER(:B1))
25 rows selected.
SQL>
In this case if user inputs :EMPNO, then optimzer use the index IDX_EMPNO. If user inputs :JOB, then optimizer use IDX_JOB. If both are not null then optimizer will cheaper plan and will use unique constraint based IDX_EMPNO index. Rather than these condition and rare cases optimize use FTS when both are NULL.
Here developer may think why more UNION, it may affect the performance or it may create other issue or side effect for the whole application.
Actually when we see the above execution plan ( Plan hash value: 4286588931), there is no issue with optimer and database. Query may looks bit complex but no other issues.
In real-time scenario, you may apply this and have patience while write big-big queries.
Another simpler way you can write in pl-sql sub-programs with implementing if-else structure.
Thanks.
No comments:
Post a Comment