Jun 3, 2015

Query Optimization - sql tunig

Optimize queries where column contains NULL values-
Can rewriting of query resolve issue for NULL value when used in where condition?

Query Performance Issue due to NVL(:b1,column_name) predicates in the WHERE clause. Due to these predicates, the Optimizer computed IncorrectCardinality and came out with a Sub-Optimal Plan.Issue can be solved by way of a workaround. Since, this query is a seeded query, the permanent fix (suggested in this blog) is expected by way of an Application Patch.

I will demonstrate this on my SCOTT Schema and a Query on EMP table. This will be easier for me to explain as well. Based on the EMP table, the requirement is to write a report that takes 2 Inputs. These are EMPNO and JOB. The users can run this report for any of the following conditions :

1.EMPNO and JOB are NOT NULL
2.EMPNO IS NULL and JOB IS NOT NULL
3.EMPNO IS NOT NULL AND JOB IS NULL


The way Original Query is written, I assumed the Developers had above 4 requirements in mind. However, at the production site, the customerconfirmed that only the 1st two conditions are applicable. Out of the total execution, 1st condition contributes to around 70% and 2ndcontributes to 30%.Back to our example on EMP table. With the 4 conditions in mind, any Developer would write a query as mentioned below.

select empno, ename, job, hiredate,deptno
from   emp
where  (empno=:b1 or :b1 is NULL)
and  (JOB=:b2 OR :b2 is null);

## Execution of this query for each of the combination
## Both are NOT NULL (For a JOB & for an Employee)

select empno, ename, job, hiredate,deptno
from   emp
where empno=1265 and JOB='MANAGER';


EMPNO ENAME      JOB       HIREDATE    DEPTNO
--------- ---------- --------- ----------- ------
1265 GOURANGA   MANAGER   07-Dec-2010     10

## EMPNO IS NULL (For a JOB and all Employees)
exec :JOB:=MANAGER; :empno:=null;

    EMPNO ENAME      JOB       HIREDATE    DEPTNO
--------- ---------- --------- ----------- ------
     1265 GOURANGA   MANAGER   07-Dec-2010     10
     7566 JONES      MANAGER   02-Apr-1981     20
     7698 BLAKE      MANAGER   01-May-1981     30
     7782 CLARK      MANAGER   09-Jun-1981


## JOB IS NULL (For an Employee)
exec :JOB:=null; :empno:=1265;

EMPNO ENAME      JOB       HIREDATE    DEPTNO
--------- ---------- --------- ----------- ------
1265 GOURANGA   MANAGER   07-Dec-2010     10

## Both are NULL (for all JOB and all Employees)
exec :JOB:=null; :empno:=null;

EMPNO ENAME      JOB       HIREDATE    DEPTNO
----- ---------- --------- ----------- ------
 1265 GOURANGA   MANAGER   07-Dec-2010     10
 7839 KING       PRESIDENT 17-Nov-1981     10
 7698 BLAKE      MANAGER   01-May-1981     30
 7782 CLARK      MANAGER   09-Jun-1981     10
 7566 JONES      MANAGER   02-Apr-1981     20
 7788 SCOTT      ANALYST   19-Apr-1987     20
 7902 FORD       ANALYST   03-Dec-1981     20
 7369 SMITH      CLERK     17-Dec-1980     20
 7499 ALLEN      SALESMAN  20-Feb-1981     30
 7521 WARD       SALESMAN  22-Feb-1981     30
 7654 MARTIN     SALESMAN  28-Sep-1981     30
 7844 TURNER     SALESMAN  08-Sep-1981     30
 7876 ADAMS      CLERK     23-May-1987     20
 7900 JAMES      CLERK     03-Dec-1981     30
 7934 MILLER     CLERK     23-Jan-1982     10

A single query meets the requirement for all the 4 combinations. The Developer, in this case, has done his job. However, they have not considered the fact that 2 out of 4 of the above combinations would end up doing a Full Table Scan of EMP table. In case of the customer case, since only the 1st two combinations are applicable and with 30% of the executions on combination 2, 30% of the time, the Optimizer would opt for a Full Table Scan. Before, we get into the Original case, let us check the runtime execution plan for the Query on EMP Table.


SQL> connect scott
Enter password:
Connected.
SQL> explain plan for
  2  select empno, ename, job, hiredate,deptno
from   emp
where  (empno=:b1 or :b1 is NULL)
and  (JOB=:b2 OR :b2 is null);

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 3956160932

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    29 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| EMP  |     1 |    29 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------

   1 - filter(("JOB"=:B2 OR :B2 IS NULL) AND (:B1 IS NULL OR
              "EMPNO"=TO_NUMBER(:B1)))

14 rows selected.

SQL>

From the predicate information, the Optimizer choice becomes very clear, which is “IF :EMPNO is NULL then FTS of EMP and IF :EMPNO is NOT NULL then Table Access is also going for FTS. This means, for optimizer there is no way to take any predicate as option to use index.

Then, I re-written the query like below and most of cases optimizer used index. Have a look on execution plan:

SQL> set lines 120;
SQL> explain plan for
  2  select empno, ename, job, hiredate,deptno
from   emp
where  empno=:b1 and  (JOB=:b2 OR :b2 is null)
union
select empno, ename, job, hiredate,deptno
from   emp
where  JOB=:b2
and  (empno=:b1 OR :b1 is null)
union
select empno, ename, job, hiredate,deptno
from   emp
where  JOB=:b2 and empno=:b1;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------
Plan hash value: 4286588931

-------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           |     3 |    87 |     7  (86)| 00:00:01 |
|   1 |  SORT UNIQUE                  |           |     3 |    87 |     7  (86)| 00:00:01 |
|   2 |   UNION-ALL                   |           |       |       |            |          |
|*  3 |    TABLE ACCESS BY INDEX ROWID| EMP       |     1 |    29 |     1   (0)| 00:00:01 |
|*  4 |     INDEX UNIQUE SCAN         | IDX_EMPNO |     1 |       |     0   (0)| 00:00:01 |
|*  5 |    TABLE ACCESS BY INDEX ROWID| EMP       |     1 |    29 |     2   (0)| 00:00:01 |

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------
|*  6 |     INDEX RANGE SCAN          | IDX_JOB   |    11 |       |     1   (0)| 00:00:01 |
|*  7 |    TABLE ACCESS BY INDEX ROWID| EMP       |     1 |    29 |     1   (0)| 00:00:01 |
|*  8 |     INDEX UNIQUE SCAN         | IDX_EMPNO |     1 |       |     0   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("JOB"=:B2 OR :B2 IS NULL)
   4 - access("EMPNO"=TO_NUMBER(:B1))
   5 - filter(:B1 IS NULL OR "EMPNO"=TO_NUMBER(:B1))

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------
   6 - access("JOB"=:B2)
   7 - filter("JOB"=:B2)
   8 - access("EMPNO"=TO_NUMBER(:B1))

25 rows selected.

SQL>

In this case if user inputs :EMPNO, then optimzer use the index IDX_EMPNO. If user inputs :JOB, then optimizer use IDX_JOB. If both are not null then optimizer will cheaper plan and will use unique constraint based IDX_EMPNO index. Rather than these condition and rare cases optimize use FTS when both are NULL.

Here developer may think why more UNION, it may affect the performance or it may create other issue or side effect for the whole application.

Actually when we see the above execution plan ( Plan hash value: 4286588931), there is no issue with optimer and database. Query may looks bit complex but no other issues.

In real-time scenario, you may apply this and have patience while write big-big queries.

Another simpler way you can write in pl-sql sub-programs with implementing if-else structure.

Thanks.


No comments:

Post a Comment

Translate >>