When a query is submitted to DB, it is parsed and translated to relational algebra. It is verified for its validity and correctness. Once it passes this stage, different ways of evaluating the query is generated. It is checked for various factors and its execution plan is generated. It may be based on cost of the query or based on the equivalence rules. Once cost based execution and rule based execution plans are generated, optimizer has to decide, which plan to be selected for evaluation. This is the most important step in processing a query.
As we have seen in other articles, the cost or the heuristic execution plan may not be always effective in all the tables with same type of query. They are all general guidelines to evaluate a query. There are lot many factors affecting the performance of a query. The evaluation plans exactly defines the system which plan / algorithm is to be used to evaluate, which index to be used etc.
SELECT EMP_ID, DEPT_NAME FROM EMP, DEPT WHERE EMP.DEPT_ID = DEPT.DEPT_ID AND EMP.DEPT_ID = 10 AND EMP.EMP_LAST_NAME = ‘Joseph’ ;
Above query selects the EMP_ID and DEPT_NAME from EMP and DEPT table for DEPT_ID = 10 with employee’s last name as ‘Joseph’. But when it is given to the DBMS, it divides the query into tokens and sees how it can be put together so that performance will be better. This is the duty of query optimizer. But altering the order of tokens in the query should not change the result. In either way it should give same result. Order of records can change and are least important. This is called equivalent query. There is set of rules to put tokens in the query. This is called equivalence rule.
Both the steps above are same irrespective of how it is performed. Hence both are called equivalent query. These are not written in SQL, but using relational algebra, or tree.
∏ EMP_ID, DEPT_NAME (σ DEPT_ID = 10 AND EMP_LAST_NAME = ‘Joseph’ (EMP) ∞DEPT)
∏ EMP_ID, DEPT_NAME (σ DEPT_ID = 10 AND EMP_LAST_NAME = ‘Joseph’ (EMP ∞DEPT))
σ DEPT_ID = 10 AND EMP_LAST_NAME = ‘Joseph’ (∏ EMP_ID, DEPT_NAME, DEPT_ID (EMP ∞DEPT))
The optimizer can produce relational expression and tree in above three formats. According to evaluation rule, the first query seems to be the best one. But considering other factors of tables, other plans may also be efficient. Let us see what are other factors affecting the performance and how a ideal execution plan is determined.