AN

Query Processing & Optimization Notes

Query Processing & Optimization ## Introduction * Query processing and optimization are crucial for database management systems (DBMS). * Efficiency in query processing is vital for DBMS performance. * Query optimization aims to select an efficient execution plan, minimizing runtime and resource usage (disk I/O, CPU time, etc.). ## Query Processing * Query processing transforms high-level queries into efficient execution plans. * The query processor selects the best plan to respond to database requests. * When a database system receives a query, it undergoes query compilation steps to create an execution plan before execution. * It is a stepwise process. ### Example Query * Consider the query: "Find the names of all customers who have an account at any branch located in Brooklyn." * Relational-algebra expression: П_{customer\_name} (σ_{branch\_city="Brooklyn"} (branch ⋈ (account ⋈ depositor))) * Optimized expression: П_{customer\_name} ((σ_{branch\_city="Brooklyn"} (branch)) ⋈ (account ⋈ depositor)) ## Generation of Query-Evaluation Plans * Involves two steps: 1. Generating logically equivalent expressions. 2. Estimating the cost of each evaluation plan. * Equivalence rules transform expressions into logically equivalent ones. ### Equivalence Rules * Define how to transform an expression into a logically equivalent one. * An equivalence rule states that expressions of two forms are equivalent. We can replace the first form by the second, or vice versa. ### Pictorial Representation of Equivalences: * Rule no. 1: E1 \equiv E2 * Rule no. 2: E1 \equiv E2 E1 \equiv E3 * Rule no. 3: E1 \equiv E2 ## Query Processing Steps * General query (e.g., QBE) transformed into high-level query language (e.g., SQL). * **Users** interact with the **Database** through a **Catalog**. ### Actions 1. **Scanning, Parsing, Validating:** * **Input:** High-level query language (SQL). * **Process:** Syntax analysis and verification by the parser. * **Output:** Correct Query. 2. **Query Decomposer:** * **Process:** Translation of SQL to relational algebra using equivalency rules, idem-potency rules, transformation rules, etc. * **Output:** Algebraic expression. 3. **Query Optimizer:** * **Process:** Optimization by substituting equivalent expressions. * **Cost Module** and **Statistical data** are inputs. * **Output:** Execution (action) Plan. 4. **Query Code Generator:** * **Process:** Generating code for the queries. * **Output:** Code to execute Query. 5. **Runtime Database Processor:** * **Process:** Estimation of each access plan, selecting optimal plan and execution. * **Output:** Query Result. ## Syntax Analyzer * Parses the query into tokens and verifies compliance with language grammar. * Rejects queries with errors and returns an error code with explanation. ### Simple Form of Grammar * QUERY: = SELECT CLAUSE + FROM CLAUSE + WHERE CLAUSE * SELECT CLAUSE: = "SELECT +" * FROM\_CLAUSE: = 'FROM'+ * WHERE CLAUSE: = 'WHERE'+VALUE1 OP VALUE2 * VALUE1: =VALUE/COLUMN\_NAME * VALUE2: =VALUE/COLUMN\_NAME * OP: =+,-,/,* ### Example ```sql SELECT COLUMN1, COLUMN2, COLUMN3, COLUMN4 FROM TEST1 WHERE COLUMN2 > 50000 AND COLUMN3 = 'DELHI' AND COLUMN4 BETWEEN 10000 AND 80000 ``` ## Query Decomposition * Transforms high-level queries into relational algebra queries and checks for syntactic and semantic correctness. * Transforms into a query graph of low-level operations. * Decomposed into query blocks (basic units). * Nested queries are identified as separate query blocks. ### Steps 1. **SQL (Relational Calculus Query)** 2. **Query Analysis** 3. **Query Normalization** 4. **Semantic Analysis** 5. **Query Simplifier** 6. **Query Restructuring** 7. **Algebraic Expressions (Relational algebra query)** ## Query Analysis * Lexical and syntactical analysis similar to programming language compilers. * Validates the query, ensuring database objects (relations, attributes) are defined in the database. * Verifies the correctness of relationships between attributes and relations. ### Query Tree * Example query tree: ## Query Normalization * Avoids redundancy. * Converts the query into a normalized form for easier manipulation. * Simplifies projection and selection operations to avoid redundancy. ### Normal Forms * **Conjunctive Normal Form (CNF):** Sequence of conjuncts connected by 'AND'. Each conjunct contains terms connected by 'OR'. * Example: (A OR B) AND (A OR C) * **Disjunctive Normal Form (DNF):** Sequence of disjuncts connected by 'OR'. Each disjunct contains terms connected by 'AND'. * Example: (A AND B) OR (A AND C) ## Semantic Analyzer * Reduces the number of predicates by refuting incorrect or contradictory queries. * Rejects incorrectly formulated or contradictory normalized queries. * A query is incorrectly formulated if components do not contribute to the result. * A query is contradictory if its predicate cannot be satisfied by any tuple. ## Query Simplifier * Detects redundant qualifications, eliminates common sub-expressions, and transforms sub-graphs to more efficiently computed forms. * Integrity constraints, view definitions, and access restrictions are introduced. * Applies idempotence rules of Boolean Algebra. ### Idempotence Rules of Boolean Algebra | Rule | Description | Rule Format | | --- | --- | --- | | 1. | PRED AND PRED | (P \land P = P) | 2. | PRED AND TRUE | (P \land TRUE = P) | 3. | PRED AND FALSE | (P \land FALSE = FALSE) | 4. | PRED AND NOT (PRED) | (P \land \lnot P = FALSE) | 5. | PRED1 AND (PRED1 OR PRED2) | (P1 \land (P1 \lor P2) = P1) | 6. | PRED OR PRED | (P \lor P = P) | 7. | PRED OR TRUE | (P \lor TRUE = TRUE) | 8. | PRED OR FALSE | (P \lor FALSE = P) | 9. | PRED OR NOT (PRED) | (P \lor \lnot P = TRUE) | 10. | PRED1 OR (PRED1 AND PRED2) | (P1 \lor (P1 \land P2) = P1) ## Query Restructuring * Reconstructs the query for more efficient implementation. * Transformation rules convert relational algebra expressions into equivalent, efficient forms. * The query is treated as a relational algebra program, consisting of operations on relations. ## Query Optimization * Chooses an efficient execution strategy to minimize resource use (I/O, CPU time). * Starts during the validation phase by checking user privileges. * Utilizes existing statistics for tables, columns, rows, indexes, and their statistics. ### Inputs to Query Optimizer 1. Relational algebra query tree. 2. Estimation formulas to determine cardinality. 3. Cost model. 4. Statistical data from the database catalog. * **Output:** Optimized relational algebra query (execution plan). ### Basic Issues * How to use available indexes. * How to use memory for information accumulation and intermediate steps (sorting). * How to determine the join order. * Aims for a reasonable approach, not always the absolute optimal. ### Techniques for Implementing Query Optimization 1. Heuristic rules. 2. Systematic estimation of the cost of different execution strategies. ## Structure of Query Evaluation Plan * Defines the algorithm for each operation and how execution should be coordinated. ## References * http://holowczak.com/database-normalization/ * Chapter 1, Database Systems Concepts, S K Singh * Chapter 1, Database System Concepts, Silberschatz, Korth, Sudarshan * Chapter 1, Database Management Systems, by Ramakrishnan and Gehr