Compiler Optimization Techniques: Dominators, Dead Code Elimination, Dataflow Analysis, SSA
Dominators
Dominator Concept
- Definition: A node D is said to dominate node N if every path from the entry node to N must go through D.
- Notation: D dominates N can be noted as: D \prec N.
- Immediate Dominator (IDOM):
- Definition: IDOM(N) is the closest dominator of N that is not N itself.
- Example:
- If the dominators are given as:
- A B C D E F G H
- Then the immediate dominators would be: Ø A B B D D D B C.
Dominator Tree
- A representation of the relationship between nodes in terms of their dominators.
- Structure: Each node points to its immediate dominator, forming a tree structure.
Dominance Frontier (DF)
- Definition: The dominance frontier of a node D is the set of nodes that are not dominated by D, but are first reachable from D along some control flow path.
- Significance: Indicates where the guarantee of dominance stops.
Post-Dominance
- Node J post-dominates node I if every path that exits I must go through J.
- Post-Dominance Frontier (PDF): Defines where post-dominance conditions apply.
- Control Dependence: Node J is control dependent on node I if I EPDF(J).
Dead Code Elimination
Definition: The process of removing code that does not affect the program's outcome.
Types of Dead Code
- Useless (dead): Unused operations within the code.
- Unreachable: Parts of the code with no valid control flow path leading to them.
Identifying Useful Operations
- Mark-Sweep Algorithm:
- Used for identifying useful operations by considering branches in the flow of control.
- Mark Phase:
- For each operation op, if op is critical, it is marked as useful.
- Add op to a worklist.
- While the worklist is not empty:
- Remove op from worklist.
- If Def(y) is not marked, mark Def(y) as useful and add it to the worklist.
- If Def(z) is not marked, mark Def(z) as useful and add it to the worklist.
- For each block b in PDF(op), mark the ending branch in b and add to the worklist.
- Sweep Phase:
- For each operation op, if op is not marked and is a branch:
- Replace op with a jump to its nearest useful post-dominator.
- If not a branch, delete op.
Critical Operations:
- Usually include store operations, jumps, branches.
Mark-Sweep Algorithm Simplified:
- For simplifying operations within the Mark-Sweep process with example operations demonstrated.
Dataflow Analysis
Purpose: To gather information about the flow of data within programs.
Key Concepts:
- Availability: To check if an expression has been computed already.
- Liveliness: To determine if a variable will be needed in the future.
- Reach Definition: Find out which definitions will reach the point of usage without being overwritten.
Data Flow Properties:
- IN, OUT sets, GEN/Kill are essential structures in Dataflow Analysis.
- Changes in direction based on Forward vs Backward analysis.
Forward Analysis
- Entry-Exit paths are analyzed.
- "May" analysis utilized for uninitialized data.
- Union (U) used for combining sets of variables.
Backward Analysis
- Exit-Entry paths evaluated.
- "Must" analysis is conducted with intersection (∩).
Reach Definition Practice:
- Define maintainable states to evaluate definitions and uses effectively.
- Collecting definitions that occur within a specific point in the code (using examples for clarity).
SSA (Static Single Assignment) and Phi Functions
Simple SSA:
- A transformation in which each variable is assigned exactly once, with the use of phi functions at control flow join points.
Dominance-Based SSA:
- For a variable defined in block B, place the definition in blocks that are the dominance frontiers of B.
- Example shown illustrating the reach definition analysis of a variable across control flow branches.
Example of SSA Transformation:
- Walking through detailed step-by-step assignment and transformations of variables across blocks.