Chapter 11: Indexing and Hashing - Summary Notes
Basic Concepts
- Indexing Mechanisms: Used to enhance data access speed.
- Example: Author catalog in a library.
- Search Key: Attribute(s) used to look up records in a file.
- Index File: Comprises records (index entries) in the format
search-key pointer. - Index Size: Typically smaller than the original file.
- Types of Indices:
- Ordered Indices: Search keys stored in sorted order.
- Hash Indices: Search keys uniformly distributed across buckets using a hash function.
Index Evaluation Metrics
- Access Types Supported:
- Exact match record retrieval.
- Range of values.
- Performance Metrics:
- Access time
- Insertion time
- Deletion time
- Space overhead
Ordered Indices
- Storage Order: Search keys sorted based on their values.
- Primary Index: Specifies the sequential order of a file.(also called clustering index). Usually linked with the primary key.
- Secondary Index: Specifies an order different from the file's sequential order (also called non-clustering index).
- Index-Sequential File: Combines ordered sequential file with a primary index.
Dense Index Files
- Dense Index: Contains index records for every search-key value.
- Example: An index on the ID attribute of an instructor relation.
Sparse Index Files
- Sparse Index: Contains index records for only some search-key values.
- Search Process: Locate the index record with the largest search-key value < K, then search sequentially in the file.
- Benefits:
- Requires less space.
- Lower insertion/deletion overhead compared to dense indices.
- Slower for locating records.
Secondary Indices
- Functionality: Points to buckets with pointers to actual records (e.g., secondary index on salary of an instructor).
- Consideration: Updating indices causes overhead on database modifications due to necessary updates on each index upon file changes.
Multilevel Index
- Need for Multilevel Index: When the primary index does not fit into memory.
- Structure: Outer sparse index of primary index, inner primary index.
- Performance: Indices at all levels must be updated on file insertions or deletions.
Index Update: Deletion
- Dense Indices: Similar to file record deletion.
- Sparse Indices: Replace entry with the next search-key value if it exists; remove the search-key from the index if it's the last occurrence.
Index Update: Insertion
- Dense Indices: Insert if not already present.
- Sparse Indices: Only update if a new block is created and record values need to be added.
B+-Tree Index Files
- Performance and Maintenance: Optimally reorganizes itself on insertions and deletions, eliminating the need for periodic reorganization.
- Tree Properties: All paths from root to leaf are of equal length, with specified child and value limits for internal and leaf nodes.
- Example: Leaf nodes must contain a selective range of values based on defined rules.
Queries on B+-Trees
- Process: Navigate through nodes based on the search-key until locating the corresponding record or confirmation of its absence.
Handling Duplicates in B+-Trees
- Duplicates: Allow same search-key value at multiple indices.
- Search Adjustment: Traverse possible pointers to find all instances of the key rather than terminating at first find.
Hashing: Static and Dynamic Hashing
- Static Hashing: Uses a fixed number of buckets. Prone to overflow as the database grows. Requires reorganization if needed.
- Dynamic Hashing: Can adjust the hash function and number of buckets based on the database size, maintaining efficiency. Example: Extendable hashing allows dynamically expanded hash address tables.
Bitmap Indices
- Best Use Cases: Efficient for querying multiple keys with low distinct attribute values, e.g., gender, income levels.
- Operation Types: Bitmap indices allow for efficient operations like intersection, union, and complementation for query processing.
SQL Index Definition
- SQL syntax:
CREATE INDEX <index-name> ON <relation-name> (<attribute-list>). - Unique Indexes: Enforce unique constraints on attributes.