Chapter 11: Indexing and Hashing - Summary Notes

Basic Concepts

Indexing Mechanisms: Used to enhance data access speed.
- Example: Author catalog in a library.
Search Key: Attribute(s) used to look up records in a file.
Index File: Comprises records (index entries) in the format search-key pointer.
Index Size: Typically smaller than the original file.
Types of Indices:
- Ordered Indices: Search keys stored in sorted order.
- Hash Indices: Search keys uniformly distributed across buckets using a hash function.

Access Types Supported:
- Exact match record retrieval.
- Range of values.
Performance Metrics:
- Access time
- Insertion time
- Deletion time
- Space overhead

Storage Order: Search keys sorted based on their values.
Primary Index: Specifies the sequential order of a file.(also called clustering index). Usually linked with the primary key.
Secondary Index: Specifies an order different from the file's sequential order (also called non-clustering index).
Index-Sequential File: Combines ordered sequential file with a primary index.

Dense Index: Contains index records for every search-key value.
- Example: An index on the ID attribute of an instructor relation.

Sparse Index: Contains index records for only some search-key values.
Search Process: Locate the index record with the largest search-key value < K, then search sequentially in the file.
Benefits:
- Requires less space.
- Lower insertion/deletion overhead compared to dense indices.
- Slower for locating records.

Functionality: Points to buckets with pointers to actual records (e.g., secondary index on salary of an instructor).
Consideration: Updating indices causes overhead on database modifications due to necessary updates on each index upon file changes.

Need for Multilevel Index: When the primary index does not fit into memory.
Structure: Outer sparse index of primary index, inner primary index.
Performance: Indices at all levels must be updated on file insertions or deletions.

Dense Indices: Similar to file record deletion.
Sparse Indices: Replace entry with the next search-key value if it exists; remove the search-key from the index if it's the last occurrence.

Dense Indices: Insert if not already present.
Sparse Indices: Only update if a new block is created and record values need to be added.

Performance and Maintenance: Optimally reorganizes itself on insertions and deletions, eliminating the need for periodic reorganization.
Tree Properties: All paths from root to leaf are of equal length, with specified child and value limits for internal and leaf nodes.
- Example: Leaf nodes must contain a selective range of values based on defined rules.

Process: Navigate through nodes based on the search-key until locating the corresponding record or confirmation of its absence.

Duplicates: Allow same search-key value at multiple indices.
Search Adjustment: Traverse possible pointers to find all instances of the key rather than terminating at first find.

Static Hashing: Uses a fixed number of buckets. Prone to overflow as the database grows. Requires reorganization if needed.
Dynamic Hashing: Can adjust the hash function and number of buckets based on the database size, maintaining efficiency. Example: Extendable hashing allows dynamically expanded hash address tables.

Best Use Cases: Efficient for querying multiple keys with low distinct attribute values, e.g., gender, income levels.
Operation Types: Bitmap indices allow for efficient operations like intersection, union, and complementation for query processing.