Chapter 11: Indexing and Hashing - Summary Notes

Basic Concepts

  • Indexing Mechanisms: Used to enhance data access speed.
    • Example: Author catalog in a library.
  • Search Key: Attribute(s) used to look up records in a file.
  • Index File: Comprises records (index entries) in the format search-key pointer.
  • Index Size: Typically smaller than the original file.
  • Types of Indices:
    • Ordered Indices: Search keys stored in sorted order.
    • Hash Indices: Search keys uniformly distributed across buckets using a hash function.

Index Evaluation Metrics

  • Access Types Supported:
    • Exact match record retrieval.
    • Range of values.
  • Performance Metrics:
    • Access time
    • Insertion time
    • Deletion time
    • Space overhead

Ordered Indices

  • Storage Order: Search keys sorted based on their values.
  • Primary Index: Specifies the sequential order of a file.(also called clustering index). Usually linked with the primary key.
  • Secondary Index: Specifies an order different from the file's sequential order (also called non-clustering index).
  • Index-Sequential File: Combines ordered sequential file with a primary index.

Dense Index Files

  • Dense Index: Contains index records for every search-key value.
    • Example: An index on the ID attribute of an instructor relation.

Sparse Index Files

  • Sparse Index: Contains index records for only some search-key values.
  • Search Process: Locate the index record with the largest search-key value < K, then search sequentially in the file.
  • Benefits:
    • Requires less space.
    • Lower insertion/deletion overhead compared to dense indices.
    • Slower for locating records.

Secondary Indices

  • Functionality: Points to buckets with pointers to actual records (e.g., secondary index on salary of an instructor).
  • Consideration: Updating indices causes overhead on database modifications due to necessary updates on each index upon file changes.

Multilevel Index

  • Need for Multilevel Index: When the primary index does not fit into memory.
  • Structure: Outer sparse index of primary index, inner primary index.
  • Performance: Indices at all levels must be updated on file insertions or deletions.

Index Update: Deletion

  • Dense Indices: Similar to file record deletion.
  • Sparse Indices: Replace entry with the next search-key value if it exists; remove the search-key from the index if it's the last occurrence.

Index Update: Insertion

  • Dense Indices: Insert if not already present.
  • Sparse Indices: Only update if a new block is created and record values need to be added.

B+-Tree Index Files

  • Performance and Maintenance: Optimally reorganizes itself on insertions and deletions, eliminating the need for periodic reorganization.
  • Tree Properties: All paths from root to leaf are of equal length, with specified child and value limits for internal and leaf nodes.
    • Example: Leaf nodes must contain a selective range of values based on defined rules.

Queries on B+-Trees

  • Process: Navigate through nodes based on the search-key until locating the corresponding record or confirmation of its absence.

Handling Duplicates in B+-Trees

  • Duplicates: Allow same search-key value at multiple indices.
  • Search Adjustment: Traverse possible pointers to find all instances of the key rather than terminating at first find.

Hashing: Static and Dynamic Hashing

  • Static Hashing: Uses a fixed number of buckets. Prone to overflow as the database grows. Requires reorganization if needed.
  • Dynamic Hashing: Can adjust the hash function and number of buckets based on the database size, maintaining efficiency. Example: Extendable hashing allows dynamically expanded hash address tables.

Bitmap Indices

  • Best Use Cases: Efficient for querying multiple keys with low distinct attribute values, e.g., gender, income levels.
  • Operation Types: Bitmap indices allow for efficient operations like intersection, union, and complementation for query processing.

SQL Index Definition

  • SQL syntax: CREATE INDEX <index-name> ON <relation-name> (<attribute-list>).
  • Unique Indexes: Enforce unique constraints on attributes.