In-depth Notes on Indexing and Hashing

Basic Concepts

  • Indexing Mechanisms: Used to speed up access to desired data (e.g., author catalog in library).

  • Search Key: Attribute or set of attributes used to look up records in a file.

  • Index File Structure: Consists of records (index entries) paired with pointers pointing to the actual records.

  • Size Comparison: Index files are typically much smaller than the original data file.

Types of Indices

  • Ordered Indices:

    • Search keys stored in sorted order.

    • Primary Index: Index that specifies the sequential order of the file. Often the primary key.

    • Secondary Index: Index with a search key order different from the file's sequential order.

  • Hash Indices:

    • Search keys are uniformly distributed across "buckets" using a hash function.

    • Efficient for lookup by hash value but requires handling of collisions (same hash value).

Index Evaluation Metrics

  • Access Types: Support for:

    • Exact queries (specific attribute values).

    • Range queries (e.g., values falling within a certain range).

  • Performance Metrics:

    • Access time

    • Insertion time

    • Deletion time

    • Space overhead

Dense vs Sparse Index Files

  • Dense Index Files:

    • Contains an index record for every search-key value.

    • Example: ID attribute index of instructor relation.

  • Sparse Index Files:

    • Contains index records for only some search-key values.

    • To find a record, locate the largest search-key value < K and search sequentially in the file starting from that record.

Insertion and Deletion in Sparse Index

  • Insertion: New search-key value added to the index. If a block needs to be created, the first search-key value from the new block is inserted into the index.

  • Deletion:

    • Dense Index: Record deletion similar to file record deletion.

    • Sparse Index: Delete an entry by replacing it with the next search-key value in search-key order.

B+-Tree Index Files

  • B+-Tree Structure: A multi-level index structure where every node is a disk block. Maintains order to optimize search queries.

  • Insertion and Deletion:

    • Involves maintaining balance. Can split nodes if too full upon insertions.

  • Searching: Efficient; searches generally take logarithmic time.

Bitmap Indices

  • Definition: A special type of index that uses a bitmap for efficient querying on multiple keys.

  • Operations: Allow for logical operations such as AND, OR, and NOT to efficiently answer queries.

SQL Index Definition
  • Creating an Index:

    • create index <index-name> on <relation-name> (<attribute-list>)

  • Dropping an Index:

    • drop index <index-name>