In-depth Notes on Indexing and Hashing
Basic Concepts
Indexing Mechanisms: Used to speed up access to desired data (e.g., author catalog in library).
Search Key: Attribute or set of attributes used to look up records in a file.
Index File Structure: Consists of records (index entries) paired with pointers pointing to the actual records.
Size Comparison: Index files are typically much smaller than the original data file.
Types of Indices
Ordered Indices:
Search keys stored in sorted order.
Primary Index: Index that specifies the sequential order of the file. Often the primary key.
Secondary Index: Index with a search key order different from the file's sequential order.
Hash Indices:
Search keys are uniformly distributed across "buckets" using a hash function.
Efficient for lookup by hash value but requires handling of collisions (same hash value).
Index Evaluation Metrics
Access Types: Support for:
Exact queries (specific attribute values).
Range queries (e.g., values falling within a certain range).
Performance Metrics:
Access time
Insertion time
Deletion time
Space overhead
Dense vs Sparse Index Files
Dense Index Files:
Contains an index record for every search-key value.
Example: ID attribute index of instructor relation.
Sparse Index Files:
Contains index records for only some search-key values.
To find a record, locate the largest search-key value < K and search sequentially in the file starting from that record.
Insertion and Deletion in Sparse Index
Insertion: New search-key value added to the index. If a block needs to be created, the first search-key value from the new block is inserted into the index.
Deletion:
Dense Index: Record deletion similar to file record deletion.
Sparse Index: Delete an entry by replacing it with the next search-key value in search-key order.
B+-Tree Index Files
B+-Tree Structure: A multi-level index structure where every node is a disk block. Maintains order to optimize search queries.
Insertion and Deletion:
Involves maintaining balance. Can split nodes if too full upon insertions.
Searching: Efficient; searches generally take logarithmic time.
Bitmap Indices
Definition: A special type of index that uses a bitmap for efficient querying on multiple keys.
Operations: Allow for logical operations such as AND, OR, and NOT to efficiently answer queries.
SQL Index Definition
Creating an Index:
create index <index-name> on <relation-name> (<attribute-list>)
Dropping an Index:
drop index <index-name>