Chapter 17: Indexing Structures Study Notes
Chapter 17: Indexing Structures
Overview
This chapter discusses various types of indexing structures used in database systems, particularly focusing on single-level and multi-level indexes, as well as dynamic structures using B-Trees and B+-Trees.
Outline
Types of Single-Level Ordered Indexes
Primary Indexes
Clustering Indexes
Secondary Indexes
Multi-Level Indexes
Dynamic Multi-Level Indexes Using B-Trees and B+-Trees
Indexes as Access Paths
A single-level index is an auxiliary file designed to improve the efficiency of searching for a record within a data file.
In databases, the index is typically stored in the same file as the data.
The index can be specified on one field of the file; however, it can also extend to multiple fields.
The index is structured as a file of entries which consist of:
Field value
Pointer to the corresponding record
The structure is generally ordered by the field value, creating efficient access paths.
Characteristics of Indexes
The index serves as an access path for efficient data retrieval.
The size of the index file is usually significantly smaller than the size of the data file due to smaller entry sizes.
A binary search on the index typically yields a pointer to the corresponding file record.
Indexes can be classified as either dense or sparse:
Dense Index: Contains an index entry for every search key value, thus for every record in the data file.
Sparse (Nondense) Index: Contains index entries for only a subset of search values.
Types of Single-Level Ordered Indexes
Primary Index
Definition: A primary index is defined on an ordered data file, where the data file is systematically organized based on a key field.
Each index entry represents a block in the data file:
The entry includes the key field value of the first record in that block, referred to as the block anchor.
An alternative is to anchor on the last record in a block.
A primary index is categorized as a nondense (sparse) index since it includes one entry for each disk block of the data file rather than for every search value.
Example: In a primary index on the first field of a file containing records like names, social security numbers (ssn), birth dates, jobs, salaries, and sexes, the structure would illustrate how block pointers direct to records.
Clustering Index
Definition: A clustering index is defined on an ordered data file where the organization does not depend solely on a key field with distinct values.
Each index entry corresponds to a distinct value of the field:
The entry points to the first data block that contains records sharing that field value.
The clustering index is another form of a nondense index;
Insertion and deletion operations are relatively straightforward due to this structure.
Challenges: Clusters may appear in the middle of blocks, span multiple blocks, and require careful planning to avoid empty space issues.
Secondary Index
Definition: A secondary index serves as an alternative mechanism for accessing a file that already has a primary access method.
Can be defined on:
A candidate key (unique value across records).
A non-key with duplicate values.
Structure:
The secondary index is an ordered file containing two fields:
The first field corresponds to an indexing field, matching the data type of a non-ordering field in the data file.
The second may be either a block pointer or record pointer.
Notable Characteristics:
Inclusion of one entry for each data file record, indicating that it is a dense index.
Properties of Index Types
Type of Index | Number of Index Entries | Dense or Nondense | Block Anchoring on Data File |
|---|---|---|---|
Primary | Nondense | Yes | Yes |
Clustering | Nondense | No | |
Secondary (key) | Dense | Yes | |
Secondary (non-key) | Dense or Nondense | No |
Multi-Level Indexes
A multi-level index allows for a primary index to refer to another index file.
The original index file is recognized as the first-level index and the new index created is termed the second-level index.
This construction can be repeated to form additional levels until all entries at the top level fit within a single disk block.
Multi-level indexes can be generated for any first-level index type as long as it consists of more than one block.
Example of Two-Level Primary Index
Insertion techniques involve the use of an overflow file, which merges periodically with the main data file.
The index is restructured during file reorganization processes.
Challenges: Scalability and flexibility in maintaining the index structure amid ongoing insertions and deletions.
Dynamic Multilevel Indexes Using B-Trees and B+-Trees
B-trees and B+-trees are the preferred structures for multi-level indexes due to their efficiency in managing insertion and deletion operations.
Each node in these trees corresponds to a disk block, maintaining a capacity that is between half-full and completely full:
Insertion into an adequately filled node is straightforward.
If a node is completely full, it splits into two nodes potentially affecting the higher levels of the tree.
Deletion is efficient until a node becomes less than half-full, prompting a merge or redistribution with neighboring nodes.
Differences Between B-Tree and B+-Tree
In a B-tree, pointers to data records exist at all levels of the tree.
In a B+-tree, all pointers to data records are maintained at the leaf-level nodes, allowing for a flatter structure that can result in fewer levels compared to B-trees.
B+
The main advantage includes efficiency in key-sequential reads (iteration) which can traverse across the bottom level of the tree, enhancing performance in certain access patterns.
Examples of Insertion and Deletion in B+-Trees
Insertion Examples
Insert values following a sequence (e.g., 8, 5, 1, 7, 3, 12, 9, 6).
Each operation checks for overflow, leading to splits or new levels introduced in the tree structure.
Deletion Example
Deletions occur in a specified sequence, checking for underflows and managing merges when block capacities fall below the required threshold.
Summary
The chapter reviews various indexing structures within databases, including:
Types of single-level ordered indexes (primary, clustering, secondary)
Multi-level indexes
Dynamic structures with B-trees and B+-trees which address challenges in scalability and efficiency.