Chapter 17: Indexing Structures Study Notes

This chapter discusses various types of indexing structures used in database systems, particularly focusing on single-level and multi-level indexes, as well as dynamic structures using B-Trees and B+-Trees.

Types of Single-Level Ordered Indexes
- Primary Indexes
- Clustering Indexes
- Secondary Indexes
Multi-Level Indexes
Dynamic Multi-Level Indexes Using B-Trees and B+-Trees

A single-level index is an auxiliary file designed to improve the efficiency of searching for a record within a data file.
In databases, the index is typically stored in the same file as the data.
The index can be specified on one field of the file; however, it can also extend to multiple fields.
The index is structured as a file of entries which consist of:
- Field value
- Pointer to the corresponding record
The structure is generally ordered by the field value, creating efficient access paths.

The index serves as an access path for efficient data retrieval.
The size of the index file is usually significantly smaller than the size of the data file due to smaller entry sizes.
A binary search on the index typically yields a pointer to the corresponding file record.
Indexes can be classified as either dense or sparse:
- Dense Index: Contains an index entry for every search key value, thus for every record in the data file.
- Sparse (Nondense) Index: Contains index entries for only a subset of search values.

Definition: A primary index is defined on an ordered data file, where the data file is systematically organized based on a key field.
Each index entry represents a block in the data file:
- The entry includes the key field value of the first record in that block, referred to as the block anchor.
- An alternative is to anchor on the last record in a block.
A primary index is categorized as a nondense (sparse) index since it includes one entry for each disk block of the data file rather than for every search value.
Example: In a primary index on the first field of a file containing records like names, social security numbers (ssn), birth dates, jobs, salaries, and sexes, the structure would illustrate how block pointers direct to records.

Definition: A clustering index is defined on an ordered data file where the organization does not depend solely on a key field with distinct values.
Each index entry corresponds to a distinct value of the field:
- The entry points to the first data block that contains records sharing that field value.
The clustering index is another form of a nondense index;
- Insertion and deletion operations are relatively straightforward due to this structure.
Challenges: Clusters may appear in the middle of blocks, span multiple blocks, and require careful planning to avoid empty space issues.

Definition: A secondary index serves as an alternative mechanism for accessing a file that already has a primary access method.
Can be defined on:
- A candidate key (unique value across records).
- A non-key with duplicate values.
Structure:
- The secondary index is an ordered file containing two fields:
- The first field corresponds to an indexing field, matching the data type of a non-ordering field in the data file.
- The second may be either a block pointer or record pointer.
Notable Characteristics:
- Inclusion of one entry for each data file record, indicating that it is a dense index.

Type of Index	Number of Index Entries	Dense or Nondense	Block Anchoring on Data File
Primary	Nondense	Yes	Yes
Clustering	Nondense	No
Secondary (key)	Dense	Yes
Secondary (non-key)	Dense or Nondense	No

A multi-level index allows for a primary index to refer to another index file.
The original index file is recognized as the first-level index and the new index created is termed the second-level index.
This construction can be repeated to form additional levels until all entries at the top level fit within a single disk block.
Multi-level indexes can be generated for any first-level index type as long as it consists of more than one block.

Insertion techniques involve the use of an overflow file, which merges periodically with the main data file.
The index is restructured during file reorganization processes.
Challenges: Scalability and flexibility in maintaining the index structure amid ongoing insertions and deletions.

B-trees and B+-trees are the preferred structures for multi-level indexes due to their efficiency in managing insertion and deletion operations.
Each node in these trees corresponds to a disk block, maintaining a capacity that is between half-full and completely full:
- Insertion into an adequately filled node is straightforward.
- If a node is completely full, it splits into two nodes potentially affecting the higher levels of the tree.
- Deletion is efficient until a node becomes less than half-full, prompting a merge or redistribution with neighboring nodes.

In a B-tree, pointers to data records exist at all levels of the tree.
In a B+-tree, all pointers to data records are maintained at the leaf-level nodes, allowing for a flatter structure that can result in fewer levels compared to B-trees.

The main advantage includes efficiency in key-sequential reads (iteration) which can traverse across the bottom level of the tree, enhancing performance in certain access patterns.

Insert values following a sequence (e.g., 8, 5, 1, 7, 3, 12, 9, 6).
Each operation checks for overflow, leading to splits or new levels introduced in the tree structure.

Deletions occur in a specified sequence, checking for underflows and managing merges when block capacities fall below the required threshold.

The chapter reviews various indexing structures within databases, including:
- Types of single-level ordered indexes (primary, clustering, secondary)
- Multi-level indexes
- Dynamic structures with B-trees and B+-trees which address challenges in scalability and efficiency.