1/5
Looks like no tags are added yet.
Name | Mastery | Learn | Test | Matching | Spaced |
---|
No study sessions yet.
multi-level index
A multi-level index stores column values and row pointers in a hierarchy. The bottom level of the hierarchy is a sorted single-level index. The bottom level is sparse for primary indexes, or dense for secondary indexes.
Each level above the bottom is a sparse sorted index to the level below. Since all levels above the bottom are sparse, levels rapidly become smaller. The top level always fits in one block.
To locate a row containing an indexed value, the database first reads the top-level block. The database compares the indexed value to entries in the block and locates the next level block containing the value. Continuing in this manner, the database eventually locates the bottom-level block containing the value. The bottom-level block contains a pointer to the correct table block.
number of levels (sparse/dense indexes)
A dense index has more bottom-level entries than a sparse index, and may have more levels. Assuming a table with 10 million rows and 400 index entries per block, a dense index has three levels:
Level 3 is dense and has 25,000 blocks = 10 million rows / 400 index entries per block
Level 2 is sparse and has 63 blocks = 25,000 level 3 blocks / 400 index entries per block
Level 1 is sparse and has one block containing 63 index entries.
The number of index entries per block is called thefan-outof a multi-level index. The number of levels in a multi-level index can be computed from fan-out, number of rows, and rows per block:
For a dense index, number of levels = logfan-out (number of rows)
For a sparse index, number of levels = logfan-out (number of rows / rows per block)
In both cases, log is a fractional number and must be rounded up to the nearest integer. Both formulas assume minimal free space in the index.
Dense indexes usually have four levels or less. Sparse indexes usually have three levels or less.
query processing (indexes)
Multi-level indexes are faster than single-level indexes on most queries. Consider the following scenario:
A table has 10 million rows.
Each row is 100 bytes.
Each index entry is 10 bytes.
Table and index blocks are 4 kilobytes.
The table contains 250,000 blocks = 10 million rows Ă— 100 bytes per row / 4,000 bytes per block.
A dense, single-level index contains 25,000 blocks = 10 million entries Ă— 10 bytes per entry / 4,000 bytes per block.
A dense, multi-level index contains 3 levels = log 400 entries per index block 10,000,000 rows, rounded up.
A query searches for rows containing a specific value of an indexed column. Assuming query hit ratio is low:
A table scan reads at most 250,000 table blocks.
A single-level index scan reads at most 25,000 index blocks plus a few table blocks.
A binary search of a sorted, single-level index reads at most 15 index blocks (= log2 25,000, rounded up) plus a few table blocks.
A multi-level index search reads 3 index blocks plus a few table blocks.
The multi-level index search reads one index block per level. Usually the top two levels are small and retained in memory. Since the index has three levels, the query reads just one index block from storage media.
Because multi-level indexes are faster than single-level indexes on most queries, databases commonly use multi-level rather than single-level indexes.
balanced indexes
Each path from the top-level block to a bottom-level block is called a branch. Multi-level indexes are called balanced when all branches are the same length and imbalanced when branches are different lengths.
Imbalanced indexes are undesirable, since processing time is unpredictable. If a query follows a long branch, the query is relatively slow. For this reason, inserts are managed to maintain balanced indexes:
In a dense index, inserts always generate new bottom-level index entries. In a sparse index, inserts generate new bottom-level index entries when table blocks split.
If the new index entry goes in a full index block, the block splits. Half of the rows move to the new block, creating space for the entry.
The new block in the bottom level generates a new index entry the next level up. If the block in the next level up is full, the block splits and the process repeats.
If blocks are full at all index levels, the split propagates to the top level. In this case, the top-level block splits and a new level is created.
New levels are always added at the top of the hierarchy rather than the bottom of one branch. As a result, all branches are always the same length, and the index is always balanced.
Deletes may cause block mergers. Block mergers are the reverse of block splits and potentially eliminate the top level of the index. Consequently, deletes also maintain a balanced index.
Updates to an indexed column behave like a delete of the initial value followed by an insert of a new value. Since updates are implemented as deletes and inserts, updates also leave the index balanced.
B-tree and B+tree indexes
The balanced multi-level index described above is called a B+tree. B+tree structure is derived from an earlier approach called a B-tree. The two differ as follows:
B+tree. All indexed values appear in the bottom level. Pointers to table blocks appear only in the bottom level. Since some indexed values also appear in higher levels, values are occasionally repeated in the index.
B-tree. If an indexed value appears in a higher level, the value is not repeated at lower levels. Instead, a pointer to the corresponding table block appears in the higher level along with the value.
B-trees are more compact than B+trees since index values are not repeated. However, B+trees are simpler, since all pointers to table blocks appear in the same (bottom) level. The B+tree structure has two benefits:
The bottom level of a B+tree is a single-level index and can be scanned or searched.
In a B-tree, inserts, updates, and deletes may cause a table pointer to change levels, which is hard to implement. B+trees do not have this problem, since table pointers are always in the bottom level.
Because of the advantages above, multi-level indexes are usually implemented as B+trees.