1/4
Looks like no tags are added yet.
Name | Mastery | Learn | Test | Matching | Spaced |
---|
No study sessions yet.
single level index
A single-level index is a file containing column values, along with pointers to rows containing the column value. The pointer identifies the block containing the row. In some indexes, the pointer also identifies the exact location of the row within the block. Indexes are created by database designers with the CREATE INDEX command, described elsewhere in this material.
Single-level indexes are normally sorted on the column value. A sorted index is not the same as an index on a sorted table. Ex: An index on a heap table is a sorted index on an unsorted table.
If an indexed column is unique, the index has one entry for each column value. If an indexed column is not unique, the index may have multiple entries for some column values, or one entry for each column value, followed by multiple pointers.
An index is usually defined on a single column, but an index can be defined on multiple columns. In a multi-column index, each index entry is a composite of values from all indexed columns. In all other respects, multi-column indexes behave exactly like indexes on a single column.
query processing
To execute a SELECT query, the database can perform a table scan or an index scan:
A table scan is a database operation that reads table blocks directly, without accessing an index.
An index scan is a database operation that reads index blocks sequentially, in order to locate the needed table blocks.
Hit ratio, also calledfilter factororselectivity, is the percentage of table rows selected by a query. When a SELECT query is executed, the database examines the WHERE clause and estimates hit ratio. If hit ratio is high, the database performs a table scan. If hit ratio is low, the query needs only a few table blocks, so a table scan would be inefficient. Instead, the database:
Looks for an indexed column in the WHERE clause.
Scans the index.
Finds values that match the WHERE clause.
Reads the corresponding table blocks.
If the WHERE clause does not contain an indexed column, the database must perform a table scan.
Since a column value and pointer occupy less space than an entire row, an index requires fewer blocks than a table. Consequently, index scans are much faster than table scans. In some cases, indexes are small enough to reside in main memory, and index scan time is insignificant. When hit ratio is low, additional time to read the table blocks containing selected rows is insignificant.
binary search
When hit ratio is low, index scans are always faster than table scans. Consider the following scenario:
A table has 10 million rows.
Each row is 100 bytes.
Each block is 4 kilobytes.
Each index entry is 10 bytes, including a 6-byte value and a 4-byte pointer.
Magnetic disk transfer rate is 0.1 gigabytes per second.
If hit ratio is low, an index scan is roughly 10 times faster than a table scan:
Table scan. The table contains 1 billion bytes (10 million rows Ă— 100 bytes/row). The table scan takes around 10 seconds (1 gigabyte / 0.1 gigabytes/sec).
Index scan. The index contains 100 million bytes (10 million rows Ă— 1 entry/row Ă— 10 bytes/entry). The index scan takes around 1 second (0.1 gigabyte / 0.1 gigabytes/sec). The index scan returns pointers to blocks containing rows selected by the query. When hit ratio is low, additional time to read table blocks is insignificant.
Although index scans are faster than table scans, index scans are too slow in many cases. If a single-level index is sorted, each value can be located with a binary search. In a binary search, the database repeatedly splits the index in two until it finds the entry containing the search value:
The database first compares the search value to an entry in the middle of the index.
If the search value is less than the entry value, the search value is in the first half of the index. If not, the search value is in the second half.
The database now compares the search value to the entry in the middle of the selected half, to narrow the search to one quarter of the index.
The database continues in this manner until it finds the index block containing the search value.
For an index with N blocks, a binary search reads log2N blocks, rounded up to the nearest integer. In the example above, the index has 25,000 blocks (10,000,000 rows Ă— 10 bytes/index entry / 4,000 bytes/ block) . The binary search reads at most log225,000, rounded up, or 15 blocks. This search takes about 0.0006 seconds (15 blocks Ă— 4 kilobytes/block / 0.1 gigabytes/sec).
primary and secondary indexes
Indexes on a sorted table may be primary or secondary:
A primary index, also called a clustering index, is an index on a sort column.
A secondary index, also called a nonclustering index, is an index that is not on the sort column.
A sorted table can have only one sort column, and therefore only one primary index. Usually, the primary index is on the primary key column(s). In some database systems, the primary index may be created on any column. Tables can have many secondary indexes. All indexes of a heap or hash table are secondary, since heap and hash tables have no sort column.
Indexes may also be dense or sparse:
A dense index contains an entry for every table row.
A sparse index contains an entry for every table block.
When a table is sorted on an index column, the index may be sparse, as illustrated in the animation below. Primary indexes are on sort columns and usually sparse. Secondary indexes are on non-sort columns and therefore are always dense.
Sparse indexes are much faster than dense indexes since sparse indexes have fewer entries and occupy fewer blocks. Consider the following scenario:
A table has 10 million rows.
Each row is 100 bytes.
Table and index blocks are 4 kilobytes.
Each index entry is 10 bytes.
The table occupies 250,000 blocks (10 million rows Ă— 100 bytes/row / 4 kilobytes/block). A sparse index requires 250,000 entries (one entry per table block) and occupies 625 blocks (250,000 entries Ă— 10 bytes/entry / 4 kilobytes/block). The sparse index can easily be retained in main memory.
Primary indexes are usually sparse and sparse indexes are fast. As a result, database designers usually create a primary index on the primary key of large tables.
inserts, updates, and deletes
Inserts, updates, and deletes to tables have an impact on single-level indexes. Consider the behavior of dense indexes:
Insert. When a row is inserted into a table, a new index entry is created. Since single-level indexes are sorted, the new entry must be placed in the correct index block. If this block is full, the block splits in two to create space for the new index entry, as in the animation below.
Delete. When a row is deleted, the row's index entry must be deleted. The deleted entry can be either physically removed or marked as 'deleted'. Since single-level indexes are sorted, physically removing an entry requires moving all subsequent entries, which is slow. For this reason, index entries are marked as 'deleted'. Periodically, the database may reorganize the index to remove deleted entries and compress the index.
Update. An update to a column that is not indexed does not affect the index. An update to an indexed column is like a delete followed by an insert. The index entry for the initial value is deleted and an index entry for the updated value is inserted.
With a sparse index, each entry corresponds to a table block rather than a table row. Index entries are inserted or deleted when blocks split or merge. Since blocks contain many rows, block splits and mergers occur less often than row inserts and deletes. Aside from frequency, however, the behavior of sparse and dense indexes is similar.