1/21
These flashcards cover the core vocabulary and concepts of database storage and retrieval, including log-structured storage, B-trees, indexing strategies, and in-memory systems as discussed in Chapter 4.
Name | Mastery | Learn | Test | Matching | Spaced | Call with Kai |
|---|
No analytics yet
Send a link to your students to track their progress
Log
An append-only sequence of records on disk, which can be binary or human-readable and is used as a fundamental principle in many databases.
Index
An additional structure derived from the primary data that improves the performance of read queries, though it typically slows down writes and consumes disk space.
O(n) Search Cost
The algorithmic cost of a lookup in a simple append-only database without an index, where the search time doubles as the number of records n doubles.
SSTable (Sorted Strings Table)
A file format for storage engines where key-value pairs are sorted by key and each key appears only once within each merged segment file.
Sparse Index
An index that stores only some of the keys (such as the first key of every block of a few kilobytes) to allow queries to jump to the correct block in a sorted file.
Memtable
An in-memory ordered data structure, such as a red–black tree or skip list, where incoming writes are stored before being flushed to disk as an SSTable.
Compaction
A background process in log-structured storage engines that merges segment files and discards overwritten or deleted values to reclaim disk space.
Tombstone
A special deletion record appended to a log-structured data file that tells the merging process to discard any previous values for a deleted key.
LSM-tree (Log-Structured Merge-tree)
A type of storage engine based on the principle of merging and compacting sorted files, originally described by Patrick O'Neil and others in 1996.
Bloom Filter
A memory-efficient probabilistic data structure used to check if a key exists in an SSTable, helping to avoid unnecessary disk I/O for non-existent keys.
Size-tiered Compaction
A compaction strategy where newer and smaller SSTables are successively merged into older and larger SSTables; it generally handles higher write throughput.
Leveled Compaction
A compaction strategy that keeps SSTable sizes fixed and groups them into levels, partitioning the key range to improve read performance and use less disk space.
Embedded Database
A database implemented as a library that runs in the same process as the application code rather than as a separate service, such as SQLite or RocksDB.
B-tree
The most widely used index structure that breaks the database into fixed-size blocks or pages (typically 4, 8, or 16 KiB) and updates data in place.
Branching Factor
The number of references to child pages contained in a single B-tree page, which determines the tree's depth for a given amount of data.
Write-Ahead Log (WAL)
An append-only file on disk to which every B-tree modification must be written before it can be applied to the tree pages to ensure crash resilience.
Torn Page
A corruption scenario where a database crash occurs while only part of a B-tree page has been written to disk.
Write Amplification
A measure of storage engine overhead calculated as the total bytes written to disk divided by the number of bytes requested by the application.
Clustered Index
An index where the actual row or document data is stored directly within the index structure rather than being stored in a separate heap file.
Heap File
A common approach for storing data where rows are kept in no particular order, and index entries store references to specific locations within this file.
Covering Index
A type of index that stores some of a table's columns in addition to the indexed key, allowing some queries to be answered using the index alone.
In-memory Database
A database that keeps its entire dataset in RAM for performance, such as Redis or VoltDB, typically using a disk log or snapshots only for durability.