Storage and Retrieval: Lecture Notes Flashcards

0.0(0)

Studied by 0 people

Call Kai

Learn

Practice Test

Spaced Repetition

Match

Flashcards

Card Sorting

1/21

Earn XP

Description and Tags

These flashcards cover the core vocabulary and concepts of database storage and retrieval, including log-structured storage, B-trees, indexing strategies, and in-memory systems as discussed in Chapter 4.

Last updated 4:47 AM on 6/23/26

Name	Mastery	Learn	Test	Matching	Spaced	Call with Kai

No analytics yet

Send a link to your students to track their progress

22 Terms

New cards

Log

An append-only sequence of records on disk, which can be binary or human-readable and is used as a fundamental principle in many databases.

New cards

Index

An additional structure derived from the primary data that improves the performance of read queries, though it typically slows down writes and consumes disk space.

New cards

$O(n)$ Search Cost

The algorithmic cost of a lookup in a simple append-only database without an index, where the search time doubles as the number of records $n$ doubles.

New cards

SSTable (Sorted Strings Table)

A file format for storage engines where key-value pairs are sorted by key and each key appears only once within each merged segment file.

New cards

Sparse Index

An index that stores only some of the keys (such as the first key of every block of a few kilobytes) to allow queries to jump to the correct block in a sorted file.

New cards

Memtable

An in-memory ordered data structure, such as a red–black tree or skip list, where incoming writes are stored before being flushed to disk as an SSTable.

New cards

Compaction

A background process in log-structured storage engines that merges segment files and discards overwritten or deleted values to reclaim disk space.

New cards

Tombstone

A special deletion record appended to a log-structured data file that tells the merging process to discard any previous values for a deleted key.

New cards

LSM-tree (Log-Structured Merge-tree)

A type of storage engine based on the principle of merging and compacting sorted files, originally described by Patrick O'Neil and others in 1996.

New cards

Bloom Filter

A memory-efficient probabilistic data structure used to check if a key exists in an SSTable, helping to avoid unnecessary disk I/O for non-existent keys.

New cards

Size-tiered Compaction

A compaction strategy where newer and smaller SSTables are successively merged into older and larger SSTables; it generally handles higher write throughput.

New cards

Leveled Compaction

A compaction strategy that keeps SSTable sizes fixed and groups them into levels, partitioning the key range to improve read performance and use less disk space.

New cards

Embedded Database

A database implemented as a library that runs in the same process as the application code rather than as a separate service, such as SQLite or RocksDB.

New cards

B-tree

The most widely used index structure that breaks the database into fixed-size blocks or pages (typically $4$ , $8$ , or $16 \text{ KiB}$ ) and updates data in place.

New cards

Branching Factor

The number of references to child pages contained in a single B-tree page, which determines the tree's depth for a given amount of data.

New cards

Write-Ahead Log (WAL)

An append-only file on disk to which every B-tree modification must be written before it can be applied to the tree pages to ensure crash resilience.

New cards

Torn Page

A corruption scenario where a database crash occurs while only part of a B-tree page has been written to disk.

New cards

Write Amplification

A measure of storage engine overhead calculated as the total bytes written to disk divided by the number of bytes requested by the application.

New cards

Clustered Index

An index where the actual row or document data is stored directly within the index structure rather than being stored in a separate heap file.

New cards

Heap File

A common approach for storing data where rows are kept in no particular order, and index entries store references to specific locations within this file.

New cards

Covering Index

A type of index that stores some of a table's columns in addition to the indexed key, allowing some queries to be answered using the index alone.

New cards

In-memory Database

A database that keeps its entire dataset in RAM for performance, such as Redis or VoltDB, typically using a disk log or snapshots only for durability.