1/41
Flashcards based on lecture notes covering database hardware, RAID, data stores, indexing, query optimization, transactions, and normalization.
Name | Mastery | Learn | Test | Matching | Spaced |
---|
No study sessions yet.
Data independence
Changes can be made at one level of abstraction without impacting the others.
Mirroring
Duplicate disks used to protect against failure, requiring 2 writes to update.
Striping
Storing data across multiple disks for parallel access to speed up reads.
RAID
Reduces risk of disk failure; RAID-5 is optimal for high reads, low writes, RAID-1 is best for high writes.
Row store
Storing all data entries in one row in a block, ideal for easy updates and insertions.
Column store
Storing one column in each block, ideal for obtaining information about one attribute.
Buffer pool
A RAM cache for recently accessed data.
Log
A structure storing historical data (old/new values) used when a system fails; stores before-values for rollbacks.
Ordered index
Data sorted based on some value.
Clustered index
Sequentially ordered based on the search key (indexed value).
Unclustered index
Randomly ordered with no relation to search key. Requires seeks between each key.
B+ Tree
Leaf nodes ordered by search key (index) and chained, improving efficiency of range queries; access requires few I/Os.
Natural Join (R⋈S on R.A = S.A) - Case: one table (S) fits in RAM
Hash table with all records in S; for each record r in R, lookup r.A in table. Cost: |R| + |S|
Natural Join (R⋈S on R.A = S.A) - Case: neither table fits in RAM
Partition R and S into smaller pieces by hashing on A, matching records must be in same partition. Cost: 3|R| + 3|S|
Sort-merge join
Sort R and S on A, scan for A and merge upon match.
Block nested loops join (R⋈θS)
Compares all records, good for non-equijoins; divide outer table R into blocks, compare all records in S to each block. Cost: B * |S| + |R|
Index nested loops join (R⋈S where S has an index on A)
For all records in the outer table R, use the index on A in S to lookup r.A; ideal when indexed table is bigger than outer table
Left-deep trees
All joins occur on the left table; benefits: write output to disk, pipelining.
Query Optimization Process
Process of looking for opportunities to push selections, considering join orders, exploring left-deep plans, using dynamic programming, and comparing plans by estimating I/O cost.
Materialized views
Improves query performance for those that use view; cons: requires additional memory, must be initialized, needs maintenance.
Incremental view maintenance
Do work that’s proportional to the size of the UPDATE, not the entire table; only update what needs to be updated.
Transaction Properties
Atomicity, Consistency, Isolation, Durability.
Serial schedule
Transactions occur sequentially.
Serializable schedule
A schedule that is equivalent to when transactions occur sequentially, but may be interleaved.
Conflict equivalent schedule
When the order of conflicting operations in two schedules is the same.
Conflict serializable schedule
A conflict equivalent schedule that can be converted into a serial schedule.
Log (Transactions)
A persistent structure that stores all before/after values from transactions; important for database recovery.
Dependency graphs
Draw edge between nodes (transactions) to represent each conflict; all acyclic-directed graphs have a conflict-equivalent serializable schedule.
Locking
Shared lock requested for reads, exclusive lock requested for writes
Deadlock
Occurs when transactions are unable to be completed since they are infinitely waiting for each other’s resources; can be identified by a cycle.
2PL (Two-Phase Locking)
Characterized by gaining locks in a growing phase and releasing locks in a shrinking phase; avoids cascading aborts.
Timestamp-based Concurrency Control
Each action has a representative timestamp; goal: execute conflicting actions in timestamp order
Optimistic Concurrency Control
Assumes transactions will be able to finish execution and be validated at the end.
Recovery (ARIES) Phases
Analysis, Redo, Undo.
Update Anomaly
Causes inconsistencies across the database by updating only one instance of an attribute rather than all times it appears.
Insertion Anomaly
Missing/incorrect primary key or other identifier.
Functional dependency
A relationship between attributes that shows when one determines another.
Superkey
Attribute(s) that functionally determine all other columns.
Candidate key
Attribute(s) that can be used to identify other columns through functional dependencies.
Lossy/Lossless Decomposition
When information is lost after re-joining a decomposition; true if A ∩ B → A or A ∩ B → B.
BCNF (Boyce-Codd Normal Form)
A schema is in BCNF if for any A → B in F+ and for every table R: A → B is trivial or A is a superkey of R.
3NF (Third Normal Form)
A schema is in 3NF if for any A → B in F+ and every table R: A → B is trivial, A is a superkey of R, or each attribute of B - A is a part of some key for R.