Flashcards for chapter 5 of WGU's Data Management Foundations course.
What are four dimensions on which computer media can vary?
Speed
Cost
Capacity
Volatility
access time
the time required to access the first byte in a read or write operation
transfer rate
the speed at which data is read or written, following initial access
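A rough sketch of how access time and transfer rate combine into a total read time. The function name and the device figures (10 ms access time, 100 MB/s transfer rate) are made up for illustration and do not come from the course.

```python
def read_time_ms(size_bytes, access_time_ms, transfer_rate_mb_per_s):
    """Estimate total read time: initial access plus time to transfer the data."""
    transfer_seconds = size_bytes / (transfer_rate_mb_per_s * 1_000_000)
    return access_time_ms + transfer_seconds * 1000

# Hypothetical device: 10 ms access time, 100 MB/s transfer rate.
small_read = read_time_ms(4_000, 10.0, 100.0)          # dominated by access time
large_read = read_time_ms(100_000_000, 10.0, 100.0)    # dominated by transfer time
print(small_read, large_read)
```

For small reads the access time dominates; for large reads the transfer rate does.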
volatile memory
memory whose contents are lost when disconnected from power
non-volatile memory
memory whose contents are retained without power
main memory/RAM
the primary memory used when computer programs execute, which is fast, expensive, and has limited capacity
flash memory/SSD
a type of non-volatile storage that reads and writes data in pages of 2 to 16 KB. Writes are slower than reads.
magnetic disk/HDD
a type of storage that groups data in sectors along a track; sectors were traditionally 512 bytes, but 4 KB is more common in newer formats
block
a uniformly sized unit, usually between 2 and 64 KB, used by databases and file systems when transferring data between main memory and storage media
storage controllers
a portion of a storage system internal to the storage device that converts data between blocks and sectors or pages
Smaller block sizes are better for __, which access a few rows per query.
transactional applications
Larger block sizes are better for __, which access many rows per query.
analytical applications
row-oriented storage
a form of storage that stores an entire row in one block to minimize block transfers during transactional applications
Row-oriented storage performs best when row size is small relative to __.
block size
Why does row-oriented storage perform best with rows that are small in relation to the block size?
Improved query performance due to more rows per block.
Less wasted storage due to rows fitting more evenly into available space.
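The two reasons above can be checked with simple arithmetic. This is a minimal sketch that assumes fixed-size rows and that a row never spans two blocks; the function name and the sizes are illustrative only.

```python
def rows_per_block(block_size, row_size):
    """Return (rows that fit in one block, bytes left unused in that block)."""
    rows = block_size // row_size
    wasted = block_size - rows * row_size
    return rows, wasted

# Hypothetical 8 KB block with small rows versus large rows.
print(rows_per_block(8192, 100))    # many rows per block, little waste
print(rows_per_block(8192, 5000))   # one row per block, much of the block unused
```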
For tables with large columns, each row usually contains a link to the large column, which is stored __.
in files managed by the operating system or in a special storage area managed by the database.
How does column-oriented storage benefit analytic applications?
Faster access to an entire column.
Better data compression when all values have the same data type.
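A toy illustration of the two layouts, assuming a three-column table held in Python lists. Real storage engines work with blocks on disk; the table contents and the helper name are invented for the example.

```python
# Row-oriented layout: each tuple is one complete row.
rows = [(1, "Ann", 50000), (2, "Bob", 60000), (3, "Cho", 55000)]

def to_columns(row_list):
    """Regroup row tuples so that each tuple holds all values of one column."""
    return list(zip(*row_list))

columns = to_columns(rows)

# An analytic query such as a salary total only needs the third column.
total_salary = sum(columns[2])
print(total_salary)
```

In the column layout the salary values sit together and share one data type, which is what makes whole-column reads and compression cheaper.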
PostgreSQL and Vertica
examples of relational databases that support column-oriented storage
heap table
a table structure where no order is imposed on rows and instead the database maintains a list of blocks assigned to the table, along with the address of the first available space for inserts
What type of table structure is best for optimizing insert operations and bulk loading rows?
heap table
Heap tables are not optimal for __.
queries that read rows in a specific order
sorted table
a table structure where the database designer identifies a sort column, usually the primary key, that determines physical row order on storage media
Sorted tables are optimal for queries that __.
read data in order of the sort column
How does a sorted table structure handle an attempt to insert a row into a full block?
The block splits in two and half the rows are moved to a new block.
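A simplified sketch of the block split described above. It assumes blocks are Python lists of sort-column values, that the table already has at least one block, and that `capacity` is the maximum number of rows per block; all names are hypothetical.

```python
import bisect

def insert_sorted(blocks, key, capacity):
    """Insert key into a list of sorted blocks, splitting a full block in two."""
    # Pick the first block whose last key is >= key, else the last block.
    idx = len(blocks) - 1
    for i, block in enumerate(blocks):
        if block and key <= block[-1]:
            idx = i
            break
    block = blocks[idx]
    if len(block) >= capacity:
        # Full block: move the upper half of the rows to a new block.
        mid = len(block) // 2
        left, right = block[:mid], block[mid:]
        blocks[idx:idx + 1] = [left, right]
        block = left if key < right[0] else right
    bisect.insort(block, key)   # keep rows ordered within the block
    return blocks

print(insert_sorted([[10, 20, 30, 40]], 25, capacity=4))
```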
hash table
a table structure where rows are assigned to buckets
How is the bucket containing each row in a hash table determined?
By a hash function and a hash key.
hash key
a column or group of columns used to determine the bucket that each row of a table is stored in
hash function
computes the bucket containing the row from the hash key and is designed to scramble row locations and evenly distribute rows across blocks
What are the four steps of the modulo function?
Convert the hash key by interpreting the key’s bits as an integer value.
Divide the integer by the number of buckets.
Interpret the division remainder as the bucket number.
Convert the bucket number to the physical address of the block containing the row.
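The four steps above can be sketched as follows. The sketch assumes string hash keys encoded as UTF-8 and one fixed-size block per bucket laid out contiguously from a base address; both function names and the address scheme are assumptions for illustration.

```python
def bucket_for(hash_key, num_buckets):
    """Steps 1-3: read the key's bits as an integer, divide, keep the remainder."""
    as_int = int.from_bytes(hash_key.encode("utf-8"), byteorder="big")
    return as_int % num_buckets

def block_address(bucket, base_address, block_size):
    """Step 4: map the bucket number to a physical block address (assumed layout)."""
    return base_address + bucket * block_size

bucket = bucket_for("AB", 7)
print(bucket, block_address(bucket, 0, 8192))
```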
What is the disadvantage of a fixed hash function in a database that is planned to grow over time?
A fixed hash function simply allocates more rows to each bucket as tables grow, resulting in deep buckets with many blocks that a query may have to read in order to access a single row.
dynamic hash function
automatically allocates more blocks to a growing table, creates additional buckets, and distributes rows across all buckets to avoid deep buckets
Hash tables are optimal for __.
inserts and deletes of individual rows
Hash tables are slow on queries that select __.
many rows with a range of values
table clusters/multi-tables
tables that interleave rows of two or more tables in the same storage area using a cluster key
cluster key
a column that is available in all interleaved tables and determines the order in which rows are interleaved
Why are table clusters not commonly used?
They are only optimal for joining interleaved tables on the cluster key and perform poorly on many other queries.
single-level index
a file containing column values, along with pointers that identify either the block containing the row or the exact location of the row within the block
multi-column index
an index where each entry is a composite of values from all indexed columns
table scan
a database operation that reads table blocks directly, without accessing an index
index scan
a database operation that reads index blocks sequentially, in order to locate the needed table blocks
What are the two database operations that may be used to execute a SELECT query?
table scan or index scan
hit ratio/filter factor/selectivity
the percentage of table rows selected by a query
If hit ratio is __, a database will perform a table scan.
high
If hit ratio is __, a database will most likely use an index scan.
low
To estimate hit ratio, a database examines the __ of a SELECT query.
WHERE clause
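A minimal sketch of the scan choice described in the cards above. The 10% threshold is an arbitrary placeholder; real query optimizers use their own cost models, and the function name is hypothetical.

```python
def choose_scan(rows_selected, total_rows, threshold=0.10):
    """Pick a scan type from the estimated hit ratio (threshold is an assumption)."""
    hit_ratio = rows_selected / total_rows
    return "table scan" if hit_ratio > threshold else "index scan"

print(choose_scan(5, 1000))     # low hit ratio
print(choose_scan(900, 1000))   # high hit ratio
```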
What is the formula to calculate the number of blocks read in a binary search?
log2(N) blocks, where N is the number of index blocks
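A small simulation of a binary search over sorted blocks that counts block reads. It assumes each block is a non-empty sorted list; the names and data are invented. Note that the worst case can take one read more than log2 of the block count, so the formula is best read as an estimate.

```python
import math

def search_sorted_blocks(blocks, target):
    """Binary search over sorted blocks; returns (found, number of blocks read)."""
    lo, hi, reads = 0, len(blocks) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]
        reads += 1                      # one block read per probe
        if target < block[0]:
            hi = mid - 1
        elif target > block[-1]:
            lo = mid + 1
        else:
            return target in block, reads
    return False, reads

# Hypothetical index of 1024 blocks with two entries each.
blocks = [[i * 10, i * 10 + 5] for i in range(1024)]
print(search_sorted_blocks(blocks, 0), math.log2(len(blocks)))
```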
primary/clustering index
an index on a sort column
secondary/nonclustering index
an index that is not on the sort column
All indexes of a heap or hash table are __, since they have no sort column.
secondary
dense index
contains an entry for every table row
sparse index
contains an entry for every table block
Why do secondary indexes have to be dense?
Because the data in the index column is not sorted, and therefore the first value of each block is meaningless for search purposes.
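To make the dense/sparse distinction concrete, here is a sketch that builds both kinds of index over a table modeled as a list of blocks. It assumes the table is sorted on the indexed column, which is what makes the sparse version usable; the function names are hypothetical.

```python
def dense_index(blocks):
    """One (value, block number) entry for every table row."""
    return [(value, block_no)
            for block_no, block in enumerate(blocks)
            for value in block]

def sparse_index(blocks):
    """One (first value, block number) entry for every table block."""
    return [(block[0], block_no)
            for block_no, block in enumerate(blocks) if block]

table = [[1, 3, 5], [7, 9, 11]]   # two blocks, sorted on the indexed column
print(dense_index(table))
print(sparse_index(table))
```

If the blocks were not sorted on the indexed column, the first value of each block would say nothing about the other values in it, so only the dense form would work.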
Database designers usually create a primary index on the __ of large tables.
primary key
What is the difference between a clustering index and a cluster key?
A cluster key refers to the column used to sort rows in a table cluster structure and is not an index.
In some database systems, primary and clustering indexes are indexes on __ sort columns, respectively.
unique and non-unique
Block splits and merges occur __ often for inserts, updates, and deletes on dense indexes than sparse ones.
more
How does a database with a dense index deal with deletes?
By marking the row as ‘deleted’ at the time and periodically removing deleted rows in batches.
multi-level index
a form of index that stores column values and row pointers in a hierarchy with each level more sparse than the one below it
Is the bottom level of a multi-level index sparse or dense?
It depends on whether the index is primary or secondary. A secondary index requires a dense bottom level.
fan-out
the number of index entries per block in a multi-level index
Formula to calculate the number of levels in a multi-level index, assuming minimal free space
dense index: number of levels = log base fan-out of (number of rows)
sparse index: number of levels = log base fan-out of (number of rows / rows per block)
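A sketch of the two formulas above, computed by repeatedly dividing by the fan-out and rounding up so that floating-point logarithms are avoided. It assumes full index blocks (minimal free space); the function name and the sample figures are illustrative.

```python
import math

def index_levels(bottom_entries, fan_out):
    """Count levels by grouping entries into blocks until one top block remains."""
    levels = 1
    blocks = math.ceil(bottom_entries / fan_out)
    while blocks > 1:
        blocks = math.ceil(blocks / fan_out)
        levels += 1
    return levels

rows, rows_per_block, fan_out = 1_000_000, 100, 100   # hypothetical table

dense_levels = index_levels(rows, fan_out)                     # one entry per row
sparse_levels = index_levels(rows // rows_per_block, fan_out)  # one entry per block
print(dense_levels, sparse_levels)
```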
branch
each path from the top-level block to a bottom level block in a multi-level index
What does it mean if a multi-level index is balanced?
All of the branches are the same length
An insert into a table with a __ index without free space will always generate a new index level.
dense
What is the difference between a B+tree and a B-tree?
Index values may repeat between levels of a B+tree, which allows for pointers to table blocks only appearing in the bottom level. A B-tree does not repeat values between levels, and table pointers may appear at any level. B-trees are more compact, but harder to implement because table updates may cause table pointers to shift levels in the index.
hash index
an index where entries are assigned to buckets, similar to a hash table
Explain the difference between a hash index and a hash key.
A hash index is an index that is structured using a hash function while a hash key is a column that determines the physical location of rows in a hash table.
bitmap index
a grid of bits where each index row corresponds to a unique table row and each index column corresponds to a distinct value within the indexed table column
What does a 1 indicate in a bitmap index?
That the table row corresponding to the index row number contains the table value corresponding to the index column number.
A database reads the __ of a bitmap index first to find a value’s table location.
columns
What are the characteristics of an efficient bitmap index?
The database can quickly determine the block containing a table row based on the index row number.
The indexed column contains only tens or hundreds of distinct values to search.
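A toy bitmap index, assuming the indexed column fits in a Python list and that table row numbers start at 0. Each dictionary key plays the role of one index column; the names and sample values are invented.

```python
def build_bitmap(values):
    """Map each distinct value to a list of bits, one bit per table row."""
    return {v: [1 if x == v else 0 for x in values]
            for v in sorted(set(values))}

def rows_with(bitmap, value):
    """Read the bit column for a value and return the row numbers set to 1."""
    return [row for row, bit in enumerate(bitmap.get(value, [])) if bit]

colors = ["red", "blue", "red", "green"]   # hypothetical indexed column
bitmap = build_bitmap(colors)
print(bitmap["red"], rows_with(bitmap, "red"))
```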
physical index
a single- or multi-level index that contains pointers to table blocks
logical index
a single- or multi-level index in which pointers to table blocks are replaced with primary key values, which requires a separate primary index on the same table
How does a database locate a row containing a column value using a logical index?
Looks up the column value in the logical index to find the primary key value.
Looks up the primary key value in the primary index to find the table block pointer.
Reads the table block containing the row.
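The three lookup steps above can be sketched with dictionaries standing in for the indexes. All table contents, column names, and block numbers here are hypothetical.

```python
# Logical index on City: column value -> primary key values.
logical_index = {"Lisbon": [3, 7], "Oslo": [5]}
# Primary index on ID: primary key value -> block number.
primary_index = {3: 0, 5: 1, 7: 1}
# Table blocks: block number -> rows stored in that block.
table_blocks = {
    0: [{"ID": 3, "City": "Lisbon"}],
    1: [{"ID": 5, "City": "Oslo"}, {"ID": 7, "City": "Lisbon"}],
}

def find_rows(city):
    rows = []
    for pk in logical_index.get(city, []):   # step 1: value -> primary key
        block_no = primary_index[pk]         # step 2: primary key -> block pointer
        block = table_blocks[block_no]       # step 3: read the table block
        rows.extend(r for r in block if r["ID"] == pk)
    return rows

print(find_rows("Lisbon"))
```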
What is the advantage of a logical index?
Logical indexes change only when primary key values are updated, which is much rarer than the row movements between blocks that force changes to a physical index.
function index
an index where the database designer specifies a function on the column value and entries contain the result of the function applied to the column values
When is a function index useful?
When the values specified in the WHERE clause of queries are liable to be in a different format or unit than the values stored in the table.
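A sketch of the idea, assuming a table that stores durations in seconds while queries ask for whole minutes. The column name, the conversion function, and the helper are all made up for illustration.

```python
def build_function_index(rows, column, func):
    """Index entries hold func(column value) -> positions of matching rows."""
    index = {}
    for position, row in enumerate(rows):
        index.setdefault(func(row[column]), []).append(position)
    return index

# Hypothetical table: durations stored in seconds.
tracks = [{"id": 1, "duration_s": 125},
          {"id": 2, "duration_s": 61},
          {"id": 3, "duration_s": 170}]

# The index stores whole minutes, matching the unit used in queries.
minutes_index = build_function_index(tracks, "duration_s", lambda s: s // 60)
print(minutes_index[2])   # positions of tracks lasting 2 whole minutes
```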
tablespace
a database object that maps one or more tables to a single file
What is the syntax to create a tablespace in MySQL?
CREATE TABLESPACE TablespaceName
[ADD DATAFILE 'FileName'];
What is the syntax to assign a table to a tablespace?
At the end of a CREATE TABLE statement:
[TABLESPACE TablespaceName]
By default, most databases __ create one tablespace for each table.
automatically
Under what circumstances does it improve database performance to store multiple tables in one tablespace?
When the tables are small (sharing reduces overhead), are commonly accessed in the same query (sharing improves query performance), or are read-only (so sharing a tablespace does not increase disk fragmentation).
partitions
subsets of table data that do not overlap with one another and are stored in separate tablespaces, specified either explicitly by the DBA or automatically by the database
Most relational databases partition tables __.
horizontally
range partition
a method of partitioning that associates each partition with a range of partition expression values
list partition
a method of partitioning that associates each partition with an explicit list of partition expression values using the VALUES IN keywords
hash partition
a method of partitioning where the database administrator specifies the number of partitions, N, and the partition number for each row is then computed as: (partition expression value) modulo N.
key partition
a type of hash partition where the partition expression is determined automatically by the database
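A sketch of how a partition number might be computed for hash and range partitioning. The hash case follows the modulo rule stated above; the range case assumes each partition is defined by an exclusive upper bound, in the style of MySQL's VALUES LESS THAN. Function names and sample values are hypothetical.

```python
def hash_partition(expression_value, n):
    """Partition number = (partition expression value) modulo N."""
    return expression_value % n

def range_partition(expression_value, upper_bounds):
    """Return the first partition whose exclusive upper bound exceeds the value."""
    for partition, bound in enumerate(upper_bounds):
        if expression_value < bound:
            return partition
    raise ValueError("value is above every partition range")

print(hash_partition(17, 4))
print(range_partition(1995, [1990, 2000, 2010]))
```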
What aspect of queries does physical design impact?
performance
storage engine/storage manager
translates instructions generated by a query processor into low-level commands that access data on storage media
InnoDB
a storage engine with full support for transaction management, foreign keys, referential integrity, and locking
The default storage engine installed with the MySQL download is __.
InnoDB
What aspect of queries does logical design impact that physical design does not?
results
MyISAM
a storage engine with limited transaction management and locking capabilities that is often used for analytic applications with limited data updates
MEMORY
a storage engine that stores all data in main memory and is used for fast access with databases small enough to fit in main memory
What kinds of table structures does Oracle Database support that MySQL with InnoDB does not?
hash and cluster
Which storage engine allows MySQL to support both B+tree and hash indexes?
MEMORY
Specifying the index type for a table is a part of __ design.
physical
In MySQL with InnoDB, a table with a primary key will have a __ structure.
sorted
In MySQL with InnoDB, a table with no primary key will have a __ structure.
heap