D426: Data Management Foundations Ch 5 - Database Design

Flashcards for chapter 5 of WGU's Data Management Foundations course.

sql

database management

data science

data engineering

database design

data management

WGU

D426

data management foundations

111 Terms

New cards

What are four dimensions on which computer media can vary?

Speed
Cost
Capacity
Volatility

New cards

access time

the time required to access the first byte in a read or write operation

New cards

transfer rate

the speed at which data is read or written, following initial access
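
As a rough illustration of how access time and transfer rate combine, the time for one read can be estimated as access time plus size divided by transfer rate. The function name and the device figures below are hypothetical, chosen only to show the arithmetic.

```python
def estimated_read_seconds(access_time_s: float,
                           transfer_rate_bytes_per_s: float,
                           size_bytes: int) -> float:
    """Estimate one read: initial access, then transfer of the remaining data."""
    return access_time_s + size_bytes / transfer_rate_bytes_per_s

# Hypothetical device: 5 ms access time, 100 MB/s transfer rate.
small_read = estimated_read_seconds(0.005, 100_000_000, 4_000)        # one 4 KB block
large_read = estimated_read_seconds(0.005, 100_000_000, 100_000_000)  # 100 MB
```

Under these assumed figures, access time dominates the small read and transfer rate dominates the large one.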

New cards

volatile memory

memory whose contents are lost when disconnected from power

New cards

non-volatile memory

memory whose contents are retained without power

New cards

main memory/RAM

the primary memory used when computer programs execute, which is fast, expensive, and has limited capacity

New cards

flash memory/SSD

a type of storage media in which data is stored in pages of 2 to 16 KB. Writes are slower than reads.

New cards

magnetic disk/HDD

a type of storage media in which data is stored in sectors on a track. Sectors are traditionally 512 bytes, but more often 4 KB on newer formats.

New cards

block

a uniformly sized unit, usually between 2 and 64 KB, that databases and file systems use when transferring data between main memory and storage media

New cards

storage controllers

a portion of a storage system internal to the storage device that converts data between blocks and sectors or pages

New cards

Smaller block sizes are better for __, which access a few rows per query.

transactional applications

New cards

Larger block sizes are better for __, which access many rows per query.

analytical applications

New cards

row-oriented storage

a form of storage that stores an entire row in one block to minimize block transfers during transactional applications

New cards

Row-oriented storage performs best when row size is small relative to __.

block size

New cards

Why does row-oriented storage perform best with rows that are small in relation to the block size?

Improved query performance due to more rows per block.
Less wasted storage due to rows fitting more evenly into available space.
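
A small sketch of the arithmetic behind both points, assuming rows never span blocks and ignoring any block header overhead. The 8 KB block size and the row sizes are hypothetical.

```python
def rows_per_block(block_size: int, row_size: int) -> int:
    """Whole rows that fit in one block (rows are assumed not to span blocks)."""
    return block_size // row_size

def wasted_bytes(block_size: int, row_size: int) -> int:
    """Bytes left over in a block after packing as many whole rows as fit."""
    return block_size - rows_per_block(block_size, row_size) * row_size

BLOCK = 8192  # hypothetical 8 KB block

small_row = (rows_per_block(BLOCK, 100), wasted_bytes(BLOCK, 100))
large_row = (rows_per_block(BLOCK, 5000), wasted_bytes(BLOCK, 5000))
```

With 100-byte rows, one block transfer brings in 81 rows and wastes 92 bytes. With 5000-byte rows, each block holds a single row and wastes 3192 bytes.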

New cards

For tables with large columns, each row usually contains a link to the large column, which is stored __.

in files managed by the operating system or in a special storage area managed by the database.

New cards

How does column-oriented storage benefit analytic applications?

Faster access to an entire column.
Better data compression when all values have the same data type.
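
A minimal in-memory sketch of the two layouts, using hypothetical employee data. It only illustrates which values an aggregate over one column has to touch; it does not model blocks or compression.

```python
# Hypothetical rows: (id, name, salary)
rows = [(1, "Ana", 50_000), (2, "Ben", 62_000), (3, "Cy", 58_000)]

# Row-oriented: each stored unit is an entire row.
row_store = rows

# Column-oriented: each stored unit is an entire column.
column_store = {
    "id": [r[0] for r in rows],
    "name": [r[1] for r in rows],
    "salary": [r[2] for r in rows],
}

# Summing salaries from the row store walks every full row.
total_from_rows = sum(r[2] for r in row_store)

# Summing from the column store reads only the salary column.
total_from_columns = sum(column_store["salary"])
```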

New cards

PostgreSQL and Vertica

examples of relational databases that support column-oriented storage

New cards

heap table

a table structure where no order is imposed on rows and instead the database maintains a list of blocks assigned to the table, along with the address of the first available space for inserts

New cards

What type of table structure is best for optimizing insert operations and bulk loading rows?

heap table

New cards

Heap tables are not optimal for __.

queries that read rows in a specific order

New cards

sorted table

a table structure where the database designer identifies a sort column, usually the primary key, that determines physical row order on storage media

New cards

Sorted tables are optimal for queries that __.

read data in order of the sort column

New cards

How does a sorted table structure handle an attempt to insert a row into a full block?

The block splits in two and half the rows are moved to a new block.
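
The split can be sketched as follows, modelling a sorted table as a list of non-empty sorted blocks. The function name, the list-of-lists layout, and the capacity of four rows per block are assumptions made for illustration.

```python
import bisect

def insert_sorted(blocks: list[list[int]], key: int, capacity: int) -> None:
    """Insert key into a sorted table, splitting the target block if it is full."""
    # Pick the last block whose first key is <= key (or the first block).
    firsts = [block[0] for block in blocks]
    i = max(bisect.bisect_right(firsts, key) - 1, 0)
    block = blocks[i]
    if len(block) >= capacity:
        # Full block: move the upper half of its rows to a new block.
        mid = len(block) // 2
        new_block = block[mid:]
        del block[mid:]
        blocks.insert(i + 1, new_block)
        if key >= new_block[0]:
            block = new_block
    bisect.insort(block, key)

table = [[10, 20, 30, 40], [50, 60]]
insert_sorted(table, 25, capacity=4)
```

After the insert, the full first block has been split and `table` is `[[10, 20, 25], [30, 40], [50, 60]]`.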

New cards

hash table

a table structure where rows are assigned to buckets

New cards

How is the bucket containing each row in a hash table determined?

By a hash function and a hash key.

New cards

hash key

a column or group of columns used to determine the bucket that each row of a table is stored in

New cards

hash function

computes the bucket containing the row from the hash key and is designed to scramble row locations and evenly distribute rows across blocks

New cards

What are the four steps of the modulo function?

Convert the hash key by interpreting the key’s bits as an integer value.
Divide the integer by the number of buckets.
Interpret the division remainder as the bucket number.
Convert the bucket number to the physical address of the block containing the row.
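
The four steps can be sketched as below. The function names are hypothetical, and the address calculation assumes one block per bucket laid out contiguously from a base address, which is a simplification.

```python
def bucket_number(hash_key: str, bucket_count: int) -> int:
    # Step 1: interpret the key's bits as an integer.
    key_as_int = int.from_bytes(hash_key.encode("utf-8"), byteorder="big")
    # Steps 2 and 3: divide by the number of buckets and keep the remainder.
    return key_as_int % bucket_count

def block_address(bucket: int, base_address: int, block_size: int) -> int:
    # Step 4: map the bucket number to a physical address
    # (assumed layout: one block per bucket, stored back to back).
    return base_address + bucket * block_size

bucket = bucket_number("A", 10)            # "A" is byte 65, and 65 % 10 is 5
address = block_address(bucket, 0, 4096)   # bucket 5 with 4 KB blocks
```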

New cards

What is the disadvantage of a fixed hash function in a database that is planned to grow over time?

A fixed hash function simply allocates more rows to each bucket as tables grow, resulting in deep buckets with many blocks that a query may have to read in order to access a single row.

New cards

dynamic hash function

automatically allocates more blocks to a growing table, creates additional buckets, and distributes rows across all buckets to avoid deep buckets

New cards

Hash tables are optimal for __.

inserts and deletes of individual rows

New cards

Hash tables are slow on queries that select __.

many rows with a range of values

New cards

table clusters/multi-tables

tables that interleave rows of two or more tables in the same storage area using a cluster key

New cards

cluster key

a column that is available in all interleaved tables and determines the order in which rows are interleaved
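
A small sketch of interleaving, using two hypothetical tables that share a `dept_id` cluster key. Each department row is followed by the employee rows with the same key value.

```python
departments = [(20, "Ops"), (10, "Sales")]                    # (dept_id, name)
employees = [(1, "Ana", 10), (2, "Ben", 20), (3, "Cy", 10)]   # (emp_id, name, dept_id)

def build_cluster(departments, employees):
    """Interleave rows of both tables in cluster key (dept_id) order."""
    cluster = []
    for dept in sorted(departments, key=lambda d: d[0]):
        cluster.append(("department", dept))
        for emp in employees:
            if emp[2] == dept[0]:
                cluster.append(("employee", emp))
    return cluster

cluster = build_cluster(departments, employees)
```

A join of the two tables on `dept_id` can then read matching rows from the same storage area.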

New cards

Why are table clusters not commonly used?

They are only optimal for joining interleaved tables on the cluster key and perform poorly on many other queries.

New cards

single-level index

a file containing column values, along with pointers that identify either the block containing the row or the exact location of the row within the block

New cards

multi-column index

an index where each entry is a composite of values from all indexed columns

New cards

table scan

a database operation that reads table blocks directly, without accessing an index

New cards

index scan

a database operation that reads index blocks sequentially, in order to locate the needed table blocks

New cards

What are the two database operations that may be used to execute a SELECT query?

table scan or index scan

New cards

hit ratio/filter factor/selectivity

the percentage of table rows selected by a query

New cards

If hit ratio is __, a database will perform a table scan.

high

New cards

If hit ratio is __, a database will most likely use an index scan.

low

New cards

To estimate hit ratio, a database examines the __ of a SELECT query.

WHERE clause
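
A toy sketch of the decision described in the hit ratio cards. The 10% threshold and the function name are assumptions made for illustration; real databases use their own cost estimates, and an index scan is only possible when a suitable index exists.

```python
def choose_scan(estimated_matching_rows: int, total_rows: int,
                threshold: float = 0.10) -> str:
    """Pick a scan type from an estimated hit ratio (threshold is hypothetical)."""
    hit_ratio = estimated_matching_rows / total_rows
    return "table scan" if hit_ratio > threshold else "index scan"

low_hit_ratio = choose_scan(5, 1000)     # 0.5% of rows selected
high_hit_ratio = choose_scan(800, 1000)  # 80% of rows selected
```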

New cards

What is the formula to calculate the number of blocks read in a binary search?

log₂ N blocks, where N is the number of index blocks
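
To make the formula concrete, the sketch below evaluates it for a hypothetical index of 1024 blocks. The function name is an assumption, and the result is an approximation of the blocks read rather than an exact count.

```python
import math

def approx_binary_search_reads(index_blocks: int) -> float:
    """Approximate blocks read by a binary search over a sorted index."""
    return math.log2(index_blocks)

reads_for_1024_blocks = approx_binary_search_reads(1024)
```

For 1024 index blocks the estimate is 10 block reads, compared with up to 1024 for a sequential scan of the same index.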

New cards

primary/clustering index

an index on a sort column

New cards

secondary/nonclustering index

an index that is not on the sort column

New cards

All indexes of a heap or hash table are __, since they have no sort column.

secondary

New cards

dense index

contains an entry for every table row

New cards

sparse index

contains an entry for every table block
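
A minimal sketch contrasting the two index types on a hypothetical sorted table with three rows per block. The sparse index stores only the first key of each block, which is enough to find the right block because the table is sorted on that column.

```python
import bisect

# Hypothetical sorted table: each inner list is one block of key values.
blocks = [[3, 8, 12], [15, 21, 27], [30, 42, 55]]

# Dense index: one (key, block number) entry per row.
dense_index = [(key, n) for n, block in enumerate(blocks) for key in block]

# Sparse index: one (first key, block number) entry per block.
sparse_index = [(block[0], n) for n, block in enumerate(blocks)]

def find_block_sparse(key: int):
    """Return the number of the only block that could hold key, or None."""
    firsts = [first for first, _ in sparse_index]
    pos = bisect.bisect_right(firsts, key) - 1
    return sparse_index[pos][1] if pos >= 0 else None

block_for_21 = find_block_sparse(21)
```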

New cards

Why do secondary indexes have to be dense?

Because the data in the index column is not sorted, and therefore the first value of each block is meaningless for search purposes.

New cards

Database designers usually create a primary index on the __ of large tables.

primary key

New cards

What is the difference between a clustering index and a cluster key?

A clustering index is an index on the sort column of a sorted table. A cluster key is the column used to order interleaved rows in a table cluster structure and is not an index.

New cards

In some database systems, primary and clustering indexes are indexes on __ sort columns, respectively.

unique and non-unique

New cards

Block splits and merges occur __ often for inserts, updates, and deletes on dense indexes than sparse ones.

more

New cards

How does a database with a dense index deal with deletes?

By marking the index entry as 'deleted' at the time of the delete and periodically removing deleted entries in batches.

New cards

multi-level index

a form of index that stores column values and row pointers in a hierarchy with each level more sparse than the one below it

New cards

Is the bottom level of a multi-level index sparse or dense?

It depends on whether the index is primary or secondary. The bottom level of a primary index can be sparse, but a secondary index requires a dense bottom level.

New cards

fan-out

the number of index entries per block in a multi-level index

New cards

Formula to calculate the number of levels in a multi-level index, assuming minimal free space

Dense index: number of levels = log_fan-out(number of rows)
Sparse index: number of levels = log_fan-out(number of rows / rows per block)
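
A worked sketch of both formulas, rounding up to whole levels. The function names and the figures (one million rows, fan-out of 100, 100 rows per table block) are hypothetical. The loop multiplies capacity by the fan-out instead of calling a floating-point logarithm, which avoids rounding surprises.

```python
def index_levels(bottom_entries: int, fan_out: int) -> int:
    """Smallest number of levels L such that fan_out ** L >= bottom_entries."""
    levels, capacity = 1, fan_out
    while capacity < bottom_entries:
        capacity *= fan_out
        levels += 1
    return levels

def dense_levels(rows: int, fan_out: int) -> int:
    # Dense bottom level: one entry per table row.
    return index_levels(rows, fan_out)

def sparse_levels(rows: int, rows_per_block: int, fan_out: int) -> int:
    # Sparse bottom level: one entry per table block.
    table_blocks = -(-rows // rows_per_block)  # ceiling division
    return index_levels(table_blocks, fan_out)

dense = dense_levels(1_000_000, 100)
sparse = sparse_levels(1_000_000, 100, 100)
```

Under these assumptions the dense index needs 3 levels and the sparse index needs 2.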

New cards

branch

each path from the top-level block to a bottom level block in a multi-level index

New cards

What does it mean if a multi-level index is balanced?

All of the branches are the same length

New cards

An insert into a table with a __ index always generates a new bottom-level index entry.

dense

New cards

What is the difference between a B+tree and a B-tree?

In a B+tree, index values may repeat between levels, so all values appear in the bottom level and pointers to table blocks appear only there. In a B-tree, values do not repeat between levels, so pointers to table blocks may appear at any level. B-trees are more compact but harder to implement, because table updates may cause table pointers to shift levels in the index.

New cards

hash index

an index where entries are assigned to buckets, similar to a hash table

New cards

Explain the difference between a hash index and a hash key.

A hash index is an index that is structured using a hash function while a hash key is a column that determines the physical location of rows in a hash table.

New cards

bitmap index

a grid of bits where each index row corresponds to a unique table row and each index column corresponds to a distinct value within the indexed table column

New cards

What does a 1 indicate in a bitmap index?

That the table row corresponding to the index row number contains the table value corresponding to the index column number.
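
A minimal sketch of the grid, built from a hypothetical five-row table column of colors. Index rows follow table row order and index columns follow the sorted distinct values; both choices are assumptions made for illustration.

```python
# Hypothetical indexed table column, one value per table row.
table_column = ["red", "blue", "red", "green", "blue"]

# One index column per distinct value.
distinct_values = sorted(set(table_column))  # ['blue', 'green', 'red']

# One index row per table row; a 1 marks the value that row contains.
bitmap = [[1 if value == d else 0 for d in distinct_values]
          for value in table_column]

def rows_with_value(value: str) -> list[int]:
    """Scan one index column and return the table row numbers holding value."""
    col = distinct_values.index(value)
    return [row for row, bits in enumerate(bitmap) if bits[col] == 1]

red_rows = rows_with_value("red")
```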

New cards

A database reads the __ of a bitmap index first to find a value’s table location.

columns

New cards

What are the characteristics of an efficient bitmap index?

The database can quickly determine the block containing a table row based on the index row number.
The indexed column contains only tens or hundreds of distinct values to search.

New cards

physical index

a single- or multi-level index that contains pointers to table blocks

New cards

logical index

a single- or multi-level index in which pointers to table blocks are replaced with primary key values, which requires a separate primary index on the same table

New cards

How does a database locate a row containing a column value using a logical index?

Looks up the column value in the logical index to find the primary key value.
Looks up the primary key value in the primary index to find the table block pointer.
Reads the table block containing the row.
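
The three steps can be sketched with dictionaries standing in for the two indexes. All names and data here are hypothetical.

```python
# Logical index on last name: column value -> primary key value.
logical_index = {"Jones": 101, "Smith": 102}

# Primary index: primary key value -> table block number.
primary_index = {101: 0, 102: 1}

# Table blocks holding (primary key, last name) rows.
table_blocks = [[(101, "Jones")], [(102, "Smith")]]

def find_row(last_name: str):
    primary_key = logical_index[last_name]    # step 1: logical index lookup
    block_number = primary_index[primary_key] # step 2: primary index lookup
    for row in table_blocks[block_number]:    # step 3: read the table block
        if row[0] == primary_key:
            return row
    return None

smith_row = find_row("Smith")
```

If a row moves to another block, only `primary_index` needs updating in this sketch; `logical_index` is unchanged.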

New cards

What is the advantage of a logical index?

Logical indexes change only when primary key values are updated, which is much rarer than the row movements between blocks that force a physical index to change.

New cards

function index

an index where the database designer specifies a function on the column value and entries contain the result of the function applied to the column values

New cards

When is a function index useful?

When the values specified in the WHERE clause of queries are liable to be in a different format or unit than the values stored in the table.
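
A small sketch of the idea, assuming a hypothetical name column stored in mixed case and queries that should match regardless of case. The index stores the result of the lowercasing function, and the lookup applies the same function to the search value.

```python
names = ["Alice", "BOB", "carol"]  # hypothetical column values, by row number

# Index entries hold the function result (str.lower), not the raw value.
function_index: dict[str, list[int]] = {}
for row_number, name in enumerate(names):
    function_index.setdefault(name.lower(), []).append(row_number)

def find_rows(search_value: str) -> list[int]:
    """Apply the same function to the search value, then look it up."""
    return function_index.get(search_value.lower(), [])

bob_rows = find_rows("bob")
```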

New cards

tablespace

a database object that maps one or more tables to a single file

New cards

What is the syntax to create a tablespace in MySQL?

CREATE TABLESPACE TablespaceName
[ADD DATAFILE 'FileName'];

New cards

What is the syntax to assign a table to a tablespace?

At the end of a CREATE TABLE statement:

[TABLESPACE TablespaceName]

New cards

By default, most databases __ create one tablespace for each table.

automatically

New cards

Under what circumstances does it improve database performance to store multiple tables in one tablespace?

When the tables are small, which reduces overhead.
When the tables are commonly accessed in the same query, which improves query performance.
When the tables are read-only, so that assigning them to the same tablespace does not increase disk fragmentation.

New cards

partitions

subsets of table data that do not overlap with one another and are stored in separate tablespaces, specified either explicitly by the DBA or automatically by the database

New cards

Most relational databases partition tables __.

horizontally

New cards

range partition

a method of partitioning that associates each partition with a range of partition expression values
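
A sketch of range assignment, assuming each partition is defined by an exclusive upper bound on the partition expression value. The bounds and the function name are hypothetical.

```python
import bisect

# Hypothetical bounds: partition i holds values below upper_bounds[i]
# and at or above the previous bound.
upper_bounds = [2000, 2010, 2020]

def range_partition(value: int) -> int:
    """Return the partition number for a partition expression value."""
    i = bisect.bisect_right(upper_bounds, value)
    if i == len(upper_bounds):
        raise ValueError(f"no partition accepts value {value}")
    return i

partition_for_1999 = range_partition(1999)
partition_for_2000 = range_partition(2000)
```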

New cards

list partition

a method of partitioning that associates each partition with an explicit list of partition expression values, specified with the VALUES IN keywords

New cards

hash partition

a method of partitioning where the database administrator specifies the number of partitions, N, and the partition number for each row is then computed as: (partition expression value) modulo N.
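
The computation above is small enough to show directly. The function name and the sample values are hypothetical, and the partition expression value is assumed to be a non-negative integer.

```python
def hash_partition(partition_expression_value: int, partition_count: int) -> int:
    """Partition number = (partition expression value) modulo N."""
    return partition_expression_value % partition_count

# Eight consecutive values spread over N = 4 partitions.
assignments = [hash_partition(v, 4) for v in range(8)]
```

Consecutive values cycle through partitions 0 to 3, which spreads rows evenly across the partitions.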

New cards

key partition

a type of hash partition where the partition expression is determined automatically by the database

New cards

What aspect of queries does physical design impact?

performance

New cards

storage engine/storage manager

translates instructions generated by a query processor into low-level commands that access data on storage media

New cards

InnoDB

a storage engine with full support for transaction management, foreign keys, referential integrity, and locking

New cards

The default storage engine installed with the MySQL download is __.

InnoDB

New cards

What aspect of queries does logical design impact that physical design does not?

results

New cards

MyISAM

a storage engine with limited transaction management and locking capabilities that is often used for analytic applications with limited data updates

New cards

MEMORY

a storage engine that stores all data in main memory and is used for fast access with databases small enough to fit in main memory

New cards

What kinds of table structures does Oracle Database support that MySQL with InnoDB does not?

hash and cluster

New cards

Which storage engine allows MySQL to support both B+tree and hash indexes?

MEMORY

New cards

Specifying the index type for a table is a part of __ design.

physical

New cards

In MySQL with InnoDB, a table with a primary key will have a __ structure.

sorted

New cards

In MySQL with InnoDB, a table with no primary key will have a __ structure.

heap