1/98
Professor Dudenhoffer
Name | Mastery | Learn | Test | Matching | Spaced | Call with Kai |
|---|
No analytics yet
Send a link to your students to track their progress
Databases are stored in a collection of files. Each file is a sequence of records and a record is a sequence of fields.
How do files, fields, and records relate to how databases are stored?
4 kb
How much storage does a single node have?
100 keys/pointers
How many keys or pointers does a single block hold?
1 block
How many blocks can be held in a single node?
n * (i - 1)
For fixed-length records, what equation is used to store the record i?
Move records i+1 until n-1
move record n to i
do not move the records, link all unused records on a free list
Briefly describe the different ways to delete record i from fixed length record model. Where records is i and the total number of records is n.
A list of pointers that point to the location of a free record location.
How would you simply describe a free list?
Slotted page
This type of variable length record structure contains the number of record entries, end of free space in the block, and the location/size of each record. Records are moved around so there is no empty space between them and pointers point to the entry for the record in the header.
Heap
This allows a record to be placed anywhere in the file where there is space available.
Squential
This allows a record to be stored in a specific order based on the value of the record’s search key.
Hashing
This utilizes a computed function onto an attribute of a record. The result of the function determines where the record is placed.
Multitable clustering file organization
This allows records of several different relations to be stored in the same file and minimizes I/O time. This is unlike the other organization of records in files, where the records of each relation are stored in different files.
sequential processing, search-key, pointer chains, insertion (locate where record should be inserted, if there’s space proceed and update pointer chain, else insert into an overflow block and update pointer chain.)
For sequential file organization:
What type of processing is it best used for?
How are the records stored/ordered by?
What do you use when deleting a record?
How do you insert?
Queries involving the joining of two relations, queries involving only one relation
For multitable clustering file organization
What types of queries is it useful for
What types of queries would make it useless?
Data dictionary (AKA system catalog)
This type of storage stores metadata (data about data) such as…
information about relations (relation names, types,
user and accountability information (such as passwords)
statistical and descriptive data
physical file organization information (how it’s stored/physical location of relation)
information on indices
Blocks
What are the fixed-length storage units called that a file is partitioned into?
Buffer
This portion of main memory is available to store copies of disk blocks.
Buffer manager
This subsystem is responsible for allocating buffer space in main memory.
When a block from disk is needed. If the block is in the buffer the manager returns the address of block in main memory. If it is not in the buffer, the manager will allocate space for it and then read the block from the disk to the buffer and return the address.
When do programs call on buffer managers? How does it work?
Least recently used
What is the main strategy for buffer replacement?
Pinned block, Toss Immediate, Most recently used
These are different buffer replacement policies:
The main memory block isn’t allowed to be written back to disk.
As soon as the final tuple of the block was processed, the space is freed
The block currently being processed is pinned to the top.
Search Key
This is an attribute or set of attributes used to look up a particular record in a file.
Search Key Value
This is the value that is being searched for in a file.
Index entries
What are the records in an index file called?
Index file
This type of file is smaller than the original file and only hold the search key value and pointer.
Ordered indices
In an index file, this type of set up mean the indexes are stored in a sorted order.
Hash indices
In an index file, this type of set up means the search key values are distributed uniformly across buckets using a function.
Access time, insertion time, deletion time
In regards to index evaluation metrics:
The time to retrieve a record
The time to add a new record, taking into account the time to update the index
The time to remove a record, taking into account the time to update the index.
Primary index (clustering index)
When records are stored in sorted order of one of the attributes, this type of index will point to the attribute that the table is sorted by.
Secondary index (non-clustering index)
When records are stored in sorted order of one of the attributes, this type of index will point to an attribute that the table is not using to be sorted. This type of index must be Dense.
Dense index
This is when every search-key value is represented in the index.
Sparce index
This is when not all search-key values are represented in the index and only apply to the primary indices.
Treat the primary index as a sequential file and construct a sparce index on it. Called multi-level index.
What should we do if a primary index does not fit in memory and access becomes expensive?
outer index
When treating primary indices as a sequential file, this is the sparce index of a primary index.
inner index
When treating primary indices as a sequential file, this is the primary index file.
Same as file record deletion (3 methods)
For a single level index entry with dense indices, how would you perform deletion?
If the entry for the search key exists in the index, delete it and check if the next search-key value is in the index. If it is not in the index, replace the deleted entry with the next search-key value. If it is already in the index, replacing isn’t necessary.
For a single level index entry with sparce indices, how would you perform deletion?
Using the search-key value, look to see if it’s already in the index. If it’s not, insert.
For single level indexes, how does insertion work for dense indices?
Using the search-key value, look to see if it’s already in the index. If the index stores an entry for each block of the file, no change needed unless a new block is created. If the new block is created, then the first search-key value appearing in the new block is inserted into the index.
For single level index, how does insertion work for sparce indices?
Reorganizes itself with small changes
What is the main advantage of using a B+ tree?
space overhead for insertion and deletion.
What is the main disadvantage of B+ trees?
Performance degrades when the file gets bigger and reorganization is required periodically. B+ trees
What are the disadvantages of index sequential files? What’s the alternative?
Performs small reorganization on its own/doesn’t need to be completely reorganized to maintain performance. Insertion/deletion overhead.
What’s the advantage of using B+ trees? What’s the main (minor) disadvantage?
All paths from root to leaf are same length
Each node (not root/leaf) has between n/2 and n children.
A leaf node has between (n-1)/2 and n-1 values
What properties must a B+ tree satisfy?
Logn/2(K)
The height of a B+ tree is no more than what equation?
Find which leaf node the search-key value would appear
If value is already there: add record to file and pointer to the bucket
If value isn’t there: add record to main file (create bucket if needed)
If there is room in leaf node, insert. Otherwise, split node and place first n/2 in original and the rest in new node. Propagate upwards if necessary
How do you insert in a B+ tree?
Extendable hashing
This is a type of dynamic hashing that generates values over a large range of 32-bit.
2i
Assume that for extendable hashing, the length of the prefix is i bits. How can we determine the bucket address table size?
Locate the bucket using hash function, check if there’s room. If there is room insert, if there isn’t room split and insert.
How do you insert into an extendable hash structure?
extra level of indirection to find record, address tables may become very large, changing size of bucket address table is expensive
What are the 3 main disadvantages of extendable hashing?
Linear hashing, B+ trees
What are the two alternatives of extendable hash tables?
better option: Retrieving records that have value of key
hindrance: Range queries
When is hashing the better option? When is it more of a hindrance?
PostgreSQL
This type of database software supports hash indices, but discourages use due to poor performance.
Oracle
This type of database software supports static hash organization, but not hash indices.
SQLServer
This type of database software supports only B+ trees.
create index, create unique index, drop index
What are the three main SQL index commands? (Mentioned in Chapter 11)
To find the evaluation plan with the lowest cost
What is the main goal of query optimization?
Cost
This is the elapsed time to answer the query.
Disk access time
What is the most driving factor of query cost?
BT + Ss
T is the time to transfer one block. S is the time for a single seek. How do we calculate the cost for block transfers B plus seeks S?
Linear Search (A1)
This algorithm is considered the “file scan” algorithm when it comes to finding the cost for a selection operation. It will scan each file block and test all records to see whether they satisfy the selection condition.
br + 1s
What is the cost estimate for a selection operation using linear search?
primary index, equality on key
This algorithm is considered one of the two “index scan” algorithms when it comes to selection operations using indices. It retrieves a single record that satisfies the corresponding equality condition.
(h + 1) * (T + S)
What is the cost for a primary index, equality on key algorithm for selections using indices?
primary index, equality on nonkey
This algorithm is considered one of the two “index scan” algorithms when it comes to selection operations using indices. It retrieves multiple records.
h (T + S) + S + T * B
What is the cost equation for the primary index, equality on nonkey algorithm for selections using indices?
Nested Loop Join
This algorithm helps to estimate cost for a join operation. It compares every tuple of one relation (outer) with every tuple of another relation (inner).
For each tuple in the outer relation, it will scan all the tuples in the inner relation.
Take the total number of tuples in the outer relation and multiply it by the total number of blocks in both relations. Add that to the number of seeks (total number of tuples in outer added to the total blocks of outer).
How do you calculate the worst case cost for the nested-loop join algorithm? Make sure to take into account block transfers and seeks.
Take the total number of blocks in both relations and two seeks
What is the best case cost for the nested-loop join algorithm?
If the smaller relation fits entirely in memory, use that as the inner relation.
What is a rule of thumb to remember when estimating cost using the Nested-Loop Join algorithm? Hint: this has to do with choosing which relation is inner and which is outer.
Block Nested-Loop Join
In regards to estimating the cost of the join operation, this algorithm improves another algorithm by comparing blocks of tuples instead of one tuple at a time.
For each block of the outer relation, scan all the blocks of the inner relation.
Take the number of blocks in the outer relation and multiply that to the sum of all the blocks (both outer and inner). Then add it to double the amount of blocks from the outer relation.
(br * bs + br) + 2*br
How do you calculate the worst case estimate for the block nested-loop join algorithm?
Sum up the total number of blocks in both relations and add it to 2 seeks
How do you calculate the best case estimate for the block nested-loop join algorithm?
Indexed Nested-Loop Join
In regards to estimating the cost of the join operation, this algorithm uses an index on the inner relation to avoid scanning it fully.
For each tuple in the outer relation, use an index on the inner relation to find matching tuples.
Sum up the cost for one block transfer and one seek, multiply that by the total number of blocks in the outer relation. Add that to the number of tuples in the outer relation and then multiple all of it by the cost of an index lookup on the inner relation.
How do you estimate the worst case cost using the index nested-loop join algorithm?
Choose the relation that doesn’t use indices.
If both relations use indices, make the relation with the lesser number of tuples be the outer relation.
What is the general rule of thumb for choosing the outer relation for the Indexed Nested-Loop Join algorithm?
Transaction
This is a unit of program execution that accesses and possibly updates various data items.
Atomicity, Consistency, Isolation, Durability
What are the properties of ACID?
Atomicity
This is a property of ACID ensures a system can go back to its original state in the case that a transaction didn’t finish.
Durability
This is a property of ACID that ensures updates continue even if it fails the first time. Eventually, it will finish.
Consistency
This property of ACID ensures the transaction sees a database that won’t change unexpectedly.
Explicit and implicit integrity constraints
What are the requirements for the consistency property in ACID transactions?
primary key or foreign keys
sum of A and B unchanging after transaction execution
In regards to the Consistency requirements in ACID properties, what is an explicit integrity constraint? What about implicit?
Isolation
This property of ACID ensures that multiple concurrent processes can happen at the same time.
Recovery system, Programmers, Concurrency control system, recovery system
Who is responsible for the ACID properties? Go in this order: Atomicity, Consistency, Isolation, Durability.
Active, partially committed, failed, aborted, committed
To help think about Atomicity and Durability, what are the 5 states of the transaction state diagram?
Active
In the transaction state diagram, this state is where the transaction stays while it’s executing.
Partially committed
In the transaction state diagram, this state is where the transaction is after the final statement has been executed.
Failed
In the transaction state diagram, this state is where the transaction lands after the discovery that normal execution can no longer proceed.
Aborted
In the transaction state diagram, this state rolls back the transaction and restores the database to its prior state before the start of execution. From there, the transaction can either be restarted or killed.
Committed
In the transaction state diagram, this state means execution was successful and finished.
Logical error, System error, System Crash, Disk failure
What type of failure is it?
A transaction cannot complete because of some internal error condition.
An active transaction is stopped due to an error condition such as deadlock.
A power failure or some other hardware/software failure.
A head crash or similar destroys all or parts of a disk storage
Fail-Stop Assumption
This claims that non-volatile storage contents are assumed to not be corrupted as a result of system crash.
Redo and Undo phase
Action taken during the normal transaction processing, action taken after a failure to recover.
What are the two parts of a recovery algorithm?
Volatile, non-volatile, stable
What are these storage structures?
Doesn’t survive system crashes (RAM)
Survives a system crash but may still fail and lose data (disk)
Perfect storage that survives all failures.
Maintain multiple copies of each block on separate disks
How is stable storage implemented?
successful completion, partial failure - destination block is updated but incorrect, total failure - destination block was never updated
What are the 3 ways a block transfer can result in? Give a brief description of each.
Write the information onto first block. Once complete write same info onto second block. Once complete, the output is complete.
What are the 3 steps to attempt to protect storage media from failure during data transfer?
In the case that