1/101
Looks like no tags are added yet.
Name | Mastery | Learn | Test | Matching | Spaced |
---|
No study sessions yet.
Database
An integrated collection of data with structure
Gigabyte
230
Terabyte
240
Petabyte
250
Exabyte
260
Zettabyte
270
DBMS
Database Management System
Views (External Schema)
How users see and access data
(Many views)
Conceptual Schema
Logical structure of stored data
(One conceptual schema)
Physical Schema
How data is organized on disk
(One physical schema)
DBMS Developer
Designs and implements DBMS software
Database Administrator
Designs schemas for DB, handles security and authorization, data ability, crash recovery, tuning
Database User
Interacts with DMBS UI, query language or GUI
DBMS Architecture
UI (SQL), Query Optimization and Execution, Relational Operators, Files and Access Methods, Buffer Manager (moves pages to/from main mem), Disk Space Management, DB
Record
One instance of data/associated data
i.e. name, email, address
Stored in a page then file
File
Stores pages of records
Needs to keep track of pages in file, first page, next page, free space on pages, etc.
Page
stores records, stored in file
Disk HDD Pros/Cons
Pros:
Cheaper
No write wear out
Larger capacity
Cons:
Slower
Disk Sector
Disk pizza slice
Disk Track
Disk donut slice
Disk Block
Section of disk at (track, sector)
Seek Time
Time for disk arm to move to track
Rotational Latency
Time for sector to rotate around
(End of sector back to beginning of sector so < 1 rot)
Transfer Time
Time to R/W data to a data block
Time to R/W to Disk
Seek time + rotational latency + transfer time
Disk Address of a Page
Surface #, Track #, Sector #
Parallel Disk Storage
Improves R/W times, N pages can be R/W as fast as 1
RAID 0
Data striping over disks
Increased likelihood of a data block failing
Striping
Pages of files split into datablocks across multiple disks
RAID 1
Mirroring
Duplicate datablocks on another disk as backup
Reads can be done in parallel but writes cannot
RAID 0+1
Striping and Mirroring
D reads in parallel
D/2 writes in parallel
Each disk/mirror pair can fail once
Half of total disk capacity must be for mirrors
RAID 4
Parity page for each stripe
Parity page stores redundant info of each page
Parity page must be rewritten with ever page update
D-1 data pages per stripe
Parity Bit
Can tolerate up to one bit failure, otherwise not possible to determine which bit failed in what way
Parity bit must be updated with every page update by XOR-ing old page, new page, and old parity bit
RAID 5
Distributed Parity
Put parity pages on different disks
Won’t fail the whole drive if parity disk fails
Multiple page updates take less time because the wait for a page to finish before updating the parity page on the same disk takes less time than if all parity pages are on one disk
More even data distribution
Buffer Manager
Gets and returns pages from database
Buffer Pool
Divided into page-sized Frames
Pages are brought in to be used and stored in Frames
Frames
Page-sized chunks of memory that can be R/W
Dirty Bit
Signifies whether or not a page has been edited
If true, page must be rewritten to disk before deallocation
Valid Bit
Signifies whether or not a page is “valid”
Can only be 0 if pin count is 0
Replacement Policy
Algorithm used to pick which page in BP to replace upon page fault
Examples: FIFO, MRU, LRU, Clock, Random
Sequential Flooding
Repeatedly having to replace frames that were just evicted
Worst case scenario for LRU
For n pages and n-1 size BP, pages 1 to n-1 are loaded into the BP, being pin/unpinned, then page 1 is evicted in favor of page n, then page 2 must be evicted when page 1 is requested again , 3 evicted upon request for 2, etc.
Clock Replacement
Better replacement algorithm (approx of LRU)
Adds refbit for each page which must be 0 for the page to be replaced
Refbit is set to 1 when pin count is set to 0
Basically figures out which valid, unpinned page can be replaced
Adds clock hand that searches through each page to find a page for replacement in loops
Implementing LRU and MRU
Not really worth it because an ordered list of historical page access needs to be constantly updated with each change in pin count
Costs time, computing power, memory
Improving Replacement Methods
Prefetching Pages - Replacing pages before they are requested based on historical patterns
Keeping free list empty as often as possible
Clean pages - Don’t wait to replacement to write and deallocate pages
Why not use OS to manage Memory?
Need to PIN a page in memory
Need to force page to be written to disk (part of support for transactions)
DBMS workload has well-known reference patterns, not always good fit with general OS policies
MRU is terrible for general purpose OS
MRU works well for some DBMS access patterns
ER Model
Expressive Model
Diagram showing relationships between Entities, Entity Sets, Attributes
Provides abstraction
Entity
A real world object
An instance of something
Entity Set
A collection of similar items
A set of Entities
Attributes
Set of identifying characteristics
Ex: Age, Height, Weight, Eye Color
Key
Minimal subset of attributes that uniquely id an entity from entity set
Sometimes an ID attribute (ID num) is added just to use as an ID
Candidate Key
Any attribute (or combination of keys?) that could be a key (i.e. unique id)
Primary Key
Designated unique id
Relationship
Association among two or more entities
Can have attributes in ER diagrams (i.e. manage “SINCE”)
Can be represented as a separate table joining two other tables
Grover appears on Sesame Street
Relationship Set
Collection of similar relationships
Muppets appear on TV Shows
Mappings (Relationships)
N-ary relationships
1 to 1
1 to many
Many to many
Key Constraint
An entity entity can participate at most one time in a relationship set
1 x per y (1 to many)
Bold arrow
Participation Constraint
Each entity in an entity set must participate in the relationship
Bold line
Aggregation
Using relationships to group together related records from multiple tables and display as one
Treating a relationship set as an entity set
Relational Database
Collection of Relations
Relation
Schema + TableS
Schema (Relational DB)
Name of relation + fields/domains
Field/Attributes (Relational DB)
Columns in a table
Domain (Relational DB)
Type
Instances/Tuples/Records (Relational DB)
Rows in a table
Cardinality (Relational DB)
Number of rows
Arity/Degree (Relational DB)
Number of columns
Instance of a Relation
Represented by a table
A table shows related records?
The set of attributes grouped together
Instance
Tuple/Record/Row
SQL
Structured Query Language
Integrity Constraint
Condition that must be true for every instance in the relation
Foreign Key
Set of fields of one relation used to refer to tuple in another relation
Must be a candidate key in source (refers to a unique tuple in another relation) (and must exist in source)
Referential Integrity
Foreign key must exist in source relation
ON DEL/UPDT CASCADE (SQL)
Del/Updt field value here too
Change the associated foreign keys or delete the record
ON DEL/UPDT NO ACTION (SQL)
If trying to del/update a record that other records reference with a foreign key, disallow
ON DEL/UPDT SET DEFAULT (SQL)
If trying to del/update a record that other records reference with a foreign key, set the foreign keys to a specified default value
ON DEL/UPDT SET DEFAULT (SQL)
If trying to del/update a record that other records reference with a foreign key, set the foreign keys to a pre-specified default value
View
Relation whose rows and columns are not necessarily stored in the DB
A way of visually organizing, associating, or filtering data together by specific selections to show association
Fixed-Length Records
All records in a page are the same length
All records have same attributes at same offsets, attribute offset info can be stored in meta data
Fixed qty of records per page
Variable-Length Records
Records can differ in length
Fixed number of fields
Field size can vary
Can separate fields with delimiters or stored offsets/pointers
Packed Page (w/ Fixed Length Records)
Page of records where all records are compacted so there is a section of continuous records then continuous free space
Fixed number of records per page based on record size
On delete, the last record is moved to take the deleted records place essentially compacting the records
“Fast and easy insert”
Unpacked Page (w/ Fixed Length Records)
Page of records where records do not compact, and there is a slot directory to keep track of record location and validity
Fixed number of records per page base on record and slot size
Records do not need to be compacted on delete, slot directory just has to update
Heappage
Slotted Page for Variable Length Records
Don’t know capacity of page due to differing record sizes
Slot offsets are not fixed
Requires compaction
Need to maintain offset of each record, length of each record, free space start and end
Double Linked List Heapfile
Two double linked lists for keeping track of pages in the file, one list for full pages, one list for pages with available space
Uses one page as header page with pointers to beginning of both lists and other metadata
More efficient inserts/deletes
Directory of Pages Heapfile
Uses header pages of file to store ordered pointers to individual pages of records, may require multiple header pages
Faster file access if specific page id is known
Heapfile Scan Time Complexity
O(B * D)
B = # of pages, D = I/O Cost
Heapfile Equality Search Time Complexity
O(½ * B * D)
B = # of pages, D = I/O Cost
½ because average number of pages
Heapfile Range Search
O(B * D * N)
B = # of pages, D = I/O Cost, N + # of pages in range
Heapfile Insert Time Complexity
O(2 D) for two double linked lists, O(2 * B * D) for page directory
B = # of pages, D = I/O Cost
2 because header must also be updaed
Heapfile Delete Time Complexity
O(½ * B * D)
B = # of pages, D = I/O Cost
Essentially cost to search then update page and header
Sorted File Scan Time Complexity
O(B * D)
B = # of pages, D = I/O Cost
Sorted File Equality Search Time Complexity
O(D * logB)
B = # of pages, D = I/O Cost
Sorted File Range Search Time Complexity
O(D * logB + N)
B = # of pages, D = I/O Cost, N = # of pages in range
Sorted File Insert Time Complexity
O(D * logB + 2 * (½ * B * D)
B = # of pages, D = I/O Cost
Search for page, insert into page, then shift (reinsert) half the records
Sorted File Delete Time Complexity
O(D * logB + 2 * (½ * B * D)
B = # of pages, D = I/O Cost
Same as insert
DBMS File Layer
Manages DB Relation and Index Files
Acts as a File Interface
Organizes Files
B+ Tree of Auxiliary Indices
B+ tree where search keys of non-leaf nodes are fields of records and leaf nodes contain pointers to records
B+ Tree of Files
B+ tree where search keys of non-leaf nodes are primary keys of records and leaf nodes contain files