Databases

0.0(0)

Studied by 0 people

0.0(0)

Call Kai

Learn

Practice Test

Spaced Repetition

Match

Flashcards

Knowt Play

Card Sorting

1/37

There's no tags or description

Looks like no tags are added yet.

Study Analytics

Name	Mastery	Learn	Test	Matching	Spaced

No study sessions yet.

38 Terms

New cards

Object Relational Databases

databases where specific data types are more naturally represented in original form (ex: X-ray scans)

New cards

Create Type

CREATE TYPE complex AS (r double precision, r double precision)

New cards

Create FUNCTION

CREATE FUNCTION AS (complex, complex)

New cards

CREATE Operator

Create operator + (leftarg = complex rightarg = complex function = complex_add commutator = +)

New cards

CREATE aggregate average (complex)

CREATE AGGREGATE average (complex) (sfunc = complex_add stype = (complex, integer) finalfunc = divide_complex_by_int init_cond = (0,0)

New cards

Text in postgres

Select 'a fat cat sat on a mat and ate a fat rat'::tsvector@@'cat & rat'::tsquery

New cards

Text queries

& AND, | OR, ! NOT, <-> next-to (

New cards

Ranking in search

TF(k) → normalized by length

IDF(k) → normalized by how discriminating keywords are

TF * IDF

New cards

Memory Hierarchy

First 3 are volatile (quickest also)

New cards

SSD

solid state drive

states represented as electron voltage

low charge = 1, high charge = 0

charges can be freely sensed in read

pristine page = all 1s

writing from 1 to 0 is easy

writing from 0 to 1 involves an erase-block to reset many pages

New cards

Magnetic disks

To read: seek to right track (seek latency) → wait until requested data rotates in time (rotational latency) → transfer (transfer time)

Rotational latency depends on RPM (to find, find how many seconds per rotation and get average (multiply by .5))

New cards

Solutions to SSD failure

Mirroring → write on two SSDs

Mirror and striping → mirroring while reading from both at same time

RAID

New cards

RAID

Keep one disk that is related to the others

Ex: P1 = A1 XOR B1 XOR C1 XOR D1

If parity disk fails: recompute parities

If other fails: calculate value from parity

Problem: parity disk is used frequently

Solution: rotate which disk is parity disk

New cards

Grouping data

Row-store (favors inserting data) and column-store (favors few columns)

New cards

Record-identifier (RID)

A record can then be accessed via a pair of:

• Disk block address

• Offset into the pointer array

New cards

Buffer pool

The buffer pool is a RAM cache for hot data

Replacement policies: least recently used, least frequently used

New cards

B+ tree

Used to store (value, RID) pairs where the value comes

from the indexes attribute(s) and the RID is the record having that

value

Has n values and n+1 pointers down to lower trees

New cards

Example of B+ tree math: E.g., suppose that values and block pointers each take 8 bytes, and that our block size is 4096 bytes. How many levels in a 256-way search tree of 2^30 items?

Need 8n + 8(n+1) <= 4096

16n <= 4088

n <= 255

So a branching factor of up to 256 (i.e., n+1) is possible

log(base 256) 2^thirty = 4

New cards

Clustered vs unclustered B+ trees

Cost of B+ trees

Main file sorted or unsorted by index key

cost = (h + n)*(t_T + t_S)

where

h = tree height

n = #records fetched

t_T = transfer time

t_S = seek+rotation time

New cards

First group of hash query process operators and their cost (if it fits in RAM)

Eliminate duplicates: |R|

Equaliity-join: |R| + |S|

Aggregation: |R|

New cards

One layer of partitioning

If the necessary data cannot fit in RAM, then we can make a pass though the inputs to partition them into smaller pieces.

For duplicate elimination, partition-number = f(record)

For equality joins, partition-number = f(join-attribute)

For aggregationpartition-number = f(group-by attributes)

Cost of the partitioning phase is one scan of each input (to read)

plus one scan of each input (to write the partitions) (ex: multiply all times by 3)

New cards

Sort-merge

a) Sort each input by join-key

b) Merge them in join-key order

cost = (#records in R)*(index lookup cost) + |R|

New cards

Left-deep trees and pipelining

left-deep tree: left-most branch is deepest

Pipelining: In a query plan with many joins, you can pause processing of a particular join once it has generated a certain amount of output

store the intermediate output in an in-memory buffer

Then start processing the parent join to consume that output

pause the parent operator when its output fills a buffer, or when

the input is exhausted

Keep processing in this way, alternating between joins

New cards

Optimality principle to create most efficient joins

Step 1 → find most efficient plan for each table

Step 2 → find most efficient plan to join each table from step 1 with another table. Keep cheapest for each group of 2 tables

Step 3 → find most efficient plan to join each plan from step 2 with another table. Keep the cheapest for each group of 3 tables

New cards

Index-only plans

Sometimes queries can be answered using information in the index without even consulting the main table, such as by looking just at B+ leaves

New cards

Materialized views

Create Materialized View V As

Select ...

From ...

Where ...

Like a conventional view, but now the records in the view are explicitly stored in the database. Use a refresh function for incremental maintainence

New cards

Isolation

Partial transaction effects are not visible to other transactions

New cards

Conflict-equivalence (CONFLICTS)

Two operations conflict if operations operate on same data item and at least one of them is a write

New cards

Conflict-equivalence

Order of conflicting operations in 2 schedules is the same

New cards

Conflict serializable

Conflict equivalent to a serial schedule. A schedule is conflict serializable if the order of its conflicting operations is the same as the order of conflicting operations in some serial schedule.

New cards

Finding conflict serializability using dependency graph

nodes are transactions, edges for each pair of conflicts, cycle means not serializable

New cards

2PL

two phase locking means of concurrency control. Before reading an item, need to obtain a shared lock. Before writing an item, need to obtain an exclusive lock. Once a transaction is done with an item, it can release locks it no longer needs

Two-phase protocol:

• During an initial phase, locks are obtained but not released.

• During a subsequent phase, locks are released, and no new locks are requested.

New cards

2PL issues

deadlocks. Addressing deadlocks

1) Detect it. Maintain a graph of which transactions are waiting for each other. If a cycle occurs, abort some transaction in the cycle.

2) Time-out if a transaction takes "too long".

• Requires sophisticated tuning of timing thresholds

3) Prevent it by locking in a restricted way.

• Always lock items in a fixed order (e.g., lexicographic)

• Always obtain all locks at the start, and abort if that's not possible.

• Problem: unrealistic to know at the start of a transaction which items are going to be accessed

New cards

Timestamp-based protocol

Check if transaction’s timestamp is happening after the most recent write’s timestamp

New cards

Optimistic protocol

Instead of locking, just operate like no issues and see if there are issues later

New cards

Durability

Our buffer pool has two performance-oriented choices:

No force, i.e., we will update the buffer pool page without forcing the updates to persistent storage.

Steal, i.e., a buffer pool page could be evicted even when it contains uncommitted modifications.

No-force means that we might lose updates in case of failure.

Steal means we may have uncommitted updates on persistent storage after a failure, and those updates will need to be reversed.

New cards

Log data structure

Append-only file on persistent storage that records updates made by transactions

New cards

Recovering using Log

ARIES (Analysis, redo, undo)

Analysis identifies committed vs uncommitted transactions and dirty pages.

Redo reapplies all updates (committed and uncommitted) that might not be on disk, using the dirty page table.

Undo rolls back all uncommitted transactions using the transaction table and the log.

Explore top notes

Sonar

Updated 324d ago

Note

Chapter 1 - Introduction to Child Development

Updated 396d ago

Note

Mid-Term Exam Review

Updated 200d ago

Note

Chapter 21: Earth's Changing Surface

Updated 935d ago

Note

Psychological Test Report

Updated 1113d ago

Note

european expansion and exploration

Updated 977d ago

Note

FRQ 3 Novel Analysis Hamlet

Updated 1234d ago

Note

AP Environmental Science: Unit 8 Review - Aquatic & Terrestrial Pollution

Updated 1636d ago

Note

Sonar

Updated 324d ago

Note

Chapter 1 - Introduction to Child Development

Updated 396d ago

Note

Mid-Term Exam Review

Updated 200d ago

Note

Chapter 21: Earth's Changing Surface

Updated 935d ago

Note

Psychological Test Report

Updated 1113d ago

Note

european expansion and exploration

Updated 977d ago

Note

FRQ 3 Novel Analysis Hamlet

Updated 1234d ago

Note

AP Environmental Science: Unit 8 Review - Aquatic & Terrestrial Pollution

Updated 1636d ago

Note

Explore top flashcards

Theology 730 midterm

Updated 363d ago

Flashcards (85)

Prostate Cancer- Heemer

Updated 325d ago

Flashcards (32)

Business Flashcards

Updated 298d ago

Flashcards (173)

Spectroscopy & Spectrometry

Updated 1042d ago

Flashcards (51)

TUT3 R10 Active Listening

Flashcards (29)

Flashcards (124)

Flashcards (101)

APUSH - important dates quiz 1

Updated 5d ago

Flashcards (25)

Theology 730 midterm

Updated 363d ago

Flashcards (85)

Prostate Cancer- Heemer

Updated 325d ago

Flashcards (32)

Business Flashcards

Updated 298d ago

Flashcards (173)

Spectroscopy & Spectrometry

Updated 1042d ago

Flashcards (51)

TUT3 R10 Active Listening

Flashcards (29)

Flashcards (124)

Flashcards (101)

APUSH - important dates quiz 1

Updated 5d ago

Flashcards (25)