Files and Indexing

0.0(0)

Studied by 0 people

Knowt Play

Learn

Practice Test

Spaced Repetition

Match

Flashcards

Card Sorting

1/73

There's no tags or description

Looks like no tags are added yet.

Study Analytics

Name	Mastery	Learn	Test	Matching	Spaced

No study sessions yet.

74 Terms

New cards

CPU Registers

Small, extremely fast storage inside the CPU. Temporarily holds data or instructions currently being used by the CPU.

New cards

Cache memory

A small, high-speed memory located close to the CPU.

Stores frequently accessed data or instructions to reduce latency and improve performance

New cards

RAM (Main memory)

Volatile. Stores data for programs that are currently open

New cards

SSD

storage that uses flash memory to store data, with no moving parts. Uses flash storage.

New cards

Hard (magnetic) drive

storage device that uses spinning disks (platters) to store data magnetically. (photos, files)

New cards

Tape/ Optical Disk

very large, non-volatile, sequential-access storage used for archiving data or backups. (blu ray disks)

New cards

Primary Storage

Storage directly accessible by the CPU that holds temporary information for computations. (CPU registers, cache memory/RAM)

New cards

Secondary Storage

Storage that is remote to the CPU and permanently holds data, even when the PC is turned off. ( SSD, Hard drive)

New cards

Teritary Storage

High-capacity external storage used mainly for back ups (Tape, optical disk)

New cards

Volatile Storage

Storage which loses its contents when the power is removed

New cards

NonVolatile Storage

Storage that retains its data after the power is turned off

New cards

Annualized Failure Rate

Percentage of drives failing in a year. Estimated AFR for HDDs 1-3 years old is around 2%.

New cards

SSD Pros

no mechanical failures (non moving)
good read -consume less space

New cards

SSD cons

erase then write, writing is slower then reading
number of erasures limited, 0.1M-M
more expensive then HDD

New cards

Main sources of R/W delay

Seek Time (wait to move head to track)
Rotational Latency (time for sector to rotate to r/w head)
Transfer time (r/w data)

New cards

RAID

Redundant Array of Independent Disks (multiple hd or ssd into single unit)

New cards

RAID : Disk Mirroring

creating two copies of data on 2 or more drives

New cards

RAID : Disk Mirroring PROS

can survive failed drive since we have multiple -Maybe parallel reading

New cards

RAID : Disk Mirroring CONS

2 vs 1 drives, cost, power

New cards

RAID : Disk Stripping

Data divided into blocks written sequentially across multiple drives. Block #/# of drives

New cards

RAID : Disk Stripping PROS

parallel read and write

New cards

RAID : Disk Stripping CONS

increase chance of system failure
no data replication

New cards

RAID : Parity bits

uses extra bits to recreate lost data in a failed drive

(even, odd, mark, none, # of 1's)

New cards

RAID : Parity bits PROS

can rebuild failed drives

New cards

RAID : Parity bits CONS

need parity drive
cannot recover from 2 or more failures

New cards

Demorgans law for sets

not(A U B) = notA intersect notB

New cards

RAID Level 0

Striped Volume N data disks Data striping, no parity or mirroring. Cannot recover from failed drive. Good Write.

New cards

RAID Level 1

disk mirroring, opportunity for parallel reads, need 2x disks

New cards

RAID Level 5 Block Interleaved Distributed Parity

Combines stripping with parity, parity drives are across all drives, stripping is one block

New cards

RAID Level 6

RAID 5 add a second parity scheme, VS RAID 5 can recover from 2 stimutaneous drive failures, same read slower write

New cards

Blocking Factor (bf)

records per block, ⌊block size /record size⌋

New cards

Internal Fragmentation

unallocated memory within a block. block size - (records per block * record size)

New cards

Packed (Contiguous) Allocation

For fixed-length records. Records are stored one after the other. Store the quantity of records at the end

New cards

Unpacked Allocation

For fixed-length records. Records are stored anywhere and a bitmap

New cards

Variable-Length Record Allocation

similar to unpacked, except bitmap is a slot directory holding starting byte position and length

New cards

Primary Index

An index created on a chosen candidate key where both the index and the database are sorted on that key. Only 1 per file

New cards

Secondary Index

An index created on any field where only the index is sorted on that key. Can have as many as you want. Index points to linked list of pointers that each point to corresponding record.

New cards

Clustered Index

An index created on a chosen secondary key where both the index and the database are sorted on that key. Index points to either a record or blocks. Can have either a primary or a clustered index per file.

New cards

Secondary Index point to what

linked list of pointers that each point to corresponding record

New cards

Clustered Index point to what

records or blocks

New cards

Dense Indices

An index that contains an entry for every record

New cards

Sparse Indices

An index that doesn't contain an entry for every record. Each key points to a whole block. If searching for something between two keys, have to search the block. Performance of contains() suffers.

Only index small subset of field values

New cards

Dynamic Hashing

Tree directory
Leaves are blocks holding index entries
Insert keys and dynamically split blocks when needed to resize

New cards

Types of keys: Canidate Key

a key able to uniquely identify a record (UAID)

New cards

Types of keys: Primary Key

The chosen canidate key

New cards

Types of keys: Secondary Key

Any non Canidate key

New cards

Types of keys: Sort Key

A key used to order records in the data base

New cards

Classification of Indexes: Ordered

Single level, Multi-level (sorted file, b+ tree)

New cards

Classification of Indexes: Unordered

hashing (extendible/dynamic)

New cards

Dynamic hashing

tree directory
leaves are blocks holding entries
insert keys and dynamic resize

New cards

Extendible hashing

Directory is array

New cards

Extendible hashing advantage

only adjust directory size instead of restructuring and rehashing

New cards

Extendible hashing: Local Depth

each index block has it
bigger than 1
Looks at first bit

New cards

Extendible hashing: global depth

a directory has 1, looks at first bit

New cards

Extendible hashing: directory size

k^g

k is alphabet set number, binary is 2 g is global depth

New cards

Extendible hashing: when do buckets split?

buckets only split if l<g, else directory doubles

New cards

Extendible hashing: Deletion if you have disk space

delete index entry and keep blocks for future insertion

New cards

Extendible hashing: Deletion if you don't have disk space

-merge sibling blocks -shrink directory when merging if able

New cards

B-Tree:

a self-balancing search tree used in databases and file systems to organize and store data for efficient searching, insertion, and deletion.

New cards

B-Tree: Structure

A node has n keys and n+1 pointers -Dense -Ascending order

New cards

B-Tree of Order M (D.comer)

-Each node has at most 2M keys -at most 2M+1 Pointers

at least M keys per node
Non leaf has at least 2 children
balanced (leaf nodes same level)
a non leaf node with n keys has n+1 children

New cards

B-Tree of Order M: Insertion

Insert in order

if full (2M) split into 2 nodes with median as parent

New cards

B-Tree of Order M: Deletion if underfull node is leaf and Sibling has extra keys

move separating parent to node and move smallest or largest from neighbor to parent

New cards

B-Tree of Order M: Deletion if underfull node is leaf and sibling don't have extras

Merge with sibling by adding node, neighboring sibling and a parent

New cards

B-Tree of Order M: Deletion if underfull node is an internal node

replace deleted key with its inorder predecessor or successor
recurse if necessary

New cards

Whats inorder Predecessor

Largest key from left subtree

New cards

Whats inorder Sucessor

smallest key from right subtree

New cards

Capacity of a B-tree: How many keys in a B tree with M=128 and has 3 levels (height of 2)

Level 0= 256 keys (2M) Level 1= 257 children, each with 256 keys = 65,792 keys Level 2 = 257^2 children, each with 256 keys

New cards

Capacity of a B-tree: How many keys in a B tree with M=3 and has 3 levels (height of 2)

Level 0= 6 keys, 2M, 7 Pointers Level 1 = 7 children with 6 keys each = 42 keys Level 2 = 7^2 children with 6 keys each, 294 keys

New cards

How to determine B-tree order with block of 8k, 12 keys and 8 ptrs

2m Keys, 2M+1 pointers (2M)*12 + (2M+1)*8 = 8192

New cards

B+ Tree

A B+ Tree is a self-balancing search tree used in databases and file systems. It stores all keys in the leaf nodes and provides efficient sequential access with linked leaf nodes.

New cards

B+ Tree Vs B-Tree Structure

Each Key is stored in Leaf nodes
copies of some key values occupy internal nodes
leaf nodes are linked sequentially to form sequence set
Keep copy of parent and also has arrow

New cards

B+ Tree PROS

Built in pointers for keys in leaf nodes
support exact match and range

New cards

B+ Tree CONS

bit of wasted storage in upper levels
insert and delete more complex