Files and Indexing

0.0(0)
studied byStudied by 0 people
learnLearn
examPractice Test
spaced repetitionSpaced Repetition
heart puzzleMatch
flashcardsFlashcards
Card Sorting

1/73

encourage image

There's no tags or description

Looks like no tags are added yet.

Study Analytics
Name
Mastery
Learn
Test
Matching
Spaced

No study sessions yet.

74 Terms

1
New cards

CPU Registers

Small, extremely fast storage inside the CPU. Temporarily holds data or instructions currently being used by the CPU.

2
New cards

Cache memory

A small, high-speed memory located close to the CPU.

Stores frequently accessed data or instructions to reduce latency and improve performance

3
New cards

RAM (Main memory)

Volatile. Stores data for programs that are currently open

4
New cards

SSD

storage that uses flash memory to store data, with no moving parts. Uses flash storage.

5
New cards

Hard (magnetic) drive

storage device that uses spinning disks (platters) to store data magnetically. (photos, files)

6
New cards

Tape/ Optical Disk

very large, non-volatile, sequential-access storage used for archiving data or backups. (blu ray disks)

7
New cards

Primary Storage

Storage directly accessible by the CPU that holds temporary information for computations. (CPU registers, cache memory/RAM)

8
New cards

Secondary Storage

Storage that is remote to the CPU and permanently holds data, even when the PC is turned off. ( SSD, Hard drive)

9
New cards

Teritary Storage

High-capacity external storage used mainly for back ups (Tape, optical disk)

10
New cards

Volatile Storage

Storage which loses its contents when the power is removed

11
New cards

NonVolatile Storage

Storage that retains its data after the power is turned off

12
New cards

Annualized Failure Rate

Percentage of drives failing in a year. Estimated AFR for HDDs 1-3 years old is around 2%.

13
New cards

SSD Pros

  • no mechanical failures (non moving)

  • good read -consume less space

14
New cards

SSD cons

  • erase then write, writing is slower then reading

  • number of erasures limited, 0.1M-M

  • more expensive then HDD

15
New cards

Main sources of R/W delay

  1. Seek Time (wait to move head to track)

  2. Rotational Latency (time for sector to rotate to r/w head)

  3. Transfer time (r/w data)

16
New cards

RAID

Redundant Array of Independent Disks (multiple hd or ssd into single unit)

17
New cards

RAID : Disk Mirroring

creating two copies of data on 2 or more drives

18
New cards

RAID : Disk Mirroring PROS

  • can survive failed drive since we have multiple -Maybe parallel reading

19
New cards

RAID : Disk Mirroring CONS

2 vs 1 drives, cost, power

20
New cards

RAID : Disk Stripping

Data divided into blocks written sequentially across multiple drives. Block #/# of drives

21
New cards

RAID : Disk Stripping PROS

  • parallel read and write

22
New cards

RAID : Disk Stripping CONS

  • increase chance of system failure

  • no data replication

23
New cards

RAID : Parity bits

uses extra bits to recreate lost data in a failed drive

(even, odd, mark, none, # of 1's)

24
New cards

RAID : Parity bits PROS

  • can rebuild failed drives

25
New cards

RAID : Parity bits CONS

  • need parity drive

  • cannot recover from 2 or more failures

26
New cards

Demorgans law for sets

not(A U B) = notA intersect notB

27
New cards

RAID Level 0

Striped Volume N data disks Data striping, no parity or mirroring. Cannot recover from failed drive. Good Write.

28
New cards

RAID Level 1

disk mirroring, opportunity for parallel reads, need 2x disks

29
New cards

RAID Level 5 Block Interleaved Distributed Parity

Combines stripping with parity, parity drives are across all drives, stripping is one block

30
New cards

RAID Level 6

RAID 5 add a second parity scheme, VS RAID 5 can recover from 2 stimutaneous drive failures, same read slower write

31
New cards

Blocking Factor (bf)

records per block, ⌊block size /record size⌋

32
New cards

Internal Fragmentation

unallocated memory within a block. block size - (records per block * record size)

33
New cards

Packed (Contiguous) Allocation

For fixed-length records. Records are stored one after the other. Store the quantity of records at the end

34
New cards

Unpacked Allocation

For fixed-length records. Records are stored anywhere and a bitmap

35
New cards

Variable-Length Record Allocation

similar to unpacked, except bitmap is a slot directory holding starting byte position and length

36
New cards

Primary Index

An index created on a chosen candidate key where both the index and the database are sorted on that key. Only 1 per file

37
New cards

Secondary Index

An index created on any field where only the index is sorted on that key. Can have as many as you want. Index points to linked list of pointers that each point to corresponding record.

38
New cards

Clustered Index

An index created on a chosen secondary key where both the index and the database are sorted on that key. Index points to either a record or blocks. Can have either a primary or a clustered index per file.

39
New cards

Secondary Index point to what

linked list of pointers that each point to corresponding record

40
New cards

Clustered Index point to what

records or blocks

41
New cards

Dense Indices

An index that contains an entry for every record

42
New cards

Sparse Indices

An index that doesn't contain an entry for every record. Each key points to a whole block. If searching for something between two keys, have to search the block. Performance of contains() suffers.

Only index small subset of field values

43
New cards

Dynamic Hashing

  • Tree directory

  • Leaves are blocks holding index entries

  • Insert keys and dynamically split blocks when needed to resize

44
New cards

Types of keys: Canidate Key

a key able to uniquely identify a record (UAID)

45
New cards

Types of keys: Primary Key

The chosen canidate key

46
New cards

Types of keys: Secondary Key

Any non Canidate key

47
New cards

Types of keys: Sort Key

A key used to order records in the data base

48
New cards

Classification of Indexes: Ordered

Single level, Multi-level (sorted file, b+ tree)

49
New cards

Classification of Indexes: Unordered

hashing (extendible/dynamic)

50
New cards

Dynamic hashing

  • tree directory

  • leaves are blocks holding entries

  • insert keys and dynamic resize

51
New cards

Extendible hashing

  • Directory is array

52
New cards

Extendible hashing advantage

only adjust directory size instead of restructuring and rehashing

53
New cards

Extendible hashing: Local Depth

  • each index block has it

  • bigger than 1

  • Looks at first bit

54
New cards

Extendible hashing: global depth

  • a directory has 1, looks at first bit

55
New cards

Extendible hashing: directory size

k^g

k is alphabet set number, binary is 2 g is global depth

56
New cards

Extendible hashing: when do buckets split?

buckets only split if l<g, else directory doubles

57
New cards

Extendible hashing: Deletion if you have disk space

delete index entry and keep blocks for future insertion

58
New cards

Extendible hashing: Deletion if you don't have disk space

-merge sibling blocks -shrink directory when merging if able

59
New cards

B-Tree:

a self-balancing search tree used in databases and file systems to organize and store data for efficient searching, insertion, and deletion.

60
New cards

B-Tree: Structure

  • A node has n keys and n+1 pointers -Dense -Ascending order

61
New cards

B-Tree of Order M (D.comer)

-Each node has at most 2M keys -at most 2M+1 Pointers

  • at least M keys per node

  • Non leaf has at least 2 children

  • balanced (leaf nodes same level)

  • a non leaf node with n keys has n+1 children

62
New cards

B-Tree of Order M: Insertion

Insert in order

  • if full (2M) split into 2 nodes with median as parent

63
New cards

B-Tree of Order M: Deletion if underfull node is leaf and Sibling has extra keys

  • move separating parent to node and move smallest or largest from neighbor to parent

64
New cards

B-Tree of Order M: Deletion if underfull node is leaf and sibling don't have extras

  • Merge with sibling by adding node, neighboring sibling and a parent

65
New cards

B-Tree of Order M: Deletion if underfull node is an internal node

  • replace deleted key with its inorder predecessor or successor

  • recurse if necessary

66
New cards

Whats inorder Predecessor

Largest key from left subtree

67
New cards

Whats inorder Sucessor

smallest key from right subtree

68
New cards

Capacity of a B-tree: How many keys in a B tree with M=128 and has 3 levels (height of 2)

Level 0= 256 keys (2M) Level 1= 257 children, each with 256 keys = 65,792 keys Level 2 = 257^2 children, each with 256 keys

69
New cards

Capacity of a B-tree: How many keys in a B tree with M=3 and has 3 levels (height of 2)

Level 0= 6 keys, 2M, 7 Pointers Level 1 = 7 children with 6 keys each = 42 keys Level 2 = 7^2 children with 6 keys each, 294 keys

70
New cards

How to determine B-tree order with block of 8k, 12 keys and 8 ptrs

2m Keys, 2M+1 pointers (2M)*12 + (2M+1)*8 = 8192

71
New cards

B+ Tree

A B+ Tree is a self-balancing search tree used in databases and file systems. It stores all keys in the leaf nodes and provides efficient sequential access with linked leaf nodes.

72
New cards

B+ Tree Vs B-Tree Structure

  • Each Key is stored in Leaf nodes

  • copies of some key values occupy internal nodes

  • leaf nodes are linked sequentially to form sequence set

  • Keep copy of parent and also has arrow

73
New cards

B+ Tree PROS

  • Built in pointers for keys in leaf nodes

  • support exact match and range

74
New cards

B+ Tree CONS

  • bit of wasted storage in upper levels

  • insert and delete more complex