
CMPSC 311 – Caching & Memory Hierarchy

Definition & Purpose of Caches

  • Cache (general definition)
    • Smaller, faster storage device that temporarily holds a subset of data from a larger, slower device
    • Acts as a staging area between two hierarchy levels
  • Memory-hierarchy principle
    • Level k (fast/small) serves as a cache for level (k+1) (slow/large)
    • Because of locality, programs access level k data far more often than level (k+1)
    • Net effect: system feels almost as fast as the top level while costing nearly as little as the bottom level

Processor Cache Levels

  • Modern CPUs implement multiple on-chip levels
    • L1: extremely fast, very small, located adjacent to CPU core
    • L2: slower than L1, but bigger
    • L3: slower/bigger still; may be shared among cores or even off-chip
    • Main memory (DRAM): slowest, cheapest per bit
  • Example (AMD Ryzen 5 5600X)
    • 384 KB L1, 3 MB L2, 32 MB L3
    • L1 ≈ 100× faster than DRAM; L2 ≈ 25× faster
  • Instruction vs Data caches: typically split at L1 for parallel fetch & execution paths

Visual Example of CPU Cache Sizes

  • Intel Core i5-3570K image illustrates distinct L1/L2/L3 regions and relative proportions (link given in slides)

Full Memory Hierarchy Recap

  • Ordered from smallest ⇢ largest (and fastest ⇢ slowest)
    • L0: CPU registers
    • L1: L1 cache (SRAM)
    • L2: L2 cache (SRAM)
    • L3: Main memory (DRAM)
    • L4: Local secondary storage (SSDs/HDDs)
    • L5: Remote secondary storage (network, tape, web servers)
  • Each level caches blocks/records from the next

Locality Principles

  • Spatial Locality
    • Future accesses likely near recently accessed address
    • Strategy: fetch/store data in blocks larger than requested byte/word
  • Temporal Locality
    • Recently accessed data likely to be accessed again soon
    • Strategy: retain recently used blocks longer in cache
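
Both forms show up in ordinary array code. A minimal sketch (the array name and size below are illustrative): the sequential walk over a[] exhibits spatial locality, while the reuse of sum and i on every iteration exhibits temporal locality.

```c
#include <stdio.h>

#define N 1024

int main(void) {
    static int a[N];    /* zero-initialized; contents don't matter here */

    long sum = 0;
    /* Spatial locality: a[0], a[1], a[2], ... are adjacent in memory,
     * so each cache-line fill of a[] satisfies several later accesses. */
    for (int i = 0; i < N; i++) {
        /* Temporal locality: sum and i are touched on every iteration,
         * so they stay cached (or in registers) for the whole loop. */
        sum += a[i];
    }

    printf("sum = %ld\n", sum);
    return 0;
}
```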

General Cache Structure & Terminology

  • Cache line / block: minimum transfer unit between levels (fixed size)
  • Underlying memory imagined as an array of numbered blocks
  • Hit: requested block already in cache – served immediately
  • Miss: requested block absent – fetched from lower level (miss penalty), then placed in cache
  • Basic miss-handling actions (illustrated):
    1. Detect miss
    2. Read block from memory
    3. Choose placement line (placement policy)
    4. Possibly evict old block (replacement policy)
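
A minimal sketch of that hit/miss flow, assuming a tiny fully associative cache with FIFO victim selection standing in for the placement/replacement policies (all names and sizes below are hypothetical):

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NUM_LINES  4
#define BLOCK_SIZE 64

struct line {
    bool          valid;
    unsigned long block_no;                  /* which memory block is held  */
    unsigned char data[BLOCK_SIZE];
};

static struct line cache[NUM_LINES];

/* Stand-in for the slower level: a real cache would copy the block from
 * memory here; this sketch just zero-fills the buffer. */
static void fetch_block(unsigned long block_no, unsigned char *buf)
{
    (void)block_no;
    memset(buf, 0, BLOCK_SIZE);
}

/* Stand-in placement/replacement policy: FIFO (round-robin) victim choice. */
static int choose_victim(void)
{
    static int next = 0;
    return next++ % NUM_LINES;
}

/* Return a pointer to the cached copy of block_no, loading it on a miss. */
static unsigned char *cache_read(unsigned long block_no)
{
    for (int i = 0; i < NUM_LINES; i++)          /* 1. detect hit or miss   */
        if (cache[i].valid && cache[i].block_no == block_no) {
            printf("hit  on block %lu\n", block_no);
            return cache[i].data;                /*    hit: serve at once   */
        }

    printf("miss on block %lu\n", block_no);
    unsigned char buf[BLOCK_SIZE];
    fetch_block(block_no, buf);                  /* 2. read from memory     */

    int victim = choose_victim();                /* 3. choose placement     */
    cache[victim].valid    = true;               /* 4. evict old contents   */
    cache[victim].block_no = block_no;
    memcpy(cache[victim].data, buf, BLOCK_SIZE);
    return cache[victim].data;
}

int main(void)
{
    cache_read(7);     /* miss: block 7 is fetched and placed  */
    cache_read(7);     /* hit:  served directly from the cache */
    cache_read(8);     /* miss: a second line is filled        */
    return 0;
}
```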

Placement (Mapping) Policies

  • Fully associative
    • New block can go anywhere in cache
    • Pros: eliminates conflict misses; Cons: expensive hardware (many comparators)
  • Direct-mapped
    • Exactly one possible cache line for each memory block: \text{line} = \text{block number} \bmod t, where t = number of cache lines
    • Simple/cheap but high conflict-miss risk
  • n-way set associative
    • Cache divided into sets; each set holds n lines
    • Block maps to one set, but any of n ways within that set
    • Balances cost and flexibility
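
The mapping arithmetic for the three policies can be sketched directly, assuming a hypothetical 64-line cache organized as 16 sets of 4 ways:

```c
#include <stdio.h>

/* Hypothetical geometry: 64 lines total, organized 4-way set associative. */
#define NUM_LINES 64
#define WAYS      4
#define NUM_SETS  (NUM_LINES / WAYS)        /* 16 sets */

int main(void)
{
    unsigned long block = 1234;

    /* Direct-mapped: exactly one candidate line per block. */
    unsigned long dm_line = block % NUM_LINES;

    /* n-way set associative: the block maps to one set, and may occupy
     * any of the WAYS lines inside that set; the tag identifies which
     * block a line currently holds. */
    unsigned long set = block % NUM_SETS;
    unsigned long tag = block / NUM_SETS;

    /* Fully associative: any of the NUM_LINES lines is a candidate,
     * so the whole block number serves as the tag. */

    printf("direct-mapped line  = %lu\n", dm_line);
    printf("set-associative set = %lu (any of %d ways), tag = %lu\n",
           set, WAYS, tag);
    return 0;
}
```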

Types of Cache Misses

  • Cold / compulsory
    • First reference to a block – cache initially empty
  • Capacity
    • Working set size > cache size; block evicted for lack of space
  • Conflict (only for direct-mapped & set-associative)
    • Two or more frequently used blocks map to the same line or set → continual thrashing
    • Example sequence 0, 8, 0, 8, … with mapping (i \bmod 4) misses every time
    • Slide illustration shows Level K/Level K+1 thrashing between blocks A & B
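
The thrashing arithmetic is easy to reproduce: with 4 lines and mapping (i mod 4), blocks 0 and 8 both land on line 0 and keep evicting each other, so every access misses. A minimal sketch:

```c
#include <stdio.h>

#define NUM_LINES 4

int main(void)
{
    int seq[] = {0, 8, 0, 8, 0, 8};            /* alternating conflict pair */
    int n = sizeof seq / sizeof seq[0];

    int lines[NUM_LINES];                      /* block held by each line   */
    for (int i = 0; i < NUM_LINES; i++) lines[i] = -1;   /* empty           */

    int misses = 0;
    for (int i = 0; i < n; i++) {
        int line = seq[i] % NUM_LINES;         /* 0 and 8 both map to line 0 */
        if (lines[line] != seq[i]) {           /* conflict: evict and reload */
            lines[line] = seq[i];
            misses++;
        }
    }
    printf("%d accesses, %d misses\n", n, misses);   /* prints 6 misses */
    return 0;
}
```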

Replacement (Eviction) Policies

  • Least Recently Used (LRU)
    • Evict block unused for longest time
  • Least Frequently Used (LFU)
    • Evict block with lowest access count
  • First In First Out (FIFO)
    • Round-robin order of arrival
  • Performance measured via
    • Hit ratio = \frac{\text{hits}}{\text{total accesses}}
    • Measured cost metrics (avg. latency, bandwidth)
    • Efficiency highly workload-dependent ("working set")

Cache Performance Metrics

  • Hit ratio: fraction of memory references served from cache
  • Average memory access/latency time
    \text{AMAT} = \text{hit time} + \text{miss ratio} \times \text{miss penalty}
    where \text{miss ratio} = 1 - \text{hit ratio}
  • Numerical example (slide)
    • \text{hit time} = 25\,\mu s
    • \text{miss penalty} = 250\,\mu s
    • \text{hit ratio} = 0.8 \Rightarrow \text{miss ratio} = 0.2
      \text{AMAT} = 25 + 0.2 \times 250 = 75\,\mu s
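
The same calculation as a small helper function (the name amat() is just for illustration):

```c
#include <stdio.h>

/* AMAT = hit time + miss ratio * miss penalty (all times in microseconds). */
static double amat(double hit_time, double hit_ratio, double miss_penalty)
{
    return hit_time + (1.0 - hit_ratio) * miss_penalty;
}

int main(void)
{
    /* Slide example: 25 us hit time, 250 us penalty, 0.8 hit ratio -> 75 us. */
    printf("AMAT = %.1f us\n", amat(25.0, 0.8, 250.0));
    return 0;
}
```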

Worked Example: 4-Line LRU Cache

  • Memory access sequence (times 0-9): 1, 4, 3, 1, 5, 1, 4, 0, 3, 1
  • Using 4-line cache + LRU replacement
    • Result: 6 misses, 4 hits → \text{miss ratio} = 0.6
    • Assume \text{hit time} = 100\,\mu s, \text{penalty} = 1000\,\mu s
      \text{AMAT} = 100 + 0.6 \times 1000 = 700\,\mu s
  • Performance is poor: the high miss ratio stems from having too few cache lines relative to the working set
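
The slide's bookkeeping can be double-checked with a short simulation; the sketch below models a fully associative 4-line LRU cache replaying the same access sequence and should report 6 misses, 4 hits, and a 700 µs AMAT.

```c
#include <stdio.h>

#define NUM_LINES 4

int main(void)
{
    int seq[] = {1, 4, 3, 1, 5, 1, 4, 0, 3, 1};   /* accesses at times 0-9 */
    int n = sizeof seq / sizeof seq[0];

    int block[NUM_LINES];      /* block held by each line (-1 = empty)      */
    int last_use[NUM_LINES];   /* time of most recent use, for LRU eviction */
    for (int i = 0; i < NUM_LINES; i++) { block[i] = -1; last_use[i] = -1; }

    int hits = 0, misses = 0;
    for (int t = 0; t < n; t++) {
        int found = -1;
        for (int i = 0; i < NUM_LINES; i++)
            if (block[i] == seq[t]) { found = i; break; }

        if (found >= 0) {
            hits++;
        } else {
            misses++;
            /* Evict the least recently used line (empty lines first). */
            int victim = 0;
            for (int i = 1; i < NUM_LINES; i++)
                if (last_use[i] < last_use[victim]) victim = i;
            block[victim] = seq[t];
            found = victim;
        }
        last_use[found] = t;
    }

    double miss_ratio = (double)misses / n;
    printf("%d misses, %d hits, AMAT = %.0f us\n",
           misses, hits, 100.0 + miss_ratio * 1000.0);   /* 700 us */
    return 0;
}
```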

Application-Level Caching Strategies

  • Read-aside (Cache-aside)
    1. Application checks cache → hit: return value
    2. On miss: read from database/backing store
    3. Manually write result into cache for future use
    • Gives the application full control; common in many libraries (see the sketch after this list)
  • Read-through
    • Cache itself performs lookup to backing store on miss, stores result, returns to caller
    • Application sees unified interface; simplifies code
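
A minimal sketch of the read-aside (cache-aside) pattern, with both the cache and the backing store reduced to hypothetical in-process stand-ins (in practice these would be something like a memcached client and a database query):

```c
#include <stdio.h>
#include <string.h>

#define CACHE_SLOTS 8

/* Hypothetical in-process cache: key -> value strings. */
static struct { char key[32]; char value[64]; int used; } cache[CACHE_SLOTS];

static const char *cache_get(const char *key)
{
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].used && strcmp(cache[i].key, key) == 0)
            return cache[i].value;
    return NULL;
}

static void cache_put(const char *key, const char *value)
{
    static int next = 0;                      /* simple FIFO slot reuse */
    int i = next++ % CACHE_SLOTS;
    snprintf(cache[i].key, sizeof cache[i].key, "%s", key);
    snprintf(cache[i].value, sizeof cache[i].value, "%s", value);
    cache[i].used = 1;
}

/* Stand-in for the slow backing store (database, remote service, ...). */
static const char *db_lookup(const char *key)
{
    (void)key;
    return "value-from-database";
}

/* Read-aside: the application, not the cache, talks to the backing store. */
static const char *lookup(const char *key)
{
    const char *v = cache_get(key);           /* 1. check the cache          */
    if (v != NULL)
        return v;                             /*    hit: return immediately  */
    v = db_lookup(key);                       /* 2. miss: query the store    */
    cache_put(key, v);                        /* 3. populate cache manually  */
    return v;
}

int main(void)
{
    printf("first:  %s\n", lookup("user:42"));   /* miss, goes to the store */
    printf("second: %s\n", lookup("user:42"));   /* hit, served from cache  */
    return 0;
}
```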

Write Policies for Caches & Databases

  • Write-through
    • Update cache and backing store simultaneously
    • Simpler, safer; good when writes are relatively infrequent
  • Write-back (copy-back)
    • Update only cache on write; mark line dirty
    • Dirty line written to memory later (eviction or flush)
    • Reduces write traffic but complicates coherence & failure recovery (contrasted with write-through in the sketch after this list)
  • Write-around
    • Bypass cache; write goes directly to memory/store
    • Used when written data unlikely to be read again soon (avoids polluting cache)
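
The write-through vs. write-back distinction can be sketched for a single hypothetical cache line backed by an in-memory array standing in for the slower store:

```c
#include <stdbool.h>
#include <stdio.h>

static int backing_store[16];                 /* stand-in for slow memory */

struct line {
    int  index;      /* which backing-store entry this line caches        */
    int  value;      /* cached copy of that entry                         */
    bool dirty;      /* write-back only: is the cached copy newer?        */
};

/* Write-through: update the cached copy and the backing store together. */
static void write_through(struct line *l, int value)
{
    l->value = value;
    backing_store[l->index] = value;          /* store is always up to date */
}

/* Write-back: update only the cache and mark the line dirty; the store is
 * updated later, when the line is evicted or flushed. */
static void write_back(struct line *l, int value)
{
    l->value = value;
    l->dirty = true;
}

static void flush(struct line *l)
{
    if (l->dirty) {                           /* only dirty lines go back   */
        backing_store[l->index] = l->value;
        l->dirty = false;
    }
}

int main(void)
{
    struct line l = { .index = 3, .value = backing_store[3], .dirty = false };

    write_through(&l, 10);
    printf("after write-through: store[3] = %d\n", backing_store[3]);  /* 10 */

    write_back(&l, 20);
    printf("before flush:        store[3] = %d\n", backing_store[3]);  /* 10 */
    flush(&l);
    printf("after flush:         store[3] = %d\n", backing_store[3]);  /* 20 */
    return 0;
}
```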

Ethical, Practical & Design Implications

  • Hardware cost vs. performance trade-off: more associative caches raise silicon area & power
  • Predictable real-time systems may sacrifice hit ratio for determinism (direct-mapped + known latency)
  • Database/web engineers must choose read/write strategies balancing freshness, throughput, and consistency
  • Multi-core systems add complexity (cache coherence), making write-back policies more intricate