
CMPSC 311 – Caching & Memory Hierarchy

Definition & Purpose of Caches

  • Cache (general definition)
    • Smaller, faster storage device that temporarily holds a subset of data from a larger, slower device
    • Acts as a staging area between two hierarchy levels
  • Memory-hierarchy principle
    • Level k (fast/small) serves as a cache for level (k+1) (slow/large)
    • Because of locality, programs access level k data far more often than level (k+1)
    • Net effect: system feels almost as fast as the top level while costing nearly as little as the bottom level

Processor Cache Levels

  • Modern CPUs implement multiple on-chip levels
    • L1: extremely fast, very small, located adjacent to CPU core
    • L2: slower than L1, but bigger
    • L3: slower/bigger still; may be shared among cores or even off-chip
    • Main memory (DRAM): slowest, cheapest per bit
  • Example (AMD Ryzen 5 5600X)
    • 384 KB L1, 3 MB L2, 32 MB L3
    • L1 ≈ 100× faster than DRAM; L2 ≈ 25× faster
  • Instruction vs Data caches: typically split at L1 for parallel fetch & execution paths

Visual Example of CPU Cache Sizes

  • Intel Core i5-3570K image illustrates distinct L1/L2/L3 regions and relative proportions (link given in slides)

Full Memory Hierarchy Recap

  • Ordered from smallest ⇢ largest (and fastest ⇢ slowest)
    • L0: CPU registers
    • L1: L1 cache (SRAM)
    • L2: L2 cache (SRAM)
    • L3: Main memory (DRAM)
    • L4: Local secondary storage (SSDs/HDDs)
    • L5: Remote secondary storage (network, tape, web servers)
  • Each level caches blocks/records from the next

Locality Principles

  • Spatial Locality
    • Future accesses likely near recently accessed address
    • Strategy: fetch/store data in blocks larger than requested byte/word
  • Temporal Locality
    • Recently accessed data likely to be accessed again soon
    • Strategy: retain recently used blocks longer in cache
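
Both forms show up in ordinary array code. A minimal sketch (the array name and size below are illustrative): the sequential walk over a[] exhibits spatial locality, while the reuse of sum and i on every iteration exhibits temporal locality.

```c
#include <stdio.h>

#define N 1024

int main(void) {
    static int a[N];    /* zero-initialized; contents don't matter here */

    long sum = 0;
    /* Spatial locality: a[0], a[1], a[2], ... are adjacent in memory,
     * so each cache-line fill of a[] satisfies several later accesses. */
    for (int i = 0; i < N; i++) {
        /* Temporal locality: sum and i are touched on every iteration,
         * so they stay cached (or in registers) for the whole loop. */
        sum += a[i];
    }

    printf("sum = %ld\n", sum);
    return 0;
}
```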

General Cache Structure & Terminology

  • Cache line / block: minimum transfer unit between levels (fixed size)
  • Underlying memory imagined as an array of numbered blocks
  • Hit: requested block already in cache – served immediately
  • Miss: requested block absent – fetched from lower level (miss penalty), then placed in cache
  • Basic miss-handling actions (illustrated):
    1. Detect miss
    2. Read block from memory
    3. Choose placement line (placement policy)
    4. Possibly evict old block (replacement policy)
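
A minimal sketch of that hit/miss flow, assuming a tiny fully associative cache with FIFO victim selection standing in for the placement/replacement policies (all names and sizes below are hypothetical):

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NUM_LINES  4
#define BLOCK_SIZE 64

struct line {
    bool          valid;
    unsigned long block_no;                  /* which memory block is held  */
    unsigned char data[BLOCK_SIZE];
};

static struct line cache[NUM_LINES];

/* Stand-in for the slower level: a real cache would copy the block from
 * memory here; this sketch just zero-fills the buffer. */
static void fetch_block(unsigned long block_no, unsigned char *buf)
{
    (void)block_no;
    memset(buf, 0, BLOCK_SIZE);
}

/* Stand-in placement/replacement policy: FIFO (round-robin) victim choice. */
static int choose_victim(void)
{
    static int next = 0;
    return next++ % NUM_LINES;
}

/* Return a pointer to the cached copy of block_no, loading it on a miss. */
static unsigned char *cache_read(unsigned long block_no)
{
    for (int i = 0; i < NUM_LINES; i++)          /* 1. detect hit or miss   */
        if (cache[i].valid && cache[i].block_no == block_no) {
            printf("hit  on block %lu\n", block_no);
            return cache[i].data;                /*    hit: serve at once   */
        }

    printf("miss on block %lu\n", block_no);
    unsigned char buf[BLOCK_SIZE];
    fetch_block(block_no, buf);                  /* 2. read from memory     */

    int victim = choose_victim();                /* 3. choose placement     */
    cache[victim].valid    = true;               /* 4. evict old contents   */
    cache[victim].block_no = block_no;
    memcpy(cache[victim].data, buf, BLOCK_SIZE);
    return cache[victim].data;
}

int main(void)
{
    cache_read(7);     /* miss: block 7 is fetched and placed  */
    cache_read(7);     /* hit:  served directly from the cache */
    cache_read(8);     /* miss: a second line is filled        */
    return 0;
}
```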

Placement (Mapping) Policies

  • Fully associative
    • New block can go anywhere in cache
    • Pros: eliminates conflict misses; Cons: expensive hardware (many comparators)
  • Direct-mapped
    • Exactly one possible cache line for each memory block: \text{line} = \text{block number} \bmod t, where t = number of cache lines
    • Simple/cheap but high conflict-miss risk
  • n-way set associative
    • Cache divided into sets; each set holds n lines
    • Block maps to one set, but any of n ways within that set
    • Balances cost and flexibility
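
The mapping arithmetic for the three policies can be sketched directly, assuming a hypothetical 64-line cache organized as 16 sets of 4 ways:

```c
#include <stdio.h>

/* Hypothetical geometry: 64 lines total, organized 4-way set associative. */
#define NUM_LINES 64
#define WAYS      4
#define NUM_SETS  (NUM_LINES / WAYS)        /* 16 sets */

int main(void)
{
    unsigned long block = 1234;

    /* Direct-mapped: exactly one candidate line per block. */
    unsigned long dm_line = block % NUM_LINES;

    /* n-way set associative: the block maps to one set, and may occupy
     * any of the WAYS lines inside that set; the tag identifies which
     * block a line currently holds. */
    unsigned long set = block % NUM_SETS;
    unsigned long tag = block / NUM_SETS;

    /* Fully associative: any of the NUM_LINES lines is a candidate,
     * so the whole block number serves as the tag. */

    printf("direct-mapped line  = %lu\n", dm_line);
    printf("set-associative set = %lu (any of %d ways), tag = %lu\n",
           set, WAYS, tag);
    return 0;
}
```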

Types of Cache Misses

  • Cold / compulsory
    • First reference to a block – cache initially empty
  • Capacity
    • Working set size > cache size; block evicted for lack of space
  • Conflict (only for direct-mapped & set-associative)
    • Two or more frequently used blocks map to the same line or set → continual thrashing
    • Example sequence 0, 8, 0, 8, … with mapping (i \bmod 4) misses every time
    • Slide illustration shows Level K/Level K+1 thrashing between blocks A & B
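
The thrashing arithmetic is easy to reproduce: with 4 lines and mapping (i mod 4), blocks 0 and 8 both land on line 0 and keep evicting each other, so every access misses. A minimal sketch:

```c
#include <stdio.h>

#define NUM_LINES 4

int main(void)
{
    int seq[] = {0, 8, 0, 8, 0, 8};            /* alternating conflict pair */
    int n = sizeof seq / sizeof seq[0];

    int lines[NUM_LINES];                      /* block held by each line   */
    for (int i = 0; i < NUM_LINES; i++) lines[i] = -1;   /* empty           */

    int misses = 0;
    for (int i = 0; i < n; i++) {
        int line = seq[i] % NUM_LINES;         /* 0 and 8 both map to line 0 */
        if (lines[line] != seq[i]) {           /* conflict: evict and reload */
            lines[line] = seq[i];
            misses++;
        }
    }
    printf("%d accesses, %d misses\n", n, misses);   /* prints 6 misses */
    return 0;
}
```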

Replacement (Eviction) Policies

  • Least Recently Used (LRU)
    • Evict block unused for longest time
  • Least Frequently Used (LFU)
    • Evict block with lowest access count
  • First In First Out (FIFO)
    • Round-robin order of arrival
  • Performance measured via
    • Hit ratio = \frac{\text{hits}}{\text{total accesses}}
    • Measured cost metrics (avg. latency, bandwidth)
    • Efficiency highly workload-dependent ("working set")

Cache Performance Metrics

  • Hit ratio: fraction of memory references served from cache
  • Average memory access/latency time
    \text{AMAT} = \text{hit time} + \text{miss ratio} \times \text{miss penalty}
    where \text{miss ratio} = 1 - \text{hit ratio}
  • Numerical example (slide)
    • \text{hit time} = 25\,\mu s
    • \text{miss penalty} = 250\,\mu s
    • \text{hit ratio} = 0.8 \Rightarrow \text{miss ratio} = 0.2
      \text{AMAT} = 25 + 0.2 \times 250 = 75\,\mu s
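
The same calculation as a small helper function (the name amat() is just for illustration):

```c
#include <stdio.h>

/* AMAT = hit time + miss ratio * miss penalty (all times in microseconds). */
static double amat(double hit_time, double hit_ratio, double miss_penalty)
{
    return hit_time + (1.0 - hit_ratio) * miss_penalty;
}

int main(void)
{
    /* Slide example: 25 us hit time, 250 us penalty, 0.8 hit ratio -> 75 us. */
    printf("AMAT = %.1f us\n", amat(25.0, 0.8, 250.0));
    return 0;
}
```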

Worked Example: 4-Line LRU Cache

  • Memory access sequence (times 0-9): 1, 4, 3, 1, 5, 1, 4, 0, 3, 1
  • Using 4-line cache + LRU replacement
    • Result: 6 misses, 4 hits → \text{miss ratio} = 0.6
    • Assume \text{hit time} = 100\,\mu s, \text{penalty} = 1000\,\mu s
      \text{AMAT} = 100 + 0.6 \times 1000 = 700\,\mu s
  • Performance is poor: the high miss ratio stems from having too few cache lines relative to the working set
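
The slide's bookkeeping can be double-checked with a short simulation; the sketch below models a fully associative 4-line LRU cache replaying the same access sequence and should report 6 misses, 4 hits, and a 700 µs AMAT.

```c
#include <stdio.h>

#define NUM_LINES 4

int main(void)
{
    int seq[] = {1, 4, 3, 1, 5, 1, 4, 0, 3, 1};   /* accesses at times 0-9 */
    int n = sizeof seq / sizeof seq[0];

    int block[NUM_LINES];      /* block held by each line (-1 = empty)      */
    int last_use[NUM_LINES];   /* time of most recent use, for LRU eviction */
    for (int i = 0; i < NUM_LINES; i++) { block[i] = -1; last_use[i] = -1; }

    int hits = 0, misses = 0;
    for (int t = 0; t < n; t++) {
        int found = -1;
        for (int i = 0; i < NUM_LINES; i++)
            if (block[i] == seq[t]) { found = i; break; }

        if (found >= 0) {
            hits++;
        } else {
            misses++;
            /* Evict the least recently used line (empty lines first). */
            int victim = 0;
            for (int i = 1; i < NUM_LINES; i++)
                if (last_use[i] < last_use[victim]) victim = i;
            block[victim] = seq[t];
            found = victim;
        }
        last_use[found] = t;
    }

    double miss_ratio = (double)misses / n;
    printf("%d misses, %d hits, AMAT = %.0f us\n",
           misses, hits, 100.0 + miss_ratio * 1000.0);   /* 700 us */
    return 0;
}
```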

Application-Level Caching Strategies

  • Read-aside (Cache-aside)
    1. Application checks cache → hit: return value
    2. On miss: read from database/backing store
    3. Manually write result into cache for future use
    • Gives the application full control; common in many libraries (see the sketch after this list)
  • Read-through
    • Cache itself performs lookup to backing store on miss, stores result, returns to caller
    • Application sees unified interface; simplifies code
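
A minimal sketch of the read-aside (cache-aside) pattern, with both the cache and the backing store reduced to hypothetical in-process stand-ins (in practice these would be something like a memcached client and a database query):

```c
#include <stdio.h>
#include <string.h>

#define CACHE_SLOTS 8

/* Hypothetical in-process cache: key -> value strings. */
static struct { char key[32]; char value[64]; int used; } cache[CACHE_SLOTS];

static const char *cache_get(const char *key)
{
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].used && strcmp(cache[i].key, key) == 0)
            return cache[i].value;
    return NULL;
}

static void cache_put(const char *key, const char *value)
{
    static int next = 0;                      /* simple FIFO slot reuse */
    int i = next++ % CACHE_SLOTS;
    snprintf(cache[i].key, sizeof cache[i].key, "%s", key);
    snprintf(cache[i].value, sizeof cache[i].value, "%s", value);
    cache[i].used = 1;
}

/* Stand-in for the slow backing store (database, remote service, ...). */
static const char *db_lookup(const char *key)
{
    (void)key;
    return "value-from-database";
}

/* Read-aside: the application, not the cache, talks to the backing store. */
static const char *lookup(const char *key)
{
    const char *v = cache_get(key);           /* 1. check the cache          */
    if (v != NULL)
        return v;                             /*    hit: return immediately  */
    v = db_lookup(key);                       /* 2. miss: query the store    */
    cache_put(key, v);                        /* 3. populate cache manually  */
    return v;
}

int main(void)
{
    printf("first:  %s\n", lookup("user:42"));   /* miss, goes to the store */
    printf("second: %s\n", lookup("user:42"));   /* hit, served from cache  */
    return 0;
}
```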

Write Policies for Caches & Databases

  • Write-through
    • Update cache and backing store simultaneously
    • Simpler, safer; good when writes are relatively infrequent
  • Write-back (copy-back)
    • Update only cache on write; mark line dirty
    • Dirty line written to memory later (eviction or flush)
    • Reduces write traffic but complicates coherence & failure recovery (contrasted with write-through in the sketch after this list)
  • Write-around
    • Bypass cache; write goes directly to memory/store
    • Used when written data unlikely to be read again soon (avoids polluting cache)
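
The write-through vs. write-back distinction can be sketched for a single hypothetical cache line backed by an in-memory array standing in for the slower store:

```c
#include <stdbool.h>
#include <stdio.h>

static int backing_store[16];                 /* stand-in for slow memory */

struct line {
    int  index;      /* which backing-store entry this line caches        */
    int  value;      /* cached copy of that entry                         */
    bool dirty;      /* write-back only: is the cached copy newer?        */
};

/* Write-through: update the cached copy and the backing store together. */
static void write_through(struct line *l, int value)
{
    l->value = value;
    backing_store[l->index] = value;          /* store is always up to date */
}

/* Write-back: update only the cache and mark the line dirty; the store is
 * updated later, when the line is evicted or flushed. */
static void write_back(struct line *l, int value)
{
    l->value = value;
    l->dirty = true;
}

static void flush(struct line *l)
{
    if (l->dirty) {                           /* only dirty lines go back   */
        backing_store[l->index] = l->value;
        l->dirty = false;
    }
}

int main(void)
{
    struct line l = { .index = 3, .value = backing_store[3], .dirty = false };

    write_through(&l, 10);
    printf("after write-through: store[3] = %d\n", backing_store[3]);  /* 10 */

    write_back(&l, 20);
    printf("before flush:        store[3] = %d\n", backing_store[3]);  /* 10 */
    flush(&l);
    printf("after flush:         store[3] = %d\n", backing_store[3]);  /* 20 */
    return 0;
}
```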

Ethical, Practical & Design Implications

  • Hardware cost vs. performance trade-off: more associative caches raise silicon area & power
  • Predictable real-time systems may sacrifice hit ratio for determinism (direct-mapped + known latency)
  • Database/web engineers must choose read/write strategies balancing freshness, throughput, and consistency
  • Multi-core systems add complexity (cache coherence), making write-back policies more intricate