Memory Hierarchy


41 Terms

1

Why have a hierarchy?

Main memory is slow, but on-chip memory is expensive, so having a smaller on-chip memory is a reasonable trade-off. Even if not everything fits in these smaller memories, we can do quite well at keeping the right things in them by exploiting locality

2

Two types of locality

Spatial locality - programs use data in regions, so related data is usually clumped together (e.g. arrays)


Temporal locality - if you’ve used data recently, you’re likely to use it again
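
As a minimal C sketch (the function and array names are just illustrative), a simple array sum exercises both kinds at once:

```c
#include <stddef.h>

/* Summing an array exercises both kinds of locality:
 * spatial  - a[i] and a[i+1] are adjacent in memory, usually in the
 *            same cache line, so one miss brings in several elements;
 * temporal - sum and i are reused on every iteration, so they stay
 *            in registers or L1 for the whole loop. */
double sum_array(const double *a, size_t n)
{
    double sum = 0.0;              /* reused each iteration: temporal */
    for (size_t i = 0; i < n; i++)
        sum += a[i];               /* sequential access: spatial */
    return sum;
}
```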

3

Layers of cache

L1, L2, L3

4

L1 cache

one per core, 10s of KB, 1-4 cycles to access

5

L2 cache

one per core, 100s of KB, 8-12 cycles to access

6

L3 cache

shared between cores, 10s of MB, 20-30 cycles to access

7

Main Memory

10s of GB, 200-400 cycles to access

8

Cache properties

Organised into cache lines, each of which stores multiple words

Each line is commonly 32 or 64 bytes long

Allocation policy is how memory gets put into cache

Replacement policy is how we decide what gets kicked out

9

Cache line extra info

Cache lines also include:

  • a tag to indicate where in memory the line came from

  • a valid bit to indicate whether it holds data

  • a dirty bit to indicate whether it has been written to
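
As a rough C sketch of how that metadata sits alongside the data (field widths here are illustrative, not from the lecture):

```c
#include <stdint.h>

#define LINE_SIZE 64  /* bytes per line, as above */

/* Illustrative model of one cache line: the stored words plus the
 * bookkeeping fields listed above. */
struct cache_line {
    uint64_t tag;              /* where in memory the data came from */
    uint8_t  valid;            /* 1 if the line holds real data */
    uint8_t  dirty;            /* 1 if written to since being loaded */
    uint8_t  data[LINE_SIZE];  /* the cached words themselves */
};
```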

10

Direct Mapped Cache

Allocation - each line in memory maps to exactly one location in cache (usually a simple mod on the address), as sketched below

Advantages

  • simple

  • little hardware

  • fast, small, low power

  • easy to understand

Disadvantages

  • suffers from lots of collisions

  • unnecessary data eviction (cache thrashing)

  • highest miss rate

  • performance behaviour can be hard to predict
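
A sketch of that address split in C (the sizes are illustrative; real hardware does this with bit slicing):

```c
#include <stdint.h>

#define LINE_SIZE 64   /* bytes per line */
#define NUM_LINES 512  /* e.g. a 32 KB direct-mapped cache */

/* Each address maps to exactly one line: drop the offset within the
 * line, then take the line number modulo the number of lines. */
static inline uint32_t dm_index(uintptr_t addr)
{
    return (uint32_t)((addr / LINE_SIZE) % NUM_LINES);
}

/* The tag is whatever sits above the index and offset bits; it is
 * stored in the line so we can tell which of the many colliding
 * addresses the line currently holds. */
static inline uintptr_t dm_tag(uintptr_t addr)
{
    return addr / ((uintptr_t)LINE_SIZE * NUM_LINES);
}
```

Two addresses exactly NUM_LINES × LINE_SIZE bytes apart always collide on the same line, which is where the thrashing comes from.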

11

Fully associative cache

Allocation - each line in memory can map to anywhere in cache

Advantages

  • most efficient use of space

  • relatively easy to understand

Disadvantages

  • lots of hardware needed

  • Need a CAM or similar

  • largest performance overhead, hard to make it fast

  • evictions become difficult, lots of options

12

Set associative cache

A line at address A can only map to one set, but could map to any of the S lines (ways) in that set. The set is again chosen with a mod: (A / line size) mod (N/S), where N is the total number of lines

Advantages:

  • best trade-off

  • performs well, not too hard to implement, good efficiency

Disadvantages

  • harder to understand
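
A sketch of a lookup under this scheme, using the same illustrative sizes (N total lines, S ways, so N/S sets):

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_SIZE 64
#define N 512             /* total lines */
#define S 8               /* ways per set */
#define NUM_SETS (N / S)

struct line { uint64_t tag; bool valid; };

static struct line cache[NUM_SETS][S];

/* The address picks exactly one set, but the line may sit in any of
 * that set's S ways, so all S tags are compared (in parallel in
 * hardware). */
bool lookup(uintptr_t addr)
{
    uint32_t set = (uint32_t)((addr / LINE_SIZE) % NUM_SETS);
    uint64_t tag = addr / ((uintptr_t)LINE_SIZE * NUM_SETS);
    for (int way = 0; way < S; way++)
        if (cache[set][way].valid && cache[set][way].tag == tag)
            return true;  /* hit */
    return false;         /* miss: allocate a way per the policies below */
}
```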

13

Current convention

4, 8, 16 way set associativity

14

Types of cache misses

Compulsory, Capacity, Conflict, (coherence)

15

Compulsory

Line has not been brought into cache before - these would be misses even in an infinite cache

16

Capacity

Cache is not large enough, so some lines are discarded and later retrieved. These misses would still happen even in a fully associative cache

17

Conflict

In a set associative or direct mapped cache, lines are discarded but later needed because too many lines mapped into the same set. Also called collision misses or interference misses. Can happen in any N-way set associative cache, but not in a fully associative one

18

Coherence miss

happens in multi-core systems when one core makes an item in another core’s cache stale
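
One classic way to provoke coherence misses is false sharing: two cores repeatedly writing different variables that happen to share a cache line. A hedged pthreads sketch (the struct and iteration counts are made up for illustration):

```c
#include <pthread.h>

/* a and b live in the same cache line, so every write by one core
 * invalidates the line in the other core's cache: each access then
 * becomes a coherence miss. Padding each counter out to its own
 * 64-byte line makes the problem disappear. */
static struct { long a; long b; } shared;

static void *bump_a(void *arg) {
    (void)arg;
    for (long i = 0; i < 100000000; i++) shared.a++;
    return NULL;
}

static void *bump_b(void *arg) {
    (void)arg;
    for (long i = 0; i < 100000000; i++) shared.b++;
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, bump_a, NULL);
    pthread_create(&t2, NULL, bump_b, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```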

19

Cache trends as size increases

Compulsory misses become insignificant, and capacity misses shrink

20

2-1 rule

miss rate for a 1-way set associative cache of size X ≈ miss rate for a 2-way set associative cache of size X/2

21

Replacement policies

least recently used, least recently replaced, random

22

LRU

if it’s not been used in a while, then get rid of it
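
A minimal sketch of LRU bookkeeping within one set, using per-way age counters (real hardware uses a few status bits per line; WAYS is illustrative):

```c
#define WAYS 8

struct way { unsigned age; /* plus tag, valid bit, data, ... */ };

/* Evict the way with the largest age: the least recently used. */
static int lru_victim(const struct way set[WAYS])
{
    int victim = 0;
    for (int w = 1; w < WAYS; w++)
        if (set[w].age > set[victim].age)
            victim = w;
    return victim;
}

/* On every access, age all the ways and reset the one just used. */
static void lru_touch(struct way set[WAYS], int used)
{
    for (int w = 0; w < WAYS; w++)
        set[w].age++;
    set[used].age = 0;
}
```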

23

Least recently replaced

oldest in the set - could still be in use, but is simpler to implement

24

Random

simplest, fairly effective, ideally pseudo-random

25

Cache consistency

When data is modified in a lower-level cache, how does that change get back to main memory?

Two main approaches are write-through and write-back

26

Write-through

the write is passed on to the next level when it happens (and will then propagate)

27

Write-back

Data is only updated in the next level when the line is evicted
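
The contrast between the two policies, as a C-style sketch (next_level_write is a hypothetical helper standing in for whatever the next level of the hierarchy does):

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_SIZE 64

struct line { bool dirty; uintptr_t addr; uint8_t data[LINE_SIZE]; };

/* hypothetical: hands data to the next level of the hierarchy */
extern void next_level_write(uintptr_t addr, const void *data, int len);

/* Write-through: update the line AND pass the write on immediately
 * (often via a write buffer), so lower levels stay consistent. */
void write_through(struct line *l, int off, uint8_t byte)
{
    l->data[off] = byte;
    next_level_write(l->addr + (uintptr_t)off, &byte, 1);
}

/* Write-back: update the line and just mark it dirty; the next
 * level only sees the new data when the line is evicted. */
void write_back(struct line *l, int off, uint8_t byte)
{
    l->data[off] = byte;
    l->dirty = true;
}

void on_evict(struct line *l)
{
    if (l->dirty)
        next_level_write(l->addr, l->data, LINE_SIZE);
}
```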

28

Inclusive vs exclusive cache

Inclusive: L3 contains all of L1 and L2, for example.

Exclusive: it doesn't.

29

Advantages of inclusive

Benefits for spatial and temporal coherence. If it was in L1 but isn’t any more, it’s likely to be in L2 or L3.

Easy to check if a core has a copy of data, just need to check its highest level of cache (L2).

30

Disadvantages of inclusive

Duplication of data

reduced unique capacity in level n+1

expensive to maintain for shared caches with lots of cores

31

Trends in this?

Trending towards exclusive, especially as core counts increase. If you have lots of different L2 caches, an inclusive L3 becomes mostly just duplicated L2 contents

32

Cache coherency

If multiple cores share an array, where does that live?

Whichever core most recently modified the data has the up-to-date copy

When other cores want to access this they need to discover the most up-to-date version

33

Two classes of coherence protocols

Directory based, snooping

34

Directory Based

Sharing status for a block is kept in one place, the directory.

In an SMP this can be one central directory, associated with e.g. main memory, or the L3 cache (for a single-chip multi-core)

For a multi-chip system we need a distributed directory, which is more complicated

35

Snooping

Caches handle coherence individually: each tracks the sharing status of the blocks it holds. Memory requests are broadcast on a shared bus, and cache controllers snoop these requests to learn, for example, when their data has been updated.

36

BC4 Cache

35 MB L3, 256 KB L2 per core, 32 KB L1 data per core, 32 KB L1 instructions per core.

8-way set associative, write-back

Distributed around the CPU in a ring, some close to each core

37

Cache trends

On-chip cache capacity has grown!

38

Cache optimisations

  • Hardware prefetch

  • Hit under miss (while waiting for a miss, do something else)

  • Critical word first (when loading a line, bring the word we care about first)

  • Merging write buffers (multiple updates happen together)

  • Compiler optimisations make better use of caches too

39

Critical Word first

request missed word from memory, send it on as soon as it arrives, then load the rest of the line

40

Early restart

request the line as normal, but as soon as the requested data arrives send it on and allow execution to continue

41

Hardware prefetching

When you fetch a line, maybe also fetch the next line, especially if there's special ISA support (e.g. prefetch hint instructions) telling you to
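
Software can give the same hint explicitly; GCC and Clang expose this as the __builtin_prefetch builtin (whether it actually helps depends on the hardware prefetcher; AHEAD is a made-up tuning distance):

```c
#define AHEAD 16  /* how many elements ahead to prefetch */

/* Fetch a[i + AHEAD] while working on a[i], so the line is
 * (hopefully) already in cache by the time we need it. */
double sum_prefetched(const double *a, long n)
{
    double sum = 0.0;
    for (long i = 0; i < n; i++) {
        if (i + AHEAD < n)
            __builtin_prefetch(&a[i + AHEAD]);  /* GCC/Clang builtin */
        sum += a[i];
    }
    return sum;
}
```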