Volatile Memory
Requires power to retain data; faster memory access, e.g., RAM
Non-volatile Memory(NVM)
Retains information without power; used for long-term storage, e.g., ROM, Flash memory, SSD
2 types of RAM
SRAM and DRAM
SRAM
Higher cost than DRAM, faster, and overall better performance; 4-6 transistors per bit
DRAM
Lower cost than SRAM, overall worse performance; 1 transistor per bit
What form of memory do caches use
SRAM (faster, but more costly)
SDRAM
Synchronous DRAM uses a conventional clock signal and allows reuse of the same row addresses
DDR SDRAM
Double Data-Rate Synchronous DRAM uses double-edge clocking, which sends two bits per cycle per pin; it is the standard for most modern computer systems
Solution to CPU Memory Performance Gap
Memory Hierarchy
Locality: software or hardware solution?
Software
Locality
Helps with the CPU-memory performance gap: programs tend to use data and instructions with addresses near or equal to those they have used recently
Memory Performance Gap
a problem where the CPU waits for memory to return data/instructions
Temporal Locality
Recently referenced items are likely to be referenced again in the near future
Spatial Locality
Items with nearby addresses tend to be referenced close together in time
What unit of data do registers hold
words
What unit of data do caches hold
cache lines
What unit of data do off-chip memories hold
pages
Registers and Caches are
on chip
Main Memory, Local Storage, and Remote Storage (cloud) are
off chip
Benefit of a 3rd-level (L3) cache
reduces the miss penalty
words are how many bytes
4 or 8
cache lines are how many bytes
64
pages are how many bytes
4 KB (4096 bytes)
Cache Memory
small, fast memory built from SRAM
Cache memories and main memory are partitioned into
equal-size blocks called cache lines (or cache blocks)
3 Types of Cache Misses
Cold(compulsory), Conflict, Capacity
Cold Cache Miss
Occurs because the cache is empty at the beginning of program execution
Conflict Cache Miss
Two or more memory locations map to the same cache set; as a result, an access may find the set already occupied by data from one of the conflicting addresses
Capacity Cache Miss
Occurs when the set of active cache blocks is larger than the cache (i.e., the program needs more cache blocks than can fit in the cache)
Direct Mapped Cache
1 cache line per set
E-way set Associative Cache
E cache lines per set
How to determine which cache line to access in an associative/E-way cache
compare tag bits
how to determine what data in a cache line needs to be accessed
the block offset and the data type (e.g., a short at offset 0 occupies bytes 0 and 1 of the line)
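A minimal sketch (not from the cards) of how an address is split for a set lookup, assuming the 64-byte lines above and a hypothetical 64-set cache: the set index selects the set, the tag is compared against each line in it, and the offset picks the bytes.

#include <stdio.h>
#include <stdint.h>

#define OFFSET_BITS 6   /* 2^6 = 64-byte cache lines, matching the card above */
#define SET_BITS    6   /* 64 sets (an assumed value for illustration) */

int main(void) {
    uint64_t addr   = 0x7ffd1234;                                       /* example address (assumed) */
    uint64_t offset = addr & ((1ULL << OFFSET_BITS) - 1);               /* byte within the line */
    uint64_t set    = (addr >> OFFSET_BITS) & ((1ULL << SET_BITS) - 1); /* which set to look in */
    uint64_t tag    = addr >> (OFFSET_BITS + SET_BITS);                 /* compared against each line's tag */
    printf("tag=%#llx set=%llu offset=%llu\n",
           (unsigned long long)tag, (unsigned long long)set, (unsigned long long)offset);
    return 0;
}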
Fully Associative Cache
All cache lines are in a single set, so there is no set index
Which cache is separated into d-cache and i-cache
L1
d-cache
data cache (half of L1 cache)
i-cache
instruction cache(half of L1 cache)
Is Main memory on or off chip
off chip
What cache miss does a fully associative cache not have
Conflict Miss
The L3 Cache Is
a shared last level cache
L1 access time
4 cycles
L2 access time
11 cycles
L3 access time
30-40 cycles
L1 and L2 caches are both
8-way caches
L3 is what type of E-way cache
16-way
Which block is replaced when there are multiple victim candidates
Least Recently Used Block
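A minimal sketch of LRU victim selection, assuming each line in a set carries a last-used stamp (the struct and field names are illustrative, not from the cards):

#include <stdio.h>

typedef struct { int valid; unsigned tag; unsigned long last_used; } line_t;

/* Return the index of the least recently used line in a set of 'ways' lines. */
int lru_victim(const line_t set[], int ways) {
    int victim = 0;
    for (int i = 1; i < ways; i++)
        if (set[i].last_used < set[victim].last_used)   /* older stamp means used longer ago */
            victim = i;
    return victim;
}

int main(void) {
    line_t set[4] = { {1, 0xA, 40}, {1, 0xB, 12}, {1, 0xC, 85}, {1, 0xD, 7} };
    printf("victim line: %d\n", lru_victim(set, 4));    /* prints 3: stamp 7 is the oldest */
    return 0;
}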
advantages of splitting L1 cache
helps with locality and allows data and instructions to be accessed at the same time
miss rate equation
1 - (hit rate)
Hit Time
time it takes to deliver a line from cache to processor
Miss Penalty
the additional time required because of a miss
typical L1 hit time
1-2 clock cycles
typical L2 hit time
5-20 clock cycles
typical miss penalty
50-200 cycles for main memory
Average Memory Access Time Equation
AMAT=Hit time+(miss rate*miss penalty)
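A worked example using the typical numbers from these cards (1-cycle L1 hit time, and a 100-cycle miss penalty assumed from the 50-200 range): with a 3% miss rate, AMAT = 1 + (0.03 * 100) = 4 cycles; with a 1% miss rate, AMAT = 1 + (0.01 * 100) = 2 cycles.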
99% cache hits is twice as good as 97% T/F
True (see the worked AMAT example above: 2 cycles vs. 4 cycles)
3 ways to optimize cache
reduce miss rate, miss penalty, and hit time
advantages of increasing cache block size
reduces miss rate
disadvantages of increasing cache block size
increases the miss penalty; also increases conflict/capacity misses if the cache is small
advantages of larger cache
reduces capacity misses
disadvantages of larger cache
longer hit time, higher cost and power
advantage of higher associativity
reduces conflict misses
advantage of multilevel caches
reduces miss penalty
cache stride pattern
the distance between consecutive accesses, e.g., stride-1: A[0]→A[1]→A[2]→A[3]…
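A small illustration (assumed, not from the cards) of stride in C: the same loop gives a stride-1 pattern when stride is 1 and a stride-8 pattern when stride is 8.

#include <stddef.h>

/* Sum every stride-th element of an array of n floats. */
float sum_stride(const float *a, size_t n, size_t stride) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i += stride)   /* stride-1 touches every element in order */
        s += a[i];
    return s;
}
/* sum_stride(a, n, 1) is stride-1 (best spatial locality);
   sum_stride(a, n, 8) is stride-8, touching only one float per 32 bytes. */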
key idea of writing cache friendly code
our qualitative notion of locality is quantified through our understanding of cache memories
Writing cache friendly code:
90/10 rule; focus on the inner loops of core functions (loop unrolling); minimize misses in inner loops; repeated references to data are good (temporal locality); stride-1 reference patterns are good (spatial locality). See the loop sketch below.
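A standard illustration of these rules (a sketch; the array size is assumed): C stores 2-D arrays row-major, so summing by rows is stride-1 while summing by columns is stride-N.

#define N 1024

/* Cache-friendly: the inner loop walks along a row, so accesses are stride-1. */
double sum_rows(double a[N][N]) {
    double s = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i][j];
    return s;
}

/* Cache-unfriendly: the inner loop walks down a column, so accesses are stride-N. */
double sum_cols(double a[N][N]) {
    double s = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += a[i][j];
    return s;
}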
90/10 rule
90% of a program's execution time is spent in about 10% of its code
Benefits of Virtual Memory
Makes programming much easier, uses DRAM as a cache, simplifies memory management, isolates address spaces (easier memory protection)
Virtual Memory
an array of N contiguous bytes; this is the view of memory the program (and compiler) uses, while the contents are stored on disk and cached in DRAM
T/F Disk is about 10,000x slower than DRAM
True
The enormous page-fault penalty comes from data movement between where?
main memory and disk
Page Table
an array of page table entries(PTEs) that maps virtual pages to physical pages
Page Hit
physical main memory already holds the page the CPU requests
Page Fault
Physical main memory does NOT hold the page the CPU requests
T/F page fault causes an exception
True
In case of page fault what happens
a victim page is evicted, the requested page is paged in from disk, and the offending instruction is restarted
Why does virtual memory work
Locality
Working Set
the set of virtual pages that a program is actively accessing
if (working set size < main memory size)
good performance after compulsory misses
if(working set size > main memory size)
bad performance with capacity misses
worst case for virtual memory locality
Thrashing: performance meltdown where pages are swapped in and out continuously
Key idea of virtual memory
each process has its own virtual address space
T/F Mapping function scatters addresses through physical memory
True
Memory allocation
each virtual page can be mapped to any physical page
T/F a virtual page cannot be stored in different physical pages at different times
False
Mapping virtual pages to the same physical page allows for
multiple processes to access the same code
Steps of address translation for page hit
1: processor sends virtual address to MMU
2-3: MMU fetches PTE from page table in memory
4: MMU sends physical address to cache/memory
5: Cache/memory sends data word to processor
Steps of address translation for page fault
1: Processor sends virtual address to MMU
2-3: MMU fetches PTE from page table in memory
4: Valid bit is zero, so MMU triggers page fault exception
5: Handler identifies victim (and, if dirty, pages it out to disk)
6: Handler pages in new page and updates PTE in memory
7: Handler returns to original process, restarting faulting instruction
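A toy, self-contained simulation of the fault path in steps 4-7 (the array sizes, names, and round-robin replacement are all assumed for illustration, not a real OS interface):

#include <stdio.h>
#include <string.h>

#define NPAGES   8      /* virtual pages (toy numbers) */
#define NFRAMES  2      /* physical frames */
#define PAGESIZE 4

typedef struct { int valid; int frame; } pte_t;

char  disk[NPAGES][PAGESIZE];   /* simulated swap space */
char  mem[NFRAMES][PAGESIZE];   /* simulated physical memory */
pte_t page_table[NPAGES];
int   frame_owner[NFRAMES];     /* which virtual page occupies each frame */
int   next_victim = 0;          /* trivial round-robin replacement */

/* Translate a virtual page; on a fault, evict a victim and page in from disk. */
char *access_page(int vp) {
    if (!page_table[vp].valid) {                 /* valid bit 0: page fault */
        int f = next_victim;                     /* handler picks a victim frame */
        next_victim = (next_victim + 1) % NFRAMES;
        int old = frame_owner[f];
        if (old >= 0) {                          /* page out victim (a real handler
                                                    writes back only dirty pages) */
            memcpy(disk[old], mem[f], PAGESIZE);
            page_table[old].valid = 0;
        }
        memcpy(mem[f], disk[vp], PAGESIZE);      /* page in the requested page */
        page_table[vp].valid = 1;                /* update PTE */
        page_table[vp].frame = f;
        frame_owner[f] = vp;
    }
    return mem[page_table[vp].frame];            /* the access is then retried */
}

int main(void) {
    for (int i = 0; i < NFRAMES; i++) frame_owner[i] = -1;
    for (int i = 0; i < NPAGES; i++)  disk[i][0] = 'A' + i;
    printf("%c %c %c\n", access_page(0)[0], access_page(5)[0], access_page(0)[0]);
    return 0;
}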
TLB
Translation Lookaside Buffer: a small hardware cache in the MMU that contains complete page table entries (PTEs) for a small number of pages
purpose of TLB
speeds up translation by eliminating a memory access for the most used pages
consequence of TLB
on a TLB miss, the MMU has to access both the TLB and the page table in main memory
T/F TLB misses are very common
False
Programmer’s view of virtual memory
each process has its own private linear address space that cannot be corrupted by other processes
System view of virtual memory
uses memory efficiently by caching virtual memory pages (efficient because of locality), simplifies memory management and programming, simplifies protection by providing a convenient interpositioning point to check permissions
Pipeline Speedup Equation
Pipelined Execution Time = Non-Pipelined Execution Time / number of stages
maximum speedup of pipeline
number of stages
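Worked example (numbers assumed): if a non-pipelined execution takes 1000 ns and the datapath is split into 5 stages, the pipelined execution time is roughly 1000 / 5 = 200 ns, a speedup of 5, matching the maximum speedup of "number of stages".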
3 Hazard types for pipeline
Structural Hazard, Data Hazard, Control Hazard
Structural Hazard
a required resource does not exist or is busy
Data Hazard
an instruction needs to wait for a previous instruction to complete its data read/write
Control Hazard
deciding on the control-flow action (e.g., whether a branch is taken) depends on the result of a previous instruction
Forwarding(a.k.a. Bypassing)
Use a result as soon as it's computed to resolve a data hazard (requires extra hardware circuit connections)
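A small C-level illustration (assumed) of the data hazard that forwarding resolves: the second statement consumes the result the first statement has just produced.

#include <stdio.h>

int main(void) {
    int a = 3, b = 4, c = 5;
    int x = a + b;    /* first instruction: produces x in its execute stage */
    int y = x - c;    /* next instruction needs x immediately: a data hazard;
                         forwarding routes the ALU result of the previous instruction
                         straight to this one instead of stalling until write-back */
    printf("y = %d\n", y);
    return 0;
}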