Immediate addressing
MOV R1, #10 loads the constant 10 into R1
Direct addressing
LOAD R2, 0x3000 loads the word stored at memory location 0x3000 into R2
Indirect addressing
LOAD R1, @0x200: if location 0x200 contains the pointer 0x500, this loads into R1 the value stored at 0x500
Register addressing
ADD R1, R2, R3 adds R2 and R3 and places the result in R1
Register-indirect addressing
LOAD R3, (R6) loads the value found at the memory address stored in R6
Displacement addressing
LOAD R4, 12(R14) computes the effective address as BaseRegister + ConstantOffset and loads the value at address R14 + 12; often used to access a local variable or parameter relative to a frame pointer
Stack addressing
POP R4, PUSH R1: the operand address is implicit; values are pushed onto or popped from the top of the stack
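The addressing modes above can be sketched with a toy register/memory model (hypothetical Python model with made-up addresses and values, not real ISA semantics):

```python
# Toy model: memory and registers as dicts (all values are hypothetical).
mem = {0x3000: 42, 0x200: 0x500, 0x500: 7, 0x100C: 99}
regs = {"R6": 0x3000, "R14": 0x1000}

def immediate(const):            # MOV R1, #10
    return const                 # operand is inside the instruction

def direct(addr):                # LOAD R2, 0x3000
    return mem[addr]             # one memory access

def indirect(addr):              # LOAD R1, @0x200
    return mem[mem[addr]]        # extra access to fetch the pointer first

def register_indirect(reg):      # LOAD R3, (R6)
    return mem[regs[reg]]        # address comes from a register

def displacement(reg, offset):   # LOAD R4, 12(R14)
    return mem[regs[reg] + offset]  # base register + constant offset
```

For example, `indirect(0x200)` follows the pointer 0x500 stored at 0x200 and returns 7, while `displacement("R14", 12)` reads address 0x100C.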
IF
Instruction Fetch
Fetch the instruction from instruction memory.
ID
Instruction Decode / Register Read
Decode instruction and read source registers
EX
Execute / Address Calculation
Perform ALU operations or compute memory address.
MEM
Memory Access
Access data memory for loads and stores.
WB
Write Back
Write results into the register file.
5 stage pipeline
IF, ID, EX, MEM, WB
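With no stalls, instruction i (0-based) occupies stage s in cycle i + s + 1. A minimal sketch of that timing (Python, ideal pipeline, cycles numbered from 1):

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stage_in_cycle(instr_index, cycle):
    """Return the stage instruction instr_index (0-based) occupies in the
    given 1-based cycle of an ideal 5-stage pipeline, or None if it is
    not in the pipeline that cycle."""
    s = cycle - 1 - instr_index
    return STAGES[s] if 0 <= s < len(STAGES) else None

# In cycle 4, instruction 0 is in MEM while instruction 1 is in EX:
print(stage_in_cycle(0, 4), stage_in_cycle(1, 4))  # MEM EX
```

Note that 5 instructions finish in 5 + 4 = 9 cycles: the fifth instruction reaches WB in cycle 9.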
data hazard
occurs when instructions need to use a value that has not yet been produced by a previous instruction, leading to a data dependency conflict
structural hazard
a conflict in computer architecture where two instructions attempt to use the same hardware resource at the same time, causing a stall in the pipeline
RAW (Read after write)
a data dependency where an instruction attempts to read a value from a register before a previous instruction has finished writing to it
WAR (Write-After-Read hazard)
arises when an instruction attempts to write to a register before a preceding instruction has finished reading its value from that same register. In a simple in-order pipeline this cannot normally occur, because registers are read early (ID) and written late (WB) in program order
WAW (write-after-write)
occurs when two instructions write to the same register, but their execution order is different from the intended program order. This is a problem because if a later instruction writes to a register before an earlier instruction, the first instruction's result is overwritten
ALU-ALU Data Dependency
a specific type of Read After Write (RAW) data hazard in a pipelined processor, where one instruction's Arithmetic Logic Unit (ALU) operation depends on the result of the ALU operation of a preceding instruction
Control Hazard (Branch)
occurs in a pipelined processor when a branch instruction disrupts the normal sequential flow of execution, forcing the pipeline to wait to know the correct next instruction
Data Hazard (Load-Use)
pipeline hazard where an instruction immediately following a load instruction tries to use the data that was just loaded before the load operation has completed
Structural Hazard (Single Memory Port)
a type of pipeline hazard that occurs when two or more instructions simultaneously need to access the same memory port, which can only handle one request at a time
Mixed Hazards (Data + Control)
the combination of data and control hazards occurring together in one instruction sequence, for example a branch whose condition depends on a recently loaded value; the pipeline must resolve both to proceed
What property must be preserved when reordering instructions
program semantics or the correct result/observable behavior of the single-threaded program. This is achieved by ensuring that all data dependencies and control dependencies between instructions are maintained
Data Forwarding
routing a result directly from the pipeline stage that produces it to the stage that needs it, bypassing the register file, so dependent instructions do not have to wait for write-back
Flush instructions
means to empty the pipeline of instructions that have been fetched but are no longer on the correct execution path, which is the usual response to a mispredicted branch or an exception
Little endian
a byte order where the least significant byte of a multibyte data value is stored at the lowest memory address
Big endian
the most significant byte (the one with the highest value) is stored at the lowest memory address, making it the first in a sequence
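The two byte orders are easy to see with Python's struct module, packing the same 32-bit value both ways:

```python
import struct

value = 0x12345678

little = struct.pack("<I", value)  # least significant byte (0x78) first
big    = struct.pack(">I", value)  # most significant byte (0x12) first

print(little.hex())  # 78563412
print(big.hex())     # 12345678
```

The little-endian encoding stores 0x78 at the lowest address; the big-endian encoding stores 0x12 there.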
principle of locality of reference
the observation that when a program accesses a memory location, it is likely to access that same location and nearby locations again in the near future
Temporal Locality (Locality in Time)
If a memory location is accessed, it is likely to be accessed again soon
Spatial Locality (Locality in Space)
If a memory location is accessed, it is likely that nearby memory locations will be accessed soon
Cache Hit
The data requested by the CPU is found in the cache. This is a fast and efficient outcome.
The CPU asks for the value at address 0x1000, and that address's data is already stored in a cache line. The cache provides the data immediately
Cache Miss
The data requested by the CPU is not found in the cache. The CPU must then wait for the data to be fetched from the slower main memory, which incurs a significant performance penalty
The program accesses a new variable for the very first time. The CPU looks for it in the cache, doesn't find it (a miss), and must load it from RAM into the cache
Compulsory Miss (Cold Miss)
Occurs when a piece of data is accessed for the very first time. It is impossible to avoid these misses because the data has never been in the cache before
Capacity Miss
Occurs when the cache is not large enough to hold all the data needed by the program
Conflict Miss
Occurs in set-associative or direct-mapped caches when multiple memory blocks map to the same cache set (or line)
What is the purpose of a tag in a cache line?
uniquely identify which specific block of main memory is currently stored in that cache line
L1 Cache
The smallest and fastest, built directly into the CPU core
L2 Cache
Larger and slower than L1, but still much faster than RAM
L3 Cache
The largest and slowest of the CPU caches (but still faster than RAM). It is shared among all the cores on a CPU chip
Unified Cache (Mixed Cache)
A single cache stores both instructions (the program code) and data (the variables the code operates on)
Split Cache (Harvard Architecture within the CPU)
There are two separate caches:
L1 Instruction Cache (I-cache): Only stores instructions.
L1 Data Cache (D-cache): Only stores data
Why do deeper pipelines require more aggressive caching strategies?
Deeper pipelines break instruction execution into more, smaller stages, which allows higher clock speeds; at those speeds each memory stall wastes more cycles, so misses must be made rarer and cheaper through more aggressive caching
how direct-mapped caching works
the simplest cache structure. Each block of main memory can be placed in exactly one specific cache line
Direct-Mapped
Each memory block maps to exactly one cache line
Fully Associative
Any memory block can be placed in any cache line
Set-Associative
Each memory block maps to exactly one set, but can be placed in any line within that set
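The direct-mapped case can be sketched by splitting each block address into index and tag (hypothetical 4-line cache, Python):

```python
NUM_LINES = 4  # assumed tiny cache, for illustration only

def simulate_direct_mapped(block_addrs):
    """Count (hits, misses) for a trace of block addresses in a
    direct-mapped cache where line = address mod NUM_LINES."""
    lines = [None] * NUM_LINES        # each line holds a tag or None
    hits = misses = 0
    for addr in block_addrs:
        index = addr % NUM_LINES      # which line this block must use
        tag = addr // NUM_LINES       # identifies which block is resident
        if lines[index] == tag:
            hits += 1
        else:
            misses += 1
            lines[index] = tag        # evict whatever was there
    return hits, misses
```

Blocks 0 and 4 both map to line 0, so the trace [0, 4, 0, 4] misses every time (conflict misses), even though the cache has room for both.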
How does increasing associativity affect cache hit rate and hardware cost?
Increasing associativity generally improves the hit rate, because it reduces conflict misses by giving a memory block more possible locations in the cache; it also increases hardware cost and complexity (more tag comparators, wider multiplexers, replacement-state tracking)
Number of Cache Lines
(Total Cache Size) / (Block Size)
What fields make up a memory address in a direct-mapped cache?
Tag, Index, Block offset
Index
The middle bits. Used to select which specific cache line to look in
Block Offset
The least significant bits. Used to find the specific byte within a cache block
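The field split follows from the bit-count formulas: offset bits = log2(block size), index bits = log2(number of lines), tag = the rest. A worked sketch assuming a 32-bit address, 64-byte blocks, and 128 lines (hypothetical parameters):

```python
ADDR_BITS  = 32
BLOCK_SIZE = 64    # bytes  -> 6 offset bits
NUM_LINES  = 128   # lines  -> 7 index bits

OFFSET_BITS = BLOCK_SIZE.bit_length() - 1            # log2(64)  = 6
INDEX_BITS  = NUM_LINES.bit_length() - 1             # log2(128) = 7
TAG_BITS    = ADDR_BITS - INDEX_BITS - OFFSET_BITS   # 32 - 7 - 6 = 19

def split_address(addr):
    """Split an address into (tag, index, offset) for this cache."""
    offset = addr & (BLOCK_SIZE - 1)
    index  = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    tag    = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```

For example, address 0x1F44 splits into tag 0, index 125, offset 4 under these parameters.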
In a 2-way set-associative cache, when both blocks in a set are full, what does the replacement policy do?
the replacement policy selects one of the two blocks in the set (for example, the least recently used one) and evicts it to make room for the new block
LRU (Least Recently Used)
Evicts the block that has not been accessed for the longest time
FIFO (First-In, First-Out)
Evicts the block that has been in the cache the longest, regardless of how recently it was used
random replacement
Evicts a randomly selected block from the set. It is very simple and cheap to implement in hardware
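LRU for a single set can be sketched with an OrderedDict, whose insertion order tracks recency (Python, 2-way set for illustration):

```python
from collections import OrderedDict

def lru_set_accesses(tags, ways=2):
    """Simulate one cache set under LRU replacement; return the hit count.
    The OrderedDict keeps the least recently used tag at the front."""
    resident = OrderedDict()
    hits = 0
    for tag in tags:
        if tag in resident:
            hits += 1
            resident.move_to_end(tag)         # mark as most recently used
        else:
            if len(resident) == ways:
                resident.popitem(last=False)  # evict least recently used
            resident[tag] = None
    return hits
```

On the trace A, B, A, C, A in a 2-way set, the access to A keeps it recent, so C evicts B rather than A, giving 2 hits.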
Write-Through caches
When the CPU writes to the cache, the data is immediately written to both the cache block and the main memory
write-back caches
When the CPU writes to the cache, the data is only written to the cache block. The main memory is updated only when this modified ("dirty") block is evicted from the cache
What problem does a write buffer solve in a write-through cache?
It hides the latency of main-memory writes: the CPU deposits each write into the buffer and continues executing while the buffer drains to memory in the background, so the CPU does not stall on every store
Average Memory Access Time (AMAT)
Hit Time + (Miss Rate × Miss Penalty)
Hit Time
Time to access the cache on a hit.
Miss Rate
The fraction of accesses that are misses (1 - Hit Rate)
Miss Penalty
The additional time required to fetch a block from the next level of the memory hierarchy on a miss
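Plugging assumed example numbers into the AMAT formula:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average Memory Access Time = Hit Time + Miss Rate x Miss Penalty."""
    return hit_time + miss_rate * miss_penalty

# Assumed values: 1-cycle hit, 5% miss rate, 20-cycle miss penalty.
print(amat(1, 0.05, 20))  # 2.0 cycles on average
```

A 5% miss rate with a 20-cycle penalty adds one full cycle to the average access time.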
How can block size affect performance?
Larger blocks can reduce compulsory misses, because each miss brings in nearby data that spatial locality is likely to make useful; but for a fixed cache size, larger blocks mean fewer lines, which can increase conflict and capacity misses, and each miss takes longer to service (higher miss penalty)
Cache Coherence
the problem and set of solutions that ensure all caches in a multiprocessor system (like a multicore CPU) have a consistent view of shared memory
Index Bits
log₂(Number of Lines)
Offset Bits
log₂(Block Size)
Tag Bits
Address Size - Index Bits - Offset Bits
Number of Sets
Number of Lines / Associativity
Effective Miss Penalty
L2 Hit Time + (L2 Miss Rate × L2 Miss Penalty)
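Combining the formulas: with an L2 behind L1, the L1 miss penalty becomes the L2 AMAT. A worked sketch with assumed latencies:

```python
def effective_miss_penalty(l2_hit_time, l2_miss_rate, l2_miss_penalty):
    """L1 miss penalty when backed by an L2 cache."""
    return l2_hit_time + l2_miss_rate * l2_miss_penalty

def two_level_amat(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, mem_penalty):
    """Overall AMAT for an L1 + L2 hierarchy."""
    return l1_hit + l1_miss_rate * effective_miss_penalty(
        l2_hit, l2_miss_rate, mem_penalty)

# Assumed: L1 hit 1 cycle, 5% miss; L2 hit 10 cycles, 20% miss; memory 100.
print(two_level_amat(1, 0.05, 10, 0.20, 100))  # 1 + 0.05 * 30 = 2.5
```

Here the L2 turns a 100-cycle memory penalty into an effective 30-cycle L1 miss penalty.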
why register addressing does not involve any memory access
operands are already located within the Central Processing Unit (CPU)
When is indirect addressing preferable?
when dealing with large datasets, dynamic memory, and pointers
When is immediate addressing more efficient than register-indirect addressing?
when the value is a constant that is part of the instruction itself
why a load-use data hazard typically requires at least one stall
because the loaded value is only available at the end of the MEM stage, while the dependent instruction needs it at the start of its EX stage; even with forwarding, at least one stall cycle is required
Why are WAR (Write After Read) and WAW (Write After Write) hazards normally impossible in a simple in-order 5-stage RISC pipeline
because in a simple in-order 5-stage pipeline every instruction reads its registers in ID and writes them in WB, and instructions pass through these stages in program order, so a later instruction can never write a register before an earlier one has read or written it
1. LW R1, 0(R2)
2. ADD R3, R1, R4
Data Hazard(Load use) / RAW hazard
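The load-use check above is exactly what hazard-detection hardware does: compare the load's destination register with the next instruction's source registers. A minimal sketch (Python, with hypothetical instruction tuples of the form (opcode, dest, sources)):

```python
def load_use_hazard(instr1, instr2):
    """Return True when instr2 reads the register that a preceding
    LW is still loading (a load-use RAW hazard)."""
    op1, dest1, _ = instr1
    _, _, srcs2 = instr2
    return op1 == "LW" and dest1 in srcs2

lw  = ("LW",  "R1", ["R2"])        # LW R1, 0(R2)
add = ("ADD", "R3", ["R1", "R4"])  # ADD R3, R1, R4
print(load_use_hazard(lw, add))    # True -> one stall needed even with forwarding
```

In hardware, this comparison happens in the ID stage; on a match the pipeline inserts a bubble before forwarding the loaded value.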
1. LW R1, 0(R2)
2. SW R3, 4(R2)
3. ADD R4, R5, R6
Which stages of which instructions will compete for memory access?
the MEM stages of instructions 1 and 2 (the load and the store) compete with the IF stages of later instructions: with a single memory port, an instruction fetch and a data access cannot use the memory in the same cycle
When is the branch outcome known in a 5-stage pipeline?
It’s known in the execution stage
branch delay slot scheduling
a compiler technique in which a useful instruction is placed in the slot immediately following a branch; that instruction executes regardless of the branch outcome, filling a cycle that would otherwise be wasted
branch
an instruction that alters the sequential flow of execution