There's ____ money invested in software than hardware.
more
12
New cards
Companies tend to focus efforts on new _____________ rather than changing software to run on different platforms.
functionality
13
New cards
______ _______ provide a new platform with no old software binaries needing support.
mobile devices
14
New cards
RISC
== MIPS. Simplifies hardware, designed for pipelining. Features a larger register set, fixed instruction word length, and register-register operations.
15
New cards
CISC
== x86. Provides higher code density. Features few general purpose registers, one source overlaps destination, variable-length instruction words, memory operands, and micro-coded in hardware.
16
New cards
In Project 1, you extracted _________ from instruction word in C/C++.
bitfields
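Project 1 was done in C/C++, but the same extraction can be sketched in Python with shifts and masks; the encoding of `add $t0, $t1, $t2` below is standard MIPS, while the function name is hypothetical.

```python
# Hypothetical sketch of the Project 1 task: pulling the bitfields out
# of a 32-bit MIPS R-type instruction word with shifts and masks.
def decode_rtype(word):
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31..26
        "rs":     (word >> 21) & 0x1F,  # bits 25..21
        "rt":     (word >> 16) & 0x1F,  # bits 20..16
        "rd":     (word >> 11) & 0x1F,  # bits 15..11
        "shamt":  (word >>  6) & 0x1F,  # bits 10..6
        "funct":  word         & 0x3F,  # bits 5..0
    }

# add $t0, $t1, $t2 encodes as 0x012A4020
fields = decode_rtype(0x012A4020)
print(fields["rs"], fields["rt"], fields["rd"], fields["funct"])  # 9 10 8 32
```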
17
New cards
MIPS integer ALU operation 00 == ___.
ADD
18
New cards
MIPS integer ALU operation 01 == ___.
SUB
19
New cards
MIPS integer ALU operation 10 == ___.
AND
20
New cards
MIPS integer ALU operation 11 == ___.
OR
21
New cards
What is Fast Multiply?
Use multiple adders (Cost/performance tradeoff), can be pipelined (several multiplications performed in parallel)
22
New cards
What is Fast Divide?
Can't use parallel hardware (subtraction is conditional on the sign of the remainder); faster methods like SRT division generate multiple quotient bits per step (but still require multiple steps).
23
New cards
In floating point addition, there is a loss of _____________ when adding one big number to a bunch of smaller numbers.
associativity
24
New cards
Floating point addition takes longer than integer addition because it ___ ____ _____.
has more steps
25
New cards
4 steps of floating point addition
(1) Align binary points (2) Add significands (3) Normalize result & check for over/underflow (4) Round and renormalize if necessary
26
New cards
5 steps of floating point multiplication
(1) Add exponents (2) Multiply significands (3) Normalize result and check for over/underflow (4) Round and renormalize if necessary (5) Determine sign of result from signs of operands
27
New cards
Round to nearest
The default mode. Round to the nearest representable value; if exactly midway between two representable values, choose the one whose lowest-order bit is even.
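Python's built-in round() follows the same ties-to-even ("banker's rounding") rule, so it makes a quick demonstration:

```python
# Exact halfway cases go to the nearest EVEN value,
# rather than always rounding up.
print(round(0.5))  # -> 0
print(round(1.5))  # -> 2
print(round(2.5))  # -> 2
print(round(3.5))  # -> 4
```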
28
New cards
Round toward plus infinity, Round toward + infinity
All results rounded to the smallest representable value greater than or equal to the result.
29
New cards
Round toward minus infinity, Round toward - infinity
All results rounded to the largest representable value less than or equal to the result.
30
New cards
Round toward zero, Round toward 0
All results rounded to largest representable value whose magnitude is less than result (if negative, round up. If positive, round down).
31
New cards
________ _____ _____ matters. Associativity does not always hold.
Floating point order (or FP order)
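A quick sketch with ordinary IEEE 754 doubles shows why order matters: a small value can be absorbed when added to a much larger one.

```python
big = 1.0e20
a = (big + (-big)) + 1.0  # cancel first, then add 1.0
b = big + ((-big) + 1.0)  # 1.0 is absorbed into -1e20 before cancelling
print(a)       # 1.0
print(b)       # 0.0
print(a == b)  # False -- associativity does not hold
```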
32
New cards
SIMD instructions, vector instructions
Adds 8 x 128-bit registers, extended to 16 registers in AMD64/EM64T. Each can hold multiple FP operands: 2 x 64-bit double precision or 4 x 32-bit single precision. Instructions operate on them simultaneously: Single Instruction, Multiple Data.
33
New cards
4 types of datapath elements
Instruction Memory, Register File, Arithmetic Logic Unit (ALU), Data Memory
34
New cards
R-type arithmetic
Read two register operands, perform arithmetic/logical operation, write register result.
35
New cards
Load-store operation
Read register operands, calculate address using 16-bit offset (use ALU, but sign-extend offset), Load: read memory and update register, Store: Write register value to memory
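The address calculation above (sign-extend the 16-bit offset, add to the base register) can be sketched as follows; the lw example values are hypothetical.

```python
def sign_extend16(imm):
    # interpret a 16-bit immediate field as signed two's complement
    return imm - 0x10000 if imm & 0x8000 else imm

# hypothetical example: lw $t0, -8($sp) with $sp = 0x7FFFEFFC
base = 0x7FFFEFFC
addr = (base + sign_extend16(0xFFF8)) & 0xFFFFFFFF  # 0xFFF8 encodes -8
print(hex(addr))  # 0x7fffeff4
```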
36
New cards
Branch instruction
Read register operands, compare operands (use ALU, subtract and check zero output), calculate target address (sign-extend displacement, shift left 2 places (word displacement), add to PC + 4 (already calculated by instruction fetch))
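The target-address step can be sketched directly from the description (sign-extend, shift left 2, add to PC + 4); the beq example values are hypothetical.

```python
def branch_target(pc, imm16):
    # sign-extend the 16-bit displacement, shift left 2 (word -> byte
    # displacement), and add to PC + 4 from instruction fetch
    off = imm16 - 0x10000 if imm16 & 0x8000 else imm16
    return (pc + 4) + (off << 2)

# hypothetical example: a beq at 0x1000 branching back 2 instructions
print(hex(branch_target(0x1000, 0xFFFE)))  # 0xffc
```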
37
New cards
In Project 2, you were given an instruction, had to identify its datapath, and generate _______ _______ to execute instruction.
control signals
38
New cards
We don't use single-cycle implementation because it is ___ _________.
not efficient
39
New cards
MIPS Pipeline Datapath - Stage IF:
instruction fetch from memory
40
New cards
MIPS Pipeline Datapath - Stage ID
instruction decode, and register read
41
New cards
MIPS Pipeline Datapath - Stage EX, EXE
execute operation or calculate address
42
New cards
MIPS Pipeline Datapath - Stage MEM
access memory operand
43
New cards
MIPS Pipeline Datapath - Stage WB
write result back to register
44
New cards
Theoretical peak speedup formula
Time(pipelined) = Time(non-pipelined) / number of stages
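With hypothetical numbers, the formula gives the ideal (peak) case: a 1000 ps single-cycle datapath split into 5 perfectly balanced stages.

```python
t_nonpipelined = 1000  # ps per instruction (hypothetical)
stages = 5
t_pipelined = t_nonpipelined / stages  # ideal cycle time
print(t_pipelined)  # 200.0 ps

speedup = t_nonpipelined / t_pipelined
print(speedup)      # 5.0 -- peak speedup equals the number of stages
```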
45
New cards
Two reasons we don't achieve peak speedup are __________ ______ and _______.
unbalanced stages, hazards
46
New cards
What is a structural hazard?
Hardware can't support the combination of instructions we want on the same clock cycle
47
New cards
True or False: MIPS 5-stage pipeline has no structural hazard.
True
48
New cards
You can impose a structural hazard by using a single ______ instead of multiple.
memory
49
New cards
What is a data hazard?
When an instruction needs data produced by an earlier instruction that is still in the pipeline, so it must wait for that result before it can proceed.
50
New cards
To detect a data hazard, check for ____________ when going through instructions.
dependencies
51
New cards
You can fix a data hazard with ______.
bubble(s)
52
New cards
forwarding
resolving data hazards by retrieving the missing data element from internal buffers. AKA bypassing.
53
New cards
What is a control hazard?
Arises from need to decide based on the result of one instruction while others are executing.
54
New cards
Branches are resolved at the __ stage because branches received in IF just arrived in the same clock cycle.
ID
55
New cards
You can fix control hazards with _______.
flush(es)
56
New cards
branch prediction
Resolves control hazards by assuming the result of the branch.
57
New cards
In a branch prediction, it is assumed that branches ____ be taken. When does this make the program faster? When does this make it slower?
won't. Faster when the prediction is correct; slower when it is wrong, since the wrongly fetched instructions must be flushed.
58
New cards
How can we always fix a hazard?
Stalling.
59
New cards
Load use hazards happen because...
the loaded value only becomes available late in the pipeline (after MEM). Forwarding can't always avoid the stall: the value hasn't been produced when the dependent instruction needs it, and you can't forward back in time!
60
New cards
What is a branch source hazard?
When you can't tell where the branch will go
61
New cards
In Project 3, you were given an instruction sequence and pipeline design, and counted the number of _______ required to correctly execute the sequence.
bubbles
62
New cards
Why do we use a 2-bit saturating counter operation over a 1-bit counter?
A 1-bit predictor mispredicts twice on each anomaly (e.g., the final iteration of a loop): once when it flips its prediction, and again when it flips back. A 2-bit counter instead waits until the prediction is wrong twice before changing it.
63
New cards
In Project 4, you modified the simulator to model a pipeline with full __________ _____.
forwarding paths
64
New cards
In Instruction-Level Parallelism, why do we use IPC instead of CPI?
With multiple issue, CPI can drop below 1, so the reciprocal, IPC (instructions per cycle), is the more natural measure.
65
New cards
What is in-order static multiple issue?
- Compiler groups instructions to be issued together - Packages them into "issue slots" - Compiler detects and avoids hazards
66
New cards
What are some disadvantages to compiler management?
- Not all stalls are predictable - Can't always schedule around branches b/c they are dynamically determined - Different ISAs mean different latencies and hazards
67
New cards
What is Dynamically-Scheduled In-Order Superscalar?
Instructions are issued in order, but the CPU executes them out of order to avoid stalls.
68
New cards
What is the difference between Instruction-Level Parallelism and Thread-Level Parallelism?
ILP: execute multiple instructions in parallel TLP: execute multiple threads in parallel
69
New cards
Most instructions need inputs in the ____ stage. jr, beq, and bne need inputs in the __ stage. sw instruction's store data is needed in the ____ stage.
EXE1, ID, MEM1
70
New cards
ALU produces data at the end of ____ but it can't be forwarded until ____. Most instruction results are produced at the ____ stage. lw, mult, and div produce results at the ____ stage. jal's result becomes available at the __ stage.
EXE2, MEM1. EXE2, MEM2, ID.
71
New cards
In a memory hierarchy: higher levels have ______ speeds, _______ sizes, and ____ distance from the CPU. lower levels have ______ speeds, ______ sizes, and ____ distance from the CPU.
faster, smaller, less. slower, bigger, more.
72
New cards
What is latency?
Time from a request until the first data arrives.
73
New cards
What is bandwidth?
Amount of data that can be transferred in a "stream".
74
New cards
What is DRAM? (4 items)
- Main memory - cells are small capacitors - must be refreshed to hold data - slow but less $
75
New cards
What is SRAM? (4 items)
- caches - cells composed of ~6 transistors (a small logic circuit) - no refresh needed - fast but more $
76
New cards
DRAM and SRAM are both ________; ___-________ options use _____ and ________ media.
volatile; non-volatile, flash, magnetic
77
New cards
Temporal Locality and 2 examples
Items accessed recently are likely to be accessed again soon Examples: loop instructions, induction variables
78
New cards
Spatial Locality and 2 examples
Items near those accessed recently are likely to be accessed soon Examples: sequential instructions, array data
79
New cards
Valid bit
Indicates whether a cache entry holds valid data: 1 if it does, 0 if not.
80
New cards
When is a cache block dirty?
When it has been written in the cache but not yet written back to memory.
81
New cards
Tags
high-order address bits that tell what block is stored in a cache location
82
New cards
Direct Mapped Cache (3 items)
- Location determined by address - Direct mapped: only one choice (1 way) - (Block address) modulo (#Blocks in cache)
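The modulo mapping can be sketched with a hypothetical geometry (64 blocks of 16 bytes):

```python
def dm_index_tag(addr, block_size=16, num_blocks=64):
    block_addr = addr // block_size
    index = block_addr % num_blocks   # which cache location (the 1 way)
    tag = block_addr // num_blocks    # high-order bits stored as the tag
    return index, tag

print(dm_index_tag(0x1234))  # (35, 4)
```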
83
New cards
Fully Associative Cache (2 items)
- No index bits, a block can go in any entry - Usually increases hit rate
84
New cards
What is the shortcoming of a fully associative cache?
All entries must be searched instead of just the one with the matching index, and a comparator is needed per entry, which is expensive.
85
New cards
n-way Set Associative Cache (4 items)
- Sets contain n entries - Block # determines set [(Block #) modulo (#Sets)] - Search all entries (ways) in a given set at once - n comparators (less expensive)
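The set mapping can be sketched with the same hypothetical 64-block geometry, now 2-way (so 32 sets):

```python
def set_index(block_addr, num_blocks=64, ways=2):
    num_sets = num_blocks // ways  # 64 // 2 = 32 sets
    return block_addr % num_sets

# Blocks 3 and 35 map to the same set (both % 32 == 3), but the
# 2 ways let them coexist instead of evicting each other.
print(set_index(3), set_index(35))  # 3 3
```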
86
New cards
What happens on a store hit?
the CPU proceeds normally
87
New cards
What happens on a cache miss?
CPU pipeline stalls, block from next hierarchical level fetched. If instruction cache miss, restart instruction fetch. If data cache miss, complete data access.
88
New cards
Unlike a Write-Back cache, Write-Through updates the block in the cache AND the ______.
memory
89
New cards
Write-Back takes a _______ amount of time compared to Write-Through
shorter
90
New cards
Write-Through caches have access to a _____ ______, but it still slows down if it is full.
write buffer
91
New cards
What happens when a dirty block is replaced in a Write-Back cache?
The dirty block is written back to memory; a write buffer can hold it so the new block can be read first.
92
New cards
On a write-allocate store miss: What happens when write-allocate (allocate on miss) is used? What happens when write around (no write-allocate) is used?
It fetches the block. It doesn't fetch the block.
93
New cards
Least Recently Used (LRU) Replacement
Replace the block that has gone unused for the longest time.
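A minimal sketch of LRU replacement for a small fully associative cache, using an ordered dict to track recency (class and method names are hypothetical):

```python
from collections import OrderedDict

class LRUCache:
    """Fully associative cache with LRU replacement (sketch)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # least recently used entry first

    def access(self, block):
        # returns True on a hit, False on a miss
        if block in self.blocks:
            self.blocks.move_to_end(block)  # hit: now most recently used
            return True
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
        self.blocks[block] = True
        return False

c = LRUCache(2)
print(c.access("A"), c.access("B"), c.access("A"))  # False False True
print(c.access("C"))  # False: miss, evicts B (A was used more recently)
print(c.access("B"))  # False: B was evicted
```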
94
New cards
What do LRU and Random Replacement policies have in common?
Approx. the same performance
95
New cards
The L2 cache (4 items)
- larger - slower - services misses from primary cache - access when L1 is missed
96
New cards
The L1 cache (2 items)
- minimize hit time - split into I and D caches
97
New cards
Cache Design Trade-Offs: Increasing cache size __________ capacity misses and may ________ access time.
decreases, increase
98
New cards
Cache Design Trade-Offs: Increasing associativity __________ conflict misses and may ________ access time.
decreases, increase
99
New cards
Cache Design Trade-Offs: Increasing block size decreases __________ misses, __________ miss penalty, and for very large block sizes, increases ____ ____ due to pollution.