Pipelining and Superscalar

59 Terms

1. Latency

The time it takes for one instruction to finish.

2. Throughput

The number of instructions completed per unit time.

3. Increasing CPU performance methods

Increase clock frequency, add multiple cores, use pipelining, or add dedicated circuitry.

4. Problem with increasing clock frequency

Power wall — higher frequency leads to more power consumption and heat.

5. Problem with multiple cores

Requires parallel programming to be effective.

6. Problem with pipelining

Limited throughput improvement due to dependencies.

7. Problem with dedicated circuitry

Only improves one specific operation (one-trick pony).

8. Pipelined execution

Technique that overlaps instruction execution to increase throughput.

9. 4-stage pipeline stages

Fetch → Decode → Execute → Write result.
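
A quick sketch of why overlapping stages raises throughput (illustrative Python; idealized assumptions: one cycle per stage, no stalls, made-up N and k):

```python
# N instructions through a k-stage pipeline, one cycle per stage, no stalls.
N, k = 100, 4

sequential_cycles = N * k                # no overlap: each instruction runs alone
pipelined_cycles = k + (N - 1)           # fill the pipe, then finish 1 per cycle

print(sequential_cycles)                 # 400
print(pipelined_cycles)                  # 103
print(sequential_cycles / pipelined_cycles)  # speedup ~3.9, approaching k
```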

10. Interstage buffers

Buffers that separate pipeline stages to allow independent operation (e.g., RA, RB, RM, RY, RZ).

11. B1 Fetch-decode buffer

Holds instruction.

12. B2 Decode-execute buffer

Holds source operands, operation, destination operand.

13. B3 Execute-write buffer

Holds result and destination operand.

14. Pipeline logic

Transfers data, operand specifiers, and control signals between stages.

15. Pipeline hazards

Events that cause stalls in the pipeline.

16. Common pipeline hazards

Cache miss, data dependency, branching, long operations (e.g., division).

17. Data dependency

When one instruction depends on the result of a previous one.

18. Pipeline stall due to dependency

CPU delays the dependent instruction until the required data is available.

19. Data forwarding

Sends the ALU result directly to the input of the next instruction, bypassing the register file, to reduce stalls.
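
A minimal sketch of the stall arithmetic behind forwarding (assumptions: one cycle per stage, operands read in Decode, register file written in stage 4, values usable one cycle after they are produced):

```python
# Stalls a consumer issued d cycles after its producer must insert so its
# operand arrives in time. Stage numbers: Fetch=1, Decode=2, Execute=3, Write=4.
def stall_cycles(ready_stage, need_stage, d):
    # Result usable one cycle after ready_stage completes; the consumer
    # reaches need_stage (d + need_stage) cycles after the producer started.
    return max(0, (ready_stage + 1) - (need_stage + d))

# Back-to-back dependent instructions (d = 1):
print(stall_cycles(ready_stage=4, need_stage=2, d=1))  # 2 stalls: wait for register write
print(stall_cycles(ready_stage=3, need_stage=3, d=1))  # 0 stalls: ALU-to-ALU forwarding
```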

20. Branching

Alters the normal sequence of instruction execution.

21. Unconditional branching

Always jumps to a new program location (e.g., jmp .loop).

22. Unconditional branch penalty

1 cycle (target computed in decode stage).

23. Conditional branching

Jumps to a new location only if a condition is true.

24. Conditional branch penalty

2 cycles (condition computed in execute stage).

25. Reducing branch penalty

Move comparator to decode stage (1 cycle penalty).

26. Branch prediction

Technique to predict the outcome of branch instructions to minimize pipeline stalls.

27. Static branch prediction

Based on fixed rules; assumes backward branches are taken, forward branches are not.

28. Static prediction accuracy

Around 80%.

29. Dynamic branch prediction

Uses previous execution outcomes to predict next branch.

30. Dynamic prediction states

LT (likely taken), LNT (likely not taken).
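
A minimal Python sketch of the two-state scheme named above (the update rule, predicting whatever the branch did last time, is the standard 1-bit algorithm; state names follow the card):

```python
class TwoStatePredictor:
    def __init__(self):
        self.state = "LNT"              # start out predicting not taken

    def predict(self):
        return self.state == "LT"       # True means "predict taken"

    def update(self, taken):
        self.state = "LT" if taken else "LNT"

p = TwoStatePredictor()
for actual in [True, True, False, True]:    # made-up branch outcomes
    print("predicted taken:", p.predict(), "| actually taken:", actual)
    p.update(actual)
```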

31. Branch target buffer (BTB)

Hardware table storing addresses and outcomes of recent branch instructions.

32. Purpose of BTB

Identifies branches and predicts target addresses for faster fetching.
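
A minimal sketch of a BTB as a table keyed by instruction address (the tuple fields, the 4-byte instruction size, and the addresses are illustrative assumptions):

```python
btb = {}                                # address -> (predicted_taken, target)

def fetch(pc):
    if pc in btb:                       # BTB hit: this address held a branch before
        predicted_taken, target = btb[pc]
        return target if predicted_taken else pc + 4
    return pc + 4                       # miss: assume fall-through

def update(pc, taken, target):          # record the actual outcome after execute
    btb[pc] = (taken, target)

update(0x1000, True, 0x2000)
print(hex(fetch(0x1000)))               # 0x2000: fetch redirected to the target
```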

33. Pipeline performance formula variables

N = number of instructions, S = average cycles per instruction, R = clock rate; execution time T = N * S / R.

34. Branch misprediction penalty formula

Sbranch = Branch_proportion * (1 - accuracy) * penalty.

35. Cache miss penalty formula

Smiss = (Fetch_miss_rate + Load_store_fraction * Data_miss_rate) * penalty.

36. Total cycles per instruction

S = 1 + Smiss + Sbranch.
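
A worked example combining cards 33 through 36 (all rates and penalties are made-up numbers):

```python
N = 1_000_000                 # instructions
R = 2e9                       # clock rate: 2 GHz

# Sbranch = Branch_proportion * (1 - accuracy) * penalty
Sbranch = 0.20 * (1 - 0.90) * 2                  # 0.04 cycles/instruction

# Smiss = (Fetch_miss_rate + Load_store_fraction * Data_miss_rate) * penalty
Smiss = (0.02 + 0.30 * 0.05) * 10                # 0.35 cycles/instruction

S = 1 + Smiss + Sbranch                          # 1.39 cycles/instruction
T = N * S / R                                    # execution time: ~0.7 ms
print(S, T)
```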

37. Effect of n-stage pipeline

Increases throughput by factor of n, but too many stages cause more stalls.

38. Superscalar processor

CPU with multiple parallel execution units, each possibly pipelined.

39. Goal of superscalar architecture

Increase instruction throughput via parallelism.

40. Superscalar operation steps

Fetch multiple instructions → Queue them → Dispatch to multiple execution units.

41. Types of execution units

Arithmetic unit, load/store unit, write unit.

42. Execution types

Parallel execution, in-order issue, out-of-order execution.

43. Program-order completion

In-order issue and in-order completion.

44. Speculative execution

Predicts the results of operations so execution can continue before they are confirmed; rolls back if the prediction was wrong.

45. Speculative execution use

Works with branch prediction to minimize stalls.

46. Runahead execution

Executes instructions ahead of time while waiting for data dependencies to resolve.

47. Register renaming

Eliminates false data dependencies by assigning unique registers.
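
A toy illustration of the idea in Python (the renaming table and physical register names are made up; real hardware also recycles physical registers):

```python
def rename(instructions):
    mapping, next_phys, out = {}, 0, []
    for dst, srcs in instructions:
        phys_srcs = [mapping.get(s, s) for s in srcs]   # read current names
        mapping[dst] = f"p{next_phys}"                  # fresh register per write
        next_phys += 1
        out.append((mapping[dst], phys_srcs))
    return out

# r1 = r2+r3 ; r4 = r1+r5 ; r1 = r6+r7
# The second write to r1 gets its own register, so the WAW/WAR (false)
# dependencies on r1 disappear; only the true r1 -> r4 dependency remains.
prog = [("r1", ["r2", "r3"]), ("r4", ["r1", "r5"]), ("r1", ["r6", "r7"])]
for dst, srcs in rename(prog):
    print(dst, "<-", srcs)   # p0 <- [r2,r3]; p1 <- [p0,r5]; p2 <- [r6,r7]
```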

48. Simultaneous Multithreading (SMT) / Hyperthreading

Allows multiple hardware threads per core to increase utilization.

49. Processor structure

Contains multiple cores (1-192), each core may support multiple hardware threads (1-4).

50. Parallelization

Increases throughput and reduces program runtime by running tasks simultaneously.

51. Amdahl's Law formula

Ps = Ts * (Fs + Fp / p), where Ts = sequential runtime, Ps = parallel runtime, Fs = sequential fraction, Fp = parallel fraction (Fs + Fp = 1), p = number of processors.

52. Speedup formula

S = Ts / Ps.
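
A worked example for cards 51 and 52 (runtime and fractions are made-up numbers):

```python
Ts = 100.0                # sequential runtime, seconds
Fs, Fp = 0.1, 0.9         # sequential / parallel fractions (Fs + Fp = 1)

for p in (2, 8, 1024):
    Ps = Ts * (Fs + Fp / p)                     # Amdahl's Law
    print(p, round(Ps, 2), round(Ts / Ps, 2))   # speedup never exceeds 1/Fs = 10
```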

53. Flynn's taxonomy

Classification of computer architectures by instruction and data streams.

54. SISD

Single Instruction, Single Data — conventional sequential processor.

55. SIMD

Single Instruction, Multiple Data — same instruction operates on multiple data streams.
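
A software analogue (illustrative only: NumPy's vectorized add stands in for one instruction applied across multiple data lanes):

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])
print(a + b)      # one "instruction", four data elements: [11 22 33 44]
```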

56. MIMD

Multiple Instruction, Multiple Data — multiple instruction and data streams.

57. MISD

Multiple Instruction, Single Data — rarely used architecture.

58. Unified memory architecture

All processors share a single memory space.

59. Distributed memory architecture

Each processor has its own private memory.