Pipelining-2


72 Terms

1

What is Instruction Pipelining?

Pipelining is a performance enhancement technique that processes multiple instructions simultaneously by overlapping their execution stages, aiming to maximize hardware utilization and increase CPU throughput.

2

Why is pipelining necessary in computer architecture?

It is needed to minimize execution time and save clock cycles by ensuring hardware components, like the ALU and Decode unit, are not idle but are continually processing different instruction stages.

3

What is the core working principle and critical constraint of instruction pipelining?

The principle is to start the next instruction as soon as the previous one moves to the next stage. The critical constraint is that two instructions cannot occupy the same phase at the same time.

4

What are the four typical stages of an instruction pipeline?

Fetch (F): Retrieve the instruction from memory. Decode (D): Interpret the instruction and fetch operands. Execute (E): Perform the specified operation (e.g., addition). Write Back (W): Store the result back into a register.
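The stage-per-cycle overlap of these four stages can be sketched as a timing diagram. This is my own illustration (Python, not from the notes), assuming an ideal pipeline where every stage takes exactly one clock cycle:

```python
# Print the timing diagram for an ideal 4-stage pipeline.
# Assumption: every stage takes exactly one clock cycle.

STAGES = ["F", "D", "E", "W"]

def timing_diagram(n_instructions):
    """Return one row per instruction; column i is the stage occupied in clock cycle i+1."""
    rows = []
    total_cycles = n_instructions + len(STAGES) - 1
    for i in range(n_instructions):
        row = ["."] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage          # instruction i enters stage s at cycle i+s+1
        rows.append("I%d: %s" % (i + 1, " ".join(row)))
    return rows

for line in timing_diagram(4):
    print(line)
# I1: F D E W . . .
# I2: . F D E W . .
# I3: . . F D E W .
# I4: . . . F D E W
```

Reading down any column shows the critical constraint from the cards above: no two instructions ever occupy the same stage in the same cycle.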

5

What happens during the Instruction Fetch (F) stage and what hardware is used?

This stage retrieves the instruction from memory. Hardware Used: Program Counter (PC): Provides the address of the instruction. Memory Address Register (MAR): Receives the address to access memory. Memory: The location where the instruction is stored. Instruction Register (IR): The fetched instruction is placed here.

6

What happens during the Instruction Decode (D) stage?

This stage interprets what the instruction needs to do. Action: The Control Unit "strips off" the instruction bits from the IR to identify the opcode (the operation) and the source/destination registers. Result: It generates the control signals to activate the correct hardware for the later stages and may also fetch the operands (e.g., the values from R2 and R3).

7

What happens during the Execute (E) stage?

This stage performs the actual operation. Hardware Used: The ALU (Arithmetic Logic Unit) is the primary component. Action: For an instruction like ADD R1, R2, R3, the ALU would take the contents of R2 and R3 and perform the addition.

8

What happens during the Write Back (W) stage?

This stage stores the final result. Action: The result from the ALU (or from a memory load) is written back into the destination register (e.g., R1) in the Register File.

9

What are interstage buffers (latches) and why are they essential for pipelining?

Interstage buffers (represented as V1, V2, V3) are temporary storage hardware components placed between the main pipeline stages (e.g., V1 is between Fetch and Decode). Necessity: They are essential because they hold the results of one stage (e.g., the fetched instruction) and pass it to the next stage. This allows the first stage's hardware (e.g., the Fetch unit) to immediately flush its contents into the buffer and become free to start processing the next instruction in the very next clock cycle.

10

Explain why a two-stage (Fetch and Execute) pipeline is inefficient.

With only two stages, the work is divided very unevenly: the Execute portion (which must cover decoding, operand fetch, the actual operation, and write back) takes far longer than the simple Fetch portion. The Fetch hardware therefore sits idle waiting for Execute to finish, so little overlap is gained. Splitting the work into more, roughly equal stages keeps every hardware unit busy.

11

Why is it difficult to have a pipeline with more than four stages?

As you divide the cycle into more and more stages, the Control Unit design becomes extremely complicated. The CU must be written to generate and manage a much higher number of control signals to activate and control all the additional, smaller components, which makes the system difficult to manage.

12

What is a pipeline hazard?

A pipeline hazard is any condition or conflict that forces the system to pause or delay, preventing it from executing the next instruction when it is supposed to.

13

Define pipeline stall (or pipeline bubble).

A stall or bubble is the wasted time (a delay) that occurs when a pipeline has to wait due to a hazard. During that clock cycle, no useful work is being done in that part of the pipeline.

14

What are the three main types of pipeline hazards?

Data Hazard, Instruction Hazard, Structural Hazard.

15

What is a Data Hazard? Explain with an example.

A data hazard occurs when an instruction is dependent on the data (result) of a previous instruction that has not yet finished executing. Example: 1. I1: C = A + B 2. I2: X = 4 * C. The Conflict: I2 needs the value of 'C' for its operation. In a pipeline, I2 will reach its Decode/Operand Fetch stage (e.g., at Clock 3) and try to read 'C'. However, I1 will only calculate 'C' in its Execute stage (Clock 3) and make it available in its Write Back stage (Clock 4). Because 'C' is not available when I2 needs it, the pipeline must stall.
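The dependency in this example can be detected mechanically. A minimal sketch (my own illustration, not from the notes) that checks whether a following instruction reads a register the previous one writes — a read-after-write (RAW) dependency:

```python
# Detect the RAW dependency behind a data hazard.
# Each instruction is modeled as (dest, sources), e.g. C = A + B -> ("C", ["A", "B"]).

def raw_hazard(producer, consumer):
    """True if `consumer` reads a register that `producer` writes."""
    dest, _ = producer
    _, sources = consumer
    return dest in sources

i1 = ("C", ["A", "B"])   # I1: C = A + B
i2 = ("X", ["C"])        # I2: X = 4 * C  -- needs C before I1 has written it back

print(raw_hazard(i1, i2))  # True: the pipeline must stall (or forward C)
```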

16

What are three methods used to mitigate (reduce) pipeline hazards?

Rearranging the Code (Code Reordering): Moving independent instructions into the delay slots. Inserting NOP (No Operation) Instructions: Filling the delay slots with "do nothing" instructions to force a wait. Operand Forwarding: A hardware technique that passes results back to earlier stages (though the notes caution it may not solve all data hazards).

17

Explain Code Reordering.

If the code has other instructions that are completely independent of the ones causing the hazard, the programmer or compiler can move these independent instructions into the stall period. This keeps the pipeline doing useful work while it waits for the dependent data to become available.

18

What is a NOP instruction and when is it used?

A NOP (No Operation) is a "do nothing" instruction (e.g., represented by all zero bits). It is used to fill a delay slot when code reordering is not possible (e.g., no independent instructions are available). It forces the hardware to wait for one or more clock cycles, allowing the data hazard to resolve itself.
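As a sketch of how an assembler or compiler might apply this, the helper below copies a program and places NOPs after flagged instructions. The two-NOP count is an assumption for a simple 4-stage pipeline without operand forwarding, where the consumer reads in Decode what the producer only writes in Write Back:

```python
# Fill delay slots with NOPs after instructions known to cause a data hazard.
# Assumption: two stall cycles per hazard (4-stage pipeline, no forwarding).

def insert_nops(instructions, hazard_indices, nops=2):
    """Return a copy of the program with `nops` NOP fillers after each hazardous index."""
    out = []
    for idx, instr in enumerate(instructions):
        out.append(instr)
        if idx in hazard_indices:
            out.extend(["NOP"] * nops)   # "do nothing" instructions consume cycles
    return out

program = ["ADD C, A, B", "MUL X, C, 4"]
print(insert_nops(program, hazard_indices={0}))
# ['ADD C, A, B', 'NOP', 'NOP', 'MUL X, C, 4']
```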

19

To write efficient code for a pipelined system, what programming practice should be avoided?

Programmers should avoid breaking a single, long expression into many small, sequential, dependent statements. Impact: A greater number of dependent instructions makes pipelining very difficult and increases the likelihood of data hazards and stalls. Result: An inefficiently written program (with many dependencies) will still run correctly, but its performance will be poor, defeating the entire purpose of having a pipelined architecture.

20

What is the primary limitation of a single bus organization?

The primary limitation is that the bus is a shared resource that acts like a "one-road" system: only one item (data, an address, or a control signal) can travel on it per clock cycle, so every transfer must be serialized and the other components must wait their turn.

21

Explain why a simple memory write operation is inefficient on a single bus.

A memory write requires sending data, an address, and a control signal. On a single bus, this must be done in three separate steps, consuming three clock cycles: 1. Clock 1 (Data Transfer): The data (e.g., from register R2) is put on the bus to go into the Memory Buffer Register (MBR). 2. Clock 2 (Address Transfer): The address (e.g., from register R1) is put on the bus to go into the Memory Address Register (MAR). 3. Clock 3 (Signal Issuance): The control signal (e.g., "write") is issued using the bus. The actual memory operation only begins after these three clock cycles are complete.
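The cycle counts can be sketched with a tiny model of my own (assumption: each bus can carry one transfer per clock cycle, and the three transfers are independent):

```python
# Clock cycles needed before a memory write can begin, as a function of
# how many buses are available to carry the data, address, and control signal.

def write_setup_cycles(n_buses):
    """Ceiling of 3 transfers divided across `n_buses` parallel buses."""
    transfers = 3                       # data, address, control signal
    return -(-transfers // n_buses)     # ceiling division

print(write_setup_cycles(1))  # 3 cycles on a single shared bus
print(write_setup_cycles(3))  # 1 cycle with dedicated data/address/control buses
```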

22

What are the three types of dedicated buses in a multiple bus organization?

Address Bus: Carries address signals. Data Bus: Used to fetch and transfer data (e.g., between MBR and registers). Control Bus: Passes control signals.

23

What is the main advantage of a multiple bus organization, and what is its main drawback?

Advantage: The three tasks (data, address, and signal) can be executed in one single clock cycle because there are "three roads" available. This allows more instructions to be processed per clock cycle. Drawback: The main drawback is increased complexity. Generating and maintaining control signals for all the various buses simultaneously is much more complicated than managing a single bus.

24

What is a limitation that still exists in a multiple bus system?

You cannot schedule two instructions that are dependent on the same resource. For example, if two different instructions both require memory access, they cannot happen at the same time because you can still only send one address at a time on the single Address Bus.

25

What are the two main types of Control Unit (CU) design?

Hardwired Control Unit (HCU) and Microprogrammed Control Unit (MCU).

26

What is a Hardwired Control Unit (HCU)?

An HCU is a control unit built using physical hardware components like logic gates, decoders, and multiplexers. Its logic is fixed in the hardware itself.

27

Explain the main limitation and fixed nature of an HCU.

An HCU is fixed because its logic is physically built in. It can only handle a limited instruction set, and the number of control signals it can output is predetermined. If a change is needed (like supporting more instructions or registers), the physical hardware design itself must be changed.

28

List the key components involved in the operation of a Hardwired Control Unit.

Instruction Register (IR): Provides the instruction (e.g., 32 bits) as input to the CU. Instruction Decoders: The instruction is "stripped" into parts (like the opcode). These parts go to decoders (e.g., a 3-to-8 decoder that turns a 3-bit opcode into one of 8 activation lines), whose outputs drive the generation of the control signals.

29

What is a Microprogrammed Control Unit (MCU)?

An MCU is a control unit that relies more on software implementation. It involves writing complex code for instruction handling and decoding, which is stored in a special control memory.

30

Compare the flexibility of an HCU vs. an MCU.

HCU: Is inflexible. To add registers, you must physically change the hardware. MCU: Is flexible. To add registers (e.g., go from 32 to 64), you can make the change in the software (the micro-program stored in control memory) instead of redesigning the hardware.

31

What is the main cost or drawback of using a flexible Microprogrammed CU?

This high degree of flexibility comes at the cost of a complex design for the Control Unit itself.

32

What is an Instruction Set?

The Instruction Set (or ISA) is the set of all operations (like multiply, add, sum) that the system's hardware is designed to support.

33

What do RISC and CISC stand for?

RISC: Reduced Instruction Set Computer. CISC: Complex Instruction Set Computer.

34

What is the most critical difference between RISC and CISC regarding instruction size?

RISC: Uses a fixed instruction size. Every instruction is the same length (e.g., 32 bits), even if some fields are left empty. CISC: Uses a variable instruction size. Instructions can be different lengths (e.g., one might be 12 bits, another 64 bits), which can be more efficient in some cases.

35

Compare RISC and CISC on key features (Instruction Count, CU Type, Flexibility).

RISC: Instruction Count: Handles very few instructions. CU Type: Typically uses a Hardwired Control Unit. Flexibility: Restricted and fixed. CISC: Instruction Count: Allows for adding more instructions. CU Type: Typically uses a Microprogrammed Control Unit. Flexibility: High flexibility; fields and components can be scaled.

36

Is CISC inherently better or more efficient than RISC?

No. You can't state that one is inherently better. Increasing a component (like the number of registers in a CISC machine) does not guarantee an increase in the overall system's efficiency or performance, as other factors may create bottlenecks.

37

What is the difference between Infix and Postfix notation?

Infix: The operator is placed between the operands (e.g., X * Y). Postfix: The operator is placed after the operands (e.g., XY*).

38

What is the general method for converting an Infix expression to Postfix?

The conversion process requires considering the highest priority operators first and evaluating the expression from left to right. Operations within parentheses must be solved first, following the rules of operator precedence.
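The usual mechanical way to perform this conversion is Dijkstra's shunting-yard algorithm, which applies exactly the precedence rules described above. A sketch, assuming single-character operands and the operators + - * / with parentheses:

```python
# Infix -> postfix via the shunting-yard algorithm.
# Assumptions: single-character operands; binary + - * /; balanced parentheses.

PREC = {"+": 1, "-": 1, "*": 2, "/": 2}

def to_postfix(infix):
    out, stack = [], []
    for tok in infix.replace(" ", ""):
        if tok.isalnum():                       # operand goes straight to output
            out.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":                        # pop until the matching "("
            while stack[-1] != "(":
                out.append(stack.pop())
            stack.pop()
        else:                                   # operator: pop higher/equal precedence first
            while stack and stack[-1] != "(" and PREC[stack[-1]] >= PREC[tok]:
                out.append(stack.pop())
            stack.append(tok)
    while stack:                                # flush remaining operators
        out.append(stack.pop())
    return "".join(out)

print(to_postfix("X * Y"))        # XY*
print(to_postfix("(A + B) * C"))  # AB+C*
```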

39

What is a foundational assumption made in pipeline timing diagrams that is often unrealistic?

The foundational assumption is that each instruction stage takes exactly one clock cycle. This is often untrue in reality, especially for operations involving memory access (like fetching from main memory), which can take many clock cycles.

40

According to the notes, what two things should you always include in an exam answer when explaining a pipeline hazard?

You should always include an example (like a sequence of instructions) and draw a timing diagram to best illustrate the stall and the conflict.

41

What are the three primary solutions for Data Hazards discussed in the notes?

Reordering Code (Instruction Reordering), Software Handling (NOP Insertion), and Operand Forwarding (Hardware Handling).

42

Explain Reordering Code as a solution for data hazards. What is its main limitation?

This solution involves restructuring the program's code to move independent instructions between the two dependent instructions. This creates a natural delay, giving the first instruction enough time to complete and make its data available before the second instruction needs it. Limitation: This is only possible if there are independent instructions available in the code that can be safely moved without changing the program's logic.

43

What is NOP Insertion, and when is it used?

Definition: NOP (No Operation) is a "filler" instruction that does nothing except consume a clock cycle. Usage: This is a software-based solution, used when code reordering is not possible; the NOPs fill the delay slots and force the pipeline to wait until the dependent data becomes available.

44

What is Operand Forwarding, and is it a hardware or software solution?

Definition: Operand Forwarding (also known as Bypassing) is a hardware-based solution. Extra data paths pass a result directly from the ALU output back to the input of an earlier stage, so a dependent instruction can use the value without waiting for the producing instruction's Write Back stage.
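As an illustration of the benefit, the sketch below models when a dependent instruction can consume a result; the one- and two-cycle figures are assumptions for a simple 4-stage model, not from the notes:

```python
# When can a dependent instruction consume I1's result?
# Assumed model: without forwarding it must wait for Write Back to finish
# (execute cycle + 2); with forwarding the ALU output is bypassed directly
# to the next instruction's Execute stage (execute cycle + 1).

def cycle_result_available(execute_cycle, forwarding):
    """Clock cycle at which a following instruction can consume I1's result."""
    if forwarding:
        return execute_cycle + 1     # bypassed straight from the ALU output
    return execute_cycle + 2         # must wait for Write Back to complete

print(cycle_result_available(execute_cycle=3, forwarding=False))  # 5
print(cycle_result_available(execute_cycle=3, forwarding=True))   # 4
```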

45

Define Instruction Hazard.

An Instruction Hazard occurs when the instruction itself is not available for the pipeline in the next clock cycle when it is needed. This is typically caused by: 1. Control Flow Changes (e.g., branches or jumps) 2. Instruction Fetch Delays (e.g., a cache miss).

46

Explain the Unconditional Branch Hazard. Why does it happen?

Problem: When a JUMP instruction (e.g., I2: JMP I7) enters the pipeline, the pipeline hardware, which is designed to fetch instructions sequentially, has already fetched the next sequential instructions (e.g., I3, I4). Cause: The pipeline does not realize I2 is a jump until the Decode (ID) stage, and it cannot calculate the target address (I7) until the Execute (EX) stage (which often needs the ALU). By the time the processor knows the correct address, it has already fetched and started processing the wrong instructions (I3, I4).

47

What is a Branch Delay or Branch Stall?

This is the name for the wasted clock cycles that occur after a branch instruction. The pipeline must stall to flush (discard) the incorrect, sequentially-fetched instructions (e.g., I3, I4) and then fetch the instruction at the branch target.

48

Why can Code Reordering not be used to solve an unconditional branch hazard, unlike a data hazard?

In a Data Hazard, you know that all instructions (I1, I2, I3…) will eventually be executed, so you can safely reorder them. In a Branch Hazard, the system cannot decide which code needs to be executed (e.g., I3 or I7). If the jump is taken, instructions I3, I4, I5, and I6 will never be executed. You cannot reorder instructions when you don't even know if they are on the correct execution path.

49

What is the hardware-based mitigation strategy for branch hazards described in the notes?

Early branch recognition: adding dedicated hardware (e.g., a separate adder used during the Decode stage) so that the branch is identified and its target address computed in Decode rather than Execute. This reduces the number of wrongly fetched instructions that must be flushed and therefore shortens the branch delay.

50

What is the Principle of Locality as it relates to cache memory?

This principle states that if a memory block is accessed, there is a high probability that nearby memory values will be accessed soon. This is the reason why, when data is fetched from Main Memory, an entire block of data (e.g., four instructions) is copied into the Cache, not just the single piece of data requested.

51

How does a Cache Miss cause an Instruction Hazard?

Problem: A pipeline assumes it can fetch an instruction in one clock cycle. This assumption only holds true if the instruction is in the fast Cache memory. Hazard: If the instruction is not in the Cache (a "Cache Miss"), the system must fetch it from the much slower Main Memory. This takes many clock cycles, forcing the Fetch stage to wait and causing the entire pipeline to stall.

52

What is Pre-fetching, and what additional hardware does it require?

Pre-fetching brings instructions in from memory before the pipeline actually needs them, exploiting the Principle of Locality so that the Fetch stage does not stall on a Cache Miss. It requires additional hardware: a pre-fetch unit and buffer (queue) storage to hold the instructions fetched in advance.

53

What is the key architectural insight from implementing solutions like Operand Forwarding, Early Branch Recognition, and Pre-fetching?

Every one of these solutions removes stalls by adding hardware (forwarding paths, an extra adder for early branch recognition, pre-fetch buffers). The insight is that pipeline hazards are ultimately mitigated by investing in additional infrastructure, which increases the complexity and cost of the Control Unit and of the processor as a whole.

54

What is the primary purpose of pipelining?

The primary purpose of pipelining is to improve performance and increase throughput. It does this by utilizing hardware components that would otherwise be idle, making the entire process more efficient.

55

What is the main trade-off or cost associated with implementing an efficient pipeline?

The trade-off is increased hardware complexity and cost: interstage buffers, forwarding paths, branch-prediction storage, and a far more complicated Control Unit are all needed to keep the pipeline full.

56

For what type of system is pipelining generally advisable?

Pipelining is generally advisable if a system needs to execute a lot of programs. If the requirement is smaller, investing in a complex pipelining environment may not be necessary or cost-effective.

57

Why do conditional branches create a significant instruction hazard?

Conditional branches create a hazard because the system cannot decide which instruction to fetch next until the condition's outcome is clear. A conditional branch has a 50% chance of executing the sequential statements and a 50% chance of jumping to the branch target. This outcome is often only decided after the third clock cycle, once the comparison is generated by the ALU.

58

What is the simplest solution to manage a conditional branch hazard?

The simplest way is to stall the pipeline.

59

Explain what a pipeline stall is and how NO-OP instructions are used to implement it.

A stall is a deliberate pause during which a pipeline stage does no useful work. To implement it, NO-OP ("do nothing") instructions are inserted into the pipeline behind the branch; each NO-OP consumes one clock cycle without touching any registers, making the pipeline wait until the branch outcome is known and the correct instruction can be fetched.

60

Why is stalling (using NOPs) often preferred over speculatively fetching the wrong instruction?

Stalling is preferred because it avoids tampering with any of the processor registers. If the system incorrectly fetched and executed a predicted instruction, it would change many registers. If the prediction was wrong, all these registers would need to be flushed off (discarded), which creates a large overhead. By using NOP, no registers are changed.

61

What is the main disadvantage of using pipeline stalls?

The main disadvantage is that it compromises efficiency. The system is correct, but it is always wasting clock cycles (e.g., two cycles) every time a branch instruction appears.

62

What is a Branch Prediction Mechanism, and what is its goal?

It is a mechanism used to minimize stalls (which cannot be completely eliminated). Its goal is to decide, based on probability, whether a branch will be taken (go to the target label) or not taken (execute the next sequential instruction).

63

What is the basis for how branch prediction works?

Branch prediction works by studying the past history of similar branches. It analyzes how many times the branch was taken versus not taken and uses this history to predict the future outcome.

64

List and define the four prediction tags that can be added to an instruction based on its history.

Strongly Likely to be Taken (SLT): High certainty the branch will be taken (e.g., 90% chance). Likely to be Taken (LT): The branch is probably taken, but with lower confidence. Not Likely to be Taken (NLT): The branch is probably not taken. Strongly Not Likely to be Taken (SNLT): High certainty the branch will not be taken.
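The four tags map naturally onto a 2-bit saturating counter, a common way dynamic predictors are built. A sketch (my own illustration; the state encoding is an assumption): state 0 = SNLT, 1 = NLT, 2 = LT, 3 = SLT, predicting "taken" in states 2 and 3:

```python
# 2-bit saturating-counter branch predictor.
# States: 0 = SNLT, 1 = NLT, 2 = LT, 3 = SLT. Predict "taken" when state >= 2.

class TwoBitPredictor:
    def __init__(self):
        self.state = 1               # start at "Not Likely to be Taken"

    def predict(self):
        return self.state >= 2       # True means "predict taken"

    def update(self, taken):
        """Shift the state toward the actual outcome, saturating at 0 and 3."""
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

p = TwoBitPredictor()
for outcome in [True, True, False, True]:    # actual branch history
    print(p.predict(), end=" ")
    p.update(outcome)
# prints: False True True True
```

Note how a single mispredicted branch (the False outcome) does not immediately flip a "strongly taken" prediction; the counter must be wrong twice in a row.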

65

What is the difference between Static Branch Prediction and Dynamic Branch Prediction?

Static Branch Prediction: This is a simple, fixed prediction made without using any run-time history (e.g., always assume the branch is not taken — effectively a 50-50 guess). Dynamic Branch Prediction: The prediction is made at run time from the branch's past history (how often it was taken versus not taken) and is updated as the program executes.

66

How does a system handle an incorrect Dynamic Branch Prediction to maintain correctness?

Two key practices are used: 1. Use of Temporary Registers: Any changes from the speculatively executed instruction (based on the prediction) are made only to a temporary set of registers (e.g., T0, T1). 2. Protection of Actual Registers: Changes are never made to the actual, designated processor registers until the prediction is confirmed as correct. If the prediction was correct, the changes are "stamped" (finalized) onto the original registers. If the prediction was incorrect, the instruction is preempted (stopped), and no flushing is needed because no actual registers were tampered with.

67

What are the major hardware implications of implementing branch prediction?

History Storage: Additional hardware is required for dedicated storage space (typically in the register file) to maintain the branch history. Instruction Format Changes: The instruction format itself must be changed to accommodate the prediction tags (SLT, LT, etc.). This leads to "bit diminution."

68

Explain the concept of bit diminution as a result of implementing hardware for hazards.

"Bit diminution" is the reduction of the available bits in an instruction format to represent the actual operation. For example, in a 32-bit instruction, reserving bits for the prediction tags (SLT, LT, NLT, SNLT) leaves fewer bits available to encode the opcode, registers, and operands.

69

What is a Structural Hazard?

A structural hazard arises due to a problem in the organization, structure, components, or the hardware itself. It occurs when two different instructions try to use the same hardware resource at the exact same time.

70

What is the classic example of a structural hazard, and what conflict does it cause?

Example: A system using a single Unified Memory to store both instructions and data. Conflict: A conflict occurs when one instruction (e.g., in the Execute stage) tries to write data to memory while a subsequent instruction (e.g., in the Fetch stage) tries to fetch itself from memory in the same clock cycle.

71

Why is this conflict a problem (the Inconsistency Risk)?

When a write operation is underway on a memory block, that portion must be blocked to all other operations. This is essential to prevent data inconsistency (e.g., another instruction reading a "mid-write", partially updated value). The blocked instruction must therefore wait, stalling the pipeline.

72

How are structural hazards (like the unified memory problem) solved?

They can only be solved by increasing the infrastructure or hardware components. The standard solution is to divide the memory into two separate dedicated units: 1. Data Memory (for data) 2. Instruction Memory (for instructions). This separation allows a data write and an instruction fetch to occur simultaneously without conflict.