Pipelining-2


72 Terms

1
New cards

What are the four typical stages of an instruction pipeline?

Fetch (F): Retrieve the instruction from memory. Decode (D): Interpret the instruction and fetch operands. Execute (E): Perform the specified operation (e.g., addition). Write Back (W): Store the result back into a register.
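The overlap between these four stages can be sketched in Python (a hypothetical model, not from the notes; `pipeline_schedule` and `total_cycles` are illustrative names):

```python
# Hypothetical model (not from the notes): when does each instruction
# occupy each stage of an ideal 4-stage pipeline with no hazards?
STAGES = ["F", "D", "E", "W"]

def pipeline_schedule(num_instructions):
    """Map (instruction number, stage) -> clock cycle, 1-indexed."""
    schedule = {}
    for i in range(num_instructions):
        for s, stage in enumerate(STAGES):
            # Instruction i+1 enters Fetch at cycle i+1 and then advances
            # one stage per clock cycle.
            schedule[(i + 1, stage)] = i + 1 + s
    return schedule

def total_cycles(num_instructions):
    # k stages for the first instruction, plus one extra cycle for each
    # of the remaining (n - 1) instructions draining out behind it.
    return len(STAGES) + num_instructions - 1

sched = pipeline_schedule(3)
print(sched[(1, "F")], sched[(1, "W")], sched[(3, "W")])  # 1 4 6
print(total_cycles(3))  # 6 cycles instead of 12 sequential ones
```

The 6-versus-12 comparison is the whole point of pipelining: three instructions finish in 6 cycles rather than 3 × 4 = 12.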

2
New cards

What happens during the Instruction Fetch (F) stage and what hardware is used?

This stage retrieves the instruction from memory. Hardware Used: Program Counter (PC): Provides the address of the instruction. Memory Address Register (MAR): Receives the address to access memory. Memory: The location where the instruction is stored. Instruction Register (IR): The fetched instruction is placed here.

3
New cards

What happens during the Instruction Decode (D) stage?

This stage interprets what the instruction needs to do. Action: The Control Unit "strips off" the instruction bits from the IR to identify the opcode (the operation) and the source/destination registers. Result: It generates the control signals to activate the correct hardware for the later stages and may also fetch the operands (e.g., the values from R2 and R3).

4
New cards

What happens during the Execute (E) stage?

This stage performs the actual operation. Hardware Used: The ALU (Arithmetic Logic Unit) is the primary component. Action: For an instruction like ADD R1, R2, R3, the ALU would take the contents of R2 and R3 and perform the addition.

5
New cards

What happens during the Write Back (W) stage?

This stage stores the final result. Action: The result from the ALU (or from a memory load) is written back into the destination register (e.g., R1) in the Register File.

6
New cards

What are interstage buffers (latches) and why are they essential for pipelining?

Interstage buffers (represented as V1, V2, V3) are temporary storage hardware components placed between the main pipeline stages (e.g., V1 is between Fetch and Decode). Necessity: They are essential because they hold the results of one stage (e.g., the fetched instruction) and pass them to the next stage. This allows the first stage's hardware (e.g., the Fetch unit) to deposit its result into the buffer and immediately become free to start processing the next instruction in the very next clock cycle.

7
New cards

Explain why a two-stage (Fetch and Execute) pipeline is inefficient.

A two-stage pipeline is inefficient because the stages are too large and tie up too many hardware components. This leads to conflicts where both "Fetch" and "Execute" might need to access the same hardware (memory) at the same time: Fetch to get an instruction, and Execute to store a result. This conflict blocks the pipeline, minimizing any parallel-processing benefits.

8
New cards

Why is it difficult to have a pipeline with more than four stages?

As you divide the cycle into more and more stages, the Control Unit design becomes extremely complicated. The CU must be written to generate and manage a much higher number of control signals to activate and control all the additional, smaller components, which makes the system difficult to manage.

9
New cards

What is a pipeline hazard?

A pipeline hazard is any condition or conflict that forces the system to pause or delay, preventing it from executing the next instruction when it is supposed to.

10
New cards

Define "pipeline stall" (or "pipeline bubble").

A stall or bubble is the wasted time (a delay) that occurs when a pipeline has to wait due to a hazard. During that clock cycle, no useful work is being done in that part of the pipeline.

11
New cards

What are the three main types of pipeline hazards?

Data Hazard, Instruction Hazard, Structural Hazard

12
New cards

What is a Data Hazard? Explain with an example.

A data hazard occurs when an instruction is dependent on the data (result) of a previous instruction that has not yet finished executing. Example: 1. I1: C = A + B 2. I2: X = 4 * C The Conflict: I2 needs the value of 'C' for its operation. In a pipeline, I2 will reach its Decode/Operand Fetch stage (e.g., at Clock 3) and try to read 'C'. However, I1 will only calculate 'C' in its Execute stage (Clock 3) and make it available in its Write Back stage (Clock 4). Because 'C' is not available when I2 needs it, the pipeline must stall.
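The dependency in this example can be checked mechanically; a minimal Python sketch, using a hypothetical tuple encoding of instructions (not from the notes):

```python
# Hypothetical encoding (not from the notes): each instruction is a
# (destination_register, source_registers) tuple.
def has_raw_hazard(producer, consumer):
    """True if `consumer` reads a register that `producer` writes."""
    dest, _sources = producer
    _dest, sources = consumer
    return dest in sources

i1 = ("C", ("A", "B"))  # I1: C = A + B
i2 = ("X", ("C",))      # I2: X = 4 * C
print(has_raw_hazard(i1, i2))  # True -> the pipeline must stall (or forward)
```

This is the read-after-write (RAW) check that pipeline interlock hardware, or a compiler's scheduler, performs for every adjacent pair of instructions.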

13
New cards

What are three methods used to mitigate (reduce) pipeline hazards?

Rearranging the Code (Code Reordering): Moving independent instructions into the delay slots. Inserting NOP (No Operation) Instructions: Filling the delay slots with "do nothing" instructions to force a wait. Operand Forwarding: A hardware technique that passes results back to earlier stages (though the notes caution it may not solve all data hazards).

14
New cards

Explain "Code Reordering."

If the code has other instructions that are completely independent of the ones causing the hazard, the programmer or compiler can move these independent instructions into the stall period. This keeps the pipeline doing useful work while it waits for the dependent data to become available.

15
New cards

What is a NOP instruction and when is it used?

A NOP (No Operation) is a "do nothing" instruction (e.g., represented by all zero bits). It is used to fill a delay slot when code reordering is not possible (e.g., no independent instructions are available). It forces the hardware to wait for one or more clock cycles, allowing the data hazard to resolve itself.
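A compiler-style NOP-insertion pass can be sketched in Python; the instruction encoding and the choice of two NOPs per hazard are assumptions (the actual count depends on pipeline depth and forwarding support):

```python
# Hypothetical compiler pass: pad the program with NOPs whenever the next
# instruction reads the register the current one writes. Two NOPs per
# hazard is an assumption tied to a 4-stage pipeline with no forwarding.
NOP = ("NOP", None, ())

def insert_nops(program, nops_per_hazard=2):
    """program: list of (name, dest_register, source_registers) tuples."""
    out = []
    for idx, instr in enumerate(program):
        out.append(instr)
        if idx + 1 < len(program):
            _name, dest, _srcs = instr
            _next_name, _next_dest, next_srcs = program[idx + 1]
            if dest is not None and dest in next_srcs:
                out.extend([NOP] * nops_per_hazard)  # force the wait
    return out

prog = [("ADD", "C", ("A", "B")),  # C = A + B
        ("MUL", "X", ("C",))]      # X = 4 * C  (depends on C)
padded = insert_nops(prog)
print([name for name, *_ in padded])  # ['ADD', 'NOP', 'NOP', 'MUL']
```

Note that the pass leaves independent instruction pairs untouched, which is why code reordering (when possible) is preferred: it fills those slots with useful work instead of NOPs.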

16
New cards

To write efficient code for a pipelined system, what programming practice should be avoided?

Programmers should avoid breaking a single, long expression into many small, sequential, dependent statements. Impact: A greater number of dependent instructions makes pipelining very difficult and increases the likelihood of data hazards and stalls. Result: An inefficiently written program (with many dependencies) will still run correctly, but its performance will be poor, defeating the entire purpose of having a pipelined architecture.

17
New cards

What is the primary limitation of a "single bus organization"?

The primary limitation is that the bus is a shared resource that acts like a "one-way road." If multiple components need to use the bus at the same time, one must be stalled (delayed) while the other finishes. This sequential waiting increases the number of clock cycles needed for operations.

18
New cards

Explain why a simple memory write operation is inefficient on a single bus.

A memory write requires sending data, an address, and a control signal. On a single bus, this must be done in three separate steps, consuming three clock cycles:

1. Clock 1 (Data Transfer): The data (e.g., from register R2) is put on the bus to go into the Memory Buffer Register (MBR).

2. Clock 2 (Address Transfer): The address (e.g., from register R1) is put on the bus to go into the Memory Address Register (MAR).

3. Clock 3 (Signal Issuance): The control signal (e.g., "write") is issued using the bus. The actual memory operation only begins after these three clock cycles are complete.

19
New cards

What are the three types of dedicated buses in a "multiple bus organization"?

Address Bus: Carries address signals. Data Bus: Used to fetch and transfer data (e.g., between MBR and registers). Control Bus: Passes control signals.

20
New cards

What is the main advantage of a multiple bus organization, and what is its main drawback?

Advantage: The three tasks (data, address, and signal) can be executed in one single clock cycle because there are "three roads" available. This allows more instructions to be processed per clock cycle. Drawback: The main drawback is increased complexity. Generating and maintaining control signals for all the various buses simultaneously is much more complicated than managing a single bus.

21
New cards

What is a limitation that still exists in a multiple bus system?

You cannot schedule two instructions that are dependent on the same resource. For example, if two different instructions both require memory access, they cannot happen at the same time because you can still only send one address at a time on the single Address Bus.

22
New cards

What are the two main types of Control Unit (CU) design?

Hardwired Control Unit (HCU), Microprogrammed Control Unit (MCU)

23
New cards

What is a Hardwired Control Unit (HCU)?

An HCU is a control unit built using physical hardware components like logic gates, decoders, and multiplexers. Its logic is fixed in the hardware itself.

24
New cards

Explain the main limitation and "fixed nature" of an HCU.

An HCU is fixed because its logic is physically built in. It can only handle a limited instruction set, and the number of control signals it can output is predetermined. If a change is needed (like supporting more instructions or registers), the physical hardware design itself must be changed.

25
New cards

List the key components involved in the operation of a Hardwired Control Unit.

  • Instruction Register (IR): Provides the instruction (e.g., 32 bits) as input to the CU.

  • Instruction Decoders: The instruction is "stripped" into parts (like the opcode). These parts go to decoders (e.g., a 3-to-8 decoder) that identify the operation.

  • Timing Signal Generator: A clock signal is fed into a decoder to generate a set of timing signals (e.g., P0 to P7) that turn components ON and OFF at the right time.

  • External Signals & Condition Codes: Inputs like "Memory Function Complete" (MFC), interrupts, or ALU flags (Z, N, V, C) that can alter the control flow.

  • Encoder: The final component that takes the decoded instruction and the timing signals as input and generates the final, specific control signals (like PCIN or MAROUT).

26
New cards

What is a Microprogrammed Control Unit (MCU)?

An MCU is a control unit that relies more on software implementation. It involves writing complex code for instruction handling and decoding, which is stored in a special control memory.

27
New cards

Compare the flexibility of an HCU vs. an MCU.

HCU: Is inflexible. To add registers, you must physically change the hardware. MCU: Is flexible. To add registers (e.g., go from 32 to 64), you can make the change in the software (microcode) without rearranging physical components.

28
New cards

What is the main "cost" or drawback of using a flexible Microprogrammed CU?

This high degree of flexibility comes at the cost of a complex design for the Control Unit itself.

29
New cards

What is an Instruction Set?

The Instruction Set (or ISA) is the set of all operations (like multiply, add, sum) that the system's hardware is designed to support.

30
New cards

What do RISC and CISC stand for?

RISC: Reduced Instruction Set Computer, CISC: Complex Instruction Set Computer

31
New cards

What is the most critical difference between RISC and CISC regarding instruction size?

RISC: Uses a fixed instruction size. Every instruction is the same length (e.g., 32 bits), even if some fields are left empty. CISC: Uses a variable instruction size. Instructions can be different lengths (e.g., one might be 12 bits, another 64 bits), which can be more efficient in some cases.

32
New cards

Compare RISC and CISC on key features (Instruction Count, CU Type, Flexibility).

RISC: Instruction Count: Handles very few instructions. CU Type: Typically uses a Hardwired Control Unit. Flexibility: Restricted and fixed. CISC: Instruction Count: Allows for adding more instructions. CU Type: Typically uses a Microprogrammed Control Unit. Flexibility: High; fields and components can be scaled.

33
New cards

Is CISC inherently "better" or "more efficient" than RISC?

No. You can't state that one is inherently better. Increasing a component (like the number of registers in a CISC machine) does not guarantee an increase in the overall system's efficiency or performance, as other factors may create bottlenecks.

34
New cards

What is the difference between Infix and Postfix notation?

Infix: The operator is placed between the operands (e.g., X * Y). Postfix: The operator is placed after the operands (e.g., XY*).

35
New cards

What is the general method for converting an Infix expression to Postfix?

The conversion process requires considering the highest-priority operators first and evaluating the expression from left to right. Operations within parentheses must be solved first, following the rules of operator precedence.
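This left-to-right, precedence-first procedure is essentially the shunting-yard algorithm; a minimal Python sketch, assuming single-character operands and the operators + - * / ( ):

```python
# Minimal shunting-yard sketch: convert infix to postfix notation.
# Assumes single-character operands and the operators below.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def infix_to_postfix(expr):
    output, ops = [], []
    for tok in expr.replace(" ", ""):
        if tok.isalnum():
            output.append(tok)        # operands go straight to the output
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops[-1] != "(":     # parentheses are resolved first
                output.append(ops.pop())
            ops.pop()                 # discard the "("
        else:
            # Pop operators of higher or equal precedence (left-to-right
            # evaluation), then push the current operator.
            while ops and ops[-1] != "(" and PRECEDENCE[ops[-1]] >= PRECEDENCE[tok]:
                output.append(ops.pop())
            ops.append(tok)
    while ops:                        # flush any remaining operators
        output.append(ops.pop())
    return "".join(output)

print(infix_to_postfix("X*Y"))      # XY*
print(infix_to_postfix("A+B*C"))    # ABC*+
print(infix_to_postfix("(A+B)*C"))  # AB+C*
```

The last two examples show both precedence rules at work: the higher-priority * binds first in A+B*C, while parentheses override precedence in (A+B)*C.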

36
New cards

What is a foundational assumption made in pipeline timing diagrams that is often unrealistic?

The foundational assumption is that each instruction stage takes exactly one clock cycle. This is often untrue in reality, especially for operations involving memory access (like fetching from main memory), which can take many clock cycles.

37
New cards

According to the notes, what two things should you always include in an exam answer when explaining a pipeline hazard?

You should always include an example (like a sequence of instructions) and draw a timing diagram to best illustrate the stall and the conflict.

38
New cards

Define "Data Hazard."

A Data Hazard occurs when the data (operand) required by an instruction is not yet available at the time it is needed for execution. This is because a prior instruction, which is supposed to compute that data, has not yet finished its execution and written back the result.

39
New cards

What are the three primary solutions for Data Hazards discussed in the notes?

Reordering Code (Instruction Reordering), Software Handling (NOP Insertion), Operand Forwarding (Hardware-Based Solution)

40
New cards

Explain "Reordering Code" as a solution for data hazards. What is its main limitation?

This solution involves restructuring the program's code to move independent instructions between the two dependent instructions. This creates a natural delay, giving the first instruction enough time to complete and make its data available before the second instruction needs it. Limitation: This is only possible if there are independent instructions available in the code that can be safely moved without changing the program's logic.

41
New cards

What is "NOP Insertion," and when is it used?

Definition: NOP (No Operation) is a "filler" instruction that does nothing except consume a clock cycle. Usage: This is a software-based solution used when code reordering is not possible. The compiler or programmer inserts NOPs into the pipeline to create a deliberate stall, forcing the dependent instruction to wait until the required data is ready.

42
New cards

What is "Operand Forwarding," and is it a hardware or software solution?

Definition: Operand Forwarding (also known as Bypassing) is a hardware-based solution. Instead of forcing an instruction to wait until data is written to the register file (in the Write Back stage), this technique uses hardware to grab the computed value directly from the output of the EX (Execution) stage or its inter-stage buffer. Benefit: This "forwards" the data to the next instruction's EX stage just in time, saving multiple clock cycles and avoiding a stall.
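A worked timing model of the savings, under the four-stage F-D-E-W assumption from the notes (the exact two-cycle stall count is a property of this simplified model, not a universal figure):

```python
# Worked timing model under the notes' four-stage F-D-E-W assumption; the
# two-cycle stall count is specific to this simplified model.
def stall_cycles(forwarding):
    if forwarding:
        # The ALU result is bypassed from I1's Execute output straight to
        # I2's Execute input, so I2 never has to wait (ideal case).
        return 0
    # Without forwarding: I1 runs F=1, D=2, E=3, W=4. I2 wants its operand
    # in Decode at cycle 3, but the result is only written back at cycle 4,
    # so I2's Decode slips to cycle 5: two bubble cycles.
    return 2

print(stall_cycles(forwarding=False))  # 2
print(stall_cycles(forwarding=True))   # 0
```

In other words, forwarding recovers exactly the bubbles that NOP insertion would otherwise have to fill.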

43
New cards

Define "Instruction Hazard."

An Instruction Hazard occurs when the instruction itself is not available for the pipeline in the next clock cycle when it is needed. This is typically caused by: 1. Control Flow Changes (e.g., branches or jumps) 2. Instruction Fetch Delays (e.g., a cache miss)

44
New cards

Explain the "Unconditional Branch Hazard." Why does it happen?

Problem: When a JUMP instruction (e.g., I2: JMP I7) enters the pipeline, the pipeline hardware, which is designed to fetch instructions sequentially, has already fetched the next sequential instructions (e.g., I3, I4). Cause: The pipeline does not realize I2 is a jump until the Decode (ID) stage, and it cannot calculate the target address (I7) until the Execute (EX) stage (which often needs the ALU). By the time the processor knows the correct address, it has already fetched and started processing the wrong instructions (I3, I4).

45
New cards

What is a "Branch Delay" or "Branch Stall"?

This is the name for the wasted clock cycles that occur after a branch instruction. The pipeline must stall to flush (discard) the incorrect, sequentially fetched instructions (e.g., I3, I4) and then wait for the target address (I7) to be calculated so it can fetch the correct instruction.

46
New cards

Why can "Code Reordering" not be used to solve an unconditional branch hazard, unlike a data hazard?

In a Data Hazard, you know that all instructions (I1, I2, I3…) will eventually be executed, so you can safely reorder them. In a Branch Hazard, the system cannot decide which code needs to be executed (e.g., I3 or I7). If the jump is taken, instructions I3, I4, I5, and I6 will never be executed. You cannot reorder instructions when you don't even know if they are on the correct execution path.

47
New cards

What is the hardware-based mitigation strategy for branch hazards described in the notes?

The solution is early recognition through a hardware modification to the Instruction Set Architecture (ISA). Mechanism: A few bits (e.g., 1 or 2) are reserved in the instruction's binary format specifically to represent the type of instruction (e.g., 0 for jump, 1 for non-jump). Action: Special logic in the Fetch stage immediately checks these bits. If it sees a "jump" instruction, it stops fetching subsequent sequential instructions (like I3). Result: This prevents the pipeline from being filled with incorrect instructions, saving overhead. The pipeline will still stall (e.g., for one cycle) while it waits for the target address to be calculated, but the stall is minimized.

48
New cards

What is the "Principle of Locality" as it relates to cache memory?

This principle states that if a memory block is accessed, there is a high probability that nearby memory values will be accessed soon. This is the reason why, when data is fetched from Main Memory, an entire block of data (e.g., four instructions) is copied into the Cache, not just the single piece of data requested.

49
New cards

How does a "Cache Miss" cause an Instruction Hazard?

Problem: A pipeline assumes it can fetch an instruction in one clock cycle. This assumption only holds true if the instruction is in the fast Cache memory. Hazard: If the instruction is not in the Cache (a "Cache Miss"), the system must fetch it from the much slower Main Memory. This takes many clock cycles, forcing the Fetch stage to wait and causing the entire pipeline to stall.

50
New cards

What is "Pre-fetching," and what additional hardware does it require?

Definition: Pre-fetching is a solution to mitigate (hide) cache-miss delays. Its goal is to access memory before the instruction is critically needed.

Required Hardware: This cannot be done with the standard pipeline registers (PC, IR), as they are already busy.

Pre-fetching requires new, dedicated hardware:

  1. A specialized Fetch Unit (often a replication of the fetch hardware).

  2. An Instruction Queue/Buffer.

Mechanism: This new, dedicated Fetch Unit works continuously in the background, fetching instruction blocks from Main Memory and placing them into the Instruction Queue. The main pipeline then simply pulls instructions from this fast queue and (ideally) never has to wait for Main Memory.
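The fetch-unit-plus-queue arrangement can be sketched in Python (hypothetical class and method names; `block_size=4` mirrors the block-copy idea from the notes):

```python
# Hypothetical sketch of pre-fetching: a dedicated fetch unit fills an
# instruction queue in the background; the pipeline pulls from the queue
# and (ideally) never waits on slow main memory.
from collections import deque

class PrefetchUnit:
    def __init__(self, memory, block_size=4):
        self.memory = memory          # list of instructions (main memory)
        self.block_size = block_size  # fetch whole blocks (locality)
        self.queue = deque()
        self.next_addr = 0

    def prefetch(self):
        """Fetch the next block of instructions into the queue."""
        block = self.memory[self.next_addr:self.next_addr + self.block_size]
        self.queue.extend(block)
        self.next_addr += self.block_size

    def next_instruction(self):
        if not self.queue:        # queue empty: the pipeline would stall here
            self.prefetch()
        return self.queue.popleft()

mem = [f"I{i}" for i in range(1, 9)]
fu = PrefetchUnit(mem)
fu.prefetch()                      # background unit fills the queue early
print(fu.next_instruction())       # I1
print(len(fu.queue))               # 3 remaining from the first block
```

The key design point is the decoupling: `prefetch()` can run whenever the memory bus is free, so the slow main-memory latency is paid in the background rather than on the pipeline's critical path.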

51
New cards

What is the key architectural insight from implementing solutions like Operand Forwarding, Early Branch Recognition, and Pre-fetching?

The key insight is that efficient pipelining requires adding new, specialized hardware components. A simple pipeline structure without these hardware optimizations (like inter-stage buffers, modified ISAs, or instruction queues) will be inefficient and will not fully utilize the benefits of pipelining due to constant stalls.

52
New cards

What is the primary purpose of pipelining?

The primary purpose of pipelining is to improve performance and increase throughput. It does this by utilizing hardware components that would otherwise be idle, making the entire process more efficient.

53
New cards

What is the main trade-off or "cost" associated with implementing an efficient pipeline?

The trade-off is the cost of additional hardware. An efficient pipeline structure is not possible without the support of dedicated additional hardware components, such as separate memories (for instructions and data), branch prediction mechanisms, and systems for identifying branch instructions.

54
New cards

For what type of system is pipelining generally advisable?

Pipelining is generally advisable if a system needs to execute a lot of programs. If the requirement is smaller, investing in a complex pipelining environment may not be necessary or cost-effective.

55
New cards

What is an "Instruction Hazard" (or Control Hazard)?

An instruction hazard occurs when the required instruction is not available in the next clock cycle when it is needed. This is primarily due to branches (conditional or unconditional) or memory delays like cache misses.

56
New cards

Why do conditional branches create a significant instruction hazard?

Conditional branches create a hazard because the system cannot decide which instruction to fetch next until the condition's outcome is clear. A conditional branch has a 50% chance of executing the sequential statements and a 50% chance of jumping to the branch target. This outcome is often only decided after the third clock cycle, once the comparison is generated by the ALU.

57
New cards

What is the simplest solution to manage a conditional branch hazard?

The simplest way is to stall the pipeline.

58
New cards

Explain what a "pipeline stall" is and how NOP instructions are used to implement it.

A pipeline stall is a period where the system does nothing and simply waits for the outcome of the branch; the wasted clock cycles are represented by "bubbles." It is implemented by introducing NOP (No Operation) instructions immediately after the branch, which effectively makes the system "wait" without processing anything.

59
New cards

Why is stalling (using NOPs) often preferred over speculatively fetching the wrong instruction?

Stalling is preferred because it avoids tampering with any of the processor registers. If the system incorrectly fetched and executed a predicted instruction, it would change many registers. If the prediction was wrong, all these registers would need to be flushed off (discarded), which creates a large overhead. By using NOP, no registers are changed.

60
New cards

What is the main disadvantage of using pipeline stalls?

The main disadvantage is that it compromises efficiency. The system is correct, but it is always wasting clock cycles (e.g., two cycles) every time a branch instruction appears.

61
New cards

What is a "Branch Prediction Mechanism," and what is its goal?

It is a mechanism used to minimize stalls (which cannot be completely eliminated). Its goal is to decide, based on probability, whether a branch will be taken (go to the target label) or not taken (execute the next sequential instruction).

62
New cards

What is the basis for how branch prediction works?

Branch prediction works by studying the past history of similar branches. It analyzes how many times the branch was taken versus not taken and uses this history to predict the future outcome.

63
New cards

List and define the four "prediction tags" that can be added to an instruction based on its history.

Strongly Likely to be Taken (SLT): Very high certainty the branch will be taken (e.g., a 90% chance). Likely to be Taken (LT): The branch is probably taken, but with less certainty. Not Likely to be Taken (NLT): The branch is probably not taken. Strongly Not Likely to be Taken (SNLT): Very high certainty the branch will not be taken.
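These four tags map naturally onto a 2-bit saturating counter, a common way dynamic predictors store branch history; a Python sketch (the starting state and the state numbering are assumptions, not from the notes):

```python
# Hypothetical sketch: the four likelihood tags behave like a 2-bit
# saturating counter updated from the branch history.
STATES = ["SNLT", "NLT", "LT", "SLT"]

class TwoBitPredictor:
    def __init__(self, state=1):      # start at NLT (an assumption)
        self.state = state

    def predict(self):
        # The upper half of the states (LT, SLT) predicts "taken".
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at the
        # strongly-taken / strongly-not-taken ends.
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)

p = TwoBitPredictor()
for outcome in [True, True, True]:    # branch taken three times in a row
    p.update(outcome)
print(STATES[p.state], p.predict())   # SLT True
p.update(False)                       # a single mispredict only demotes
print(STATES[p.state], p.predict())   # LT True -- still predicts taken
```

The saturation is the point of the four-level scheme: one surprising outcome nudges the tag one step but does not immediately flip the prediction.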

64
New cards

What is the difference between "Static Branch Prediction" and "Dynamic Branch Prediction"?

Static Branch Prediction: This is a simple 50/50 gamble with equal chances of being correct or incorrect. It often just guesses by executing the instructions in sequence. Dynamic Branch Prediction: This relies on observing and continuously updating the execution history. Every time a branch is taken or not taken, the history is updated, and the decision is based on this live data. However, it still cannot be 100% certain.

65
New cards

How does a system handle an incorrect "Dynamic Branch Prediction" to maintain correctness?

Two key practices are used:

1. Use of Temporary Registers: Any changes from the speculatively executed instruction (based on the prediction) are made only to a temporary set of registers (e.g., T0, T1).

2. Protection of Actual Registers: Changes are never made to the actual, designated processor registers until the prediction is confirmed as correct. If the prediction was correct, the changes are "stamped" (finalized) onto the original registers. If the prediction was incorrect, the instruction is preempted (stopped), and no flushing is needed because no actual registers were tampered with.

66
New cards

What are the major hardware implications of implementing branch prediction?

History Storage: Additional hardware is required for dedicated storage space (typically in the register file) to maintain the branch history. Instruction Format Changes: The instruction format itself must be changed to accommodate the prediction tags (SLT, LT, etc.). This leads to "bit diminution."

67
New cards

Explain the concept of "bit diminution" as a result of implementing hardware for hazards.

"Bit diminution" is the reduction of the bits available in an instruction format to represent the actual operation. For example, in a 32-bit instruction: 2 bits might be reserved to mark a JUMP instruction (leaving 30 bits), and another 2 bits might be reserved for the branch-prediction likelihood tag. This leaves only 28 bits for the rest of the instruction (opcode, registers, etc.), which limits the number of registers that can be addressed or forces designers to adopt a larger instruction size.

68
New cards

How does a "Cache Miss" cause an instruction hazard?

A cache miss occurs when a requested instruction is not found in the cache memory. The read operation must then be redirected to the slower Main Memory. This access takes multiple clock cycles (two, three, or more), during which the fetch unit is busy and cannot provide the instruction to the Decode stage, causing the pipeline to stall.

69
New cards

What is a "Structural Hazard"?

A structural hazard arises due to a problem in the organization, structure, components, or the hardware itself. It occurs when two different instructions try to use the same hardware resource at the exact same time.

70
New cards

What is the classic example of a structural hazard, and what conflict does it cause?

Example: A system using a single Unified Memory to store both instructions and data. Conflict: A conflict occurs when one instruction (e.g., in the Execute stage) tries to write data to memory while a subsequent instruction (e.g., in the Fetch stage) tries to fetch itself from memory in the same clock cycle.

71
New cards

Why is this conflict a problem (the "Inconsistency Risk")?

When a write operation is underway on a memory block, that portion must be blocked to all other operations. This is essential to prevent data inconsistency (e.g., another instruction reading a "mid value" that is only partially written). Because the memory is blocked for the write, the instruction fetch must wait, causing a pipeline stall.

72
New cards

How are structural hazards (like the unified memory problem) solved?

They can only be solved by increasing the infrastructure or hardware components. The standard solution is to divide the memory into two separate dedicated units: 1. Data Memory (for data) 2. Instruction Memory (for instructions). This separation allows a data write and an instruction fetch to occur simultaneously without conflict.