Pipelining


121 Terms

1
New cards

What is Instruction Pipelining?

Pipelining is a performance enhancement technique that processes multiple instructions simultaneously by overlapping their execution stages, aiming to maximize hardware utilization and increase CPU throughput.

2
New cards

Why is pipelining necessary in computer architecture?

It is needed to minimize execution time and save clock cycles by ensuring hardware components, like the ALU and Decode unit, are not idle but are continually processing different instruction stages.

3
New cards

What is the core working principle and critical constraint of instruction pipelining?

The principle is to start the next instruction as soon as the previous one moves to the next stage. The critical constraint is that two instructions cannot occupy the same stage at the same time.

4
New cards

List the four typical stages into which an instruction cycle is divided for efficient pipelining.

The four stages are: Fetch (F), which retrieves the instruction; Decode (D), which interprets it; Execute (E), which performs the operation; and Write Back (W), which stores the result.

5
New cards

What are Interstage Buffers (Latches) and why are they essential for pipelining?

Interstage buffers (V1, V2, V3) are temporary hardware storage placed between stages. They allow a stage to immediately flush its results and become free to start processing the next instruction in the subsequent clock cycle.

6
New cards

Why is a two stage (Fetch and Execute) pipeline structure generally considered inefficient?

A two-stage pipeline is inefficient because both stages need simultaneous access to memory (the Fetch stage reading an instruction while the Execute stage reads or writes data), causing constant hardware conflicts.

7
New cards

What is sequential execution, and how does it compare to pipelining in terms of timing?

Sequential execution processes one instruction completely before the next begins. If I1 takes four cycles, I2 starts at Cycle 5. Pipelining allows I2 to start at Cycle 2, overlapping execution steps.

In short: pipelining overlaps execution steps.
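The timing difference described above can be sketched with a little arithmetic (a toy Python model; the 4-stage count is the F/D/E/W cycle from these cards, and a hazard-free pipeline is assumed):

```python
def sequential_cycles(instructions, stages=4):
    # Each instruction runs all stages before the next begins.
    return instructions * stages

def pipelined_cycles(instructions, stages=4):
    # The first instruction fills the pipeline; each later one finishes
    # one cycle after its predecessor (no hazards assumed).
    return stages + (instructions - 1)

# With 4 stages: I1 alone takes 4 cycles either way, but I2 starts
# at cycle 2 in a pipeline instead of cycle 5.
print(sequential_cycles(2))  # 8
print(pipelined_cycles(2))   # 5
```

For a single instruction both models agree; the pipeline's advantage grows with instruction count.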

8
New cards

What is a pipeline hazard?

A pipeline hazard is a condition or conflict that forces the system to pause or delay execution, preventing the next instruction from being executed when desired and resulting in a stall or bubble.

9
New cards

Define a pipeline stall or bubble.

A pipeline stall or bubble refers to the wasted clock cycles (a delay) that occur when the pipeline must wait for a hazard to resolve, meaning no useful work is being done in that portion of the pipeline.

10
New cards

What is the primary trade-off associated with implementing an efficient pipeline structure?

The primary trade-off is the cost of additional hardware. An efficient pipeline requires dedicated components like separate Instruction/Data Memories, interstage buffers, and branch prediction mechanisms.

11
New cards

What is a Single Bus Organization and what is its primary limitation?

A single bus organization uses one shared communication bus for all components (address, data, control). Its limitation is that only one component can use the bus at a time, forcing others to wait (stalling the bus).

12
New cards

Why does a memory write operation require three separate clock cycles in a single bus organization?

Step 1: Data transfer (R2 to MBR) uses the bus.
Step 2: Address transfer (R1 to MAR) uses the bus.
Step 3: Signal issuance (Write signal) uses the bus.
Since the single bus can handle only one transfer at a time, three clock cycles are required before the memory operation can begin.

13
New cards

What is the primary advantage of a Multiple Bus Organization?

The advantage is parallel execution. Having dedicated Address, Data, and Control buses allows tasks like transferring data, address, and signals to occur simultaneously in a single clock cycle.

14
New cards

What is the main drawback of a Multiple Bus Organization?

The main drawback is the increased complexity of the Control Unit (CU). The CU must manage, issue, and maintain control signals for multiple dedicated buses simultaneously, which is more complicated than managing a single entity.

15
New cards

What is one limitation concerning resource dependency that still exists even with a multiple bus organization?

Even with multiple buses, simultaneous scheduling is impossible if instructions require the same resource. For example, two memory fetches cannot occur simultaneously because there is still only one Address Bus to send the address.

16
New cards

What are the three dedicated buses typically found in a multiple bus organization?

The three dedicated buses are: the Address Bus, which carries address signals; the Data Bus, which transfers data; and the Control Bus, which passes control signals.

17
New cards

Describe the Program Counter (PC) and its essential communication requirement.

The Program Counter holds the address of the next instruction to be fetched. It requires two-way communication: outputting the current address to the bus, and receiving the updated address (PC + 4 or a jump target) back from the bus.

18
New cards

What is the function of the Instruction Register (IR)?

The Instruction Register holds the instruction fetched from memory (in binary form). It passes various fields of this instruction (opcode, register fields) to the Control Unit for decoding and execution signal generation.

19
New cards

Why does the Arithmetic and Logic Unit (ALU) not place its result directly onto the bus?

The ALU does not directly output to the bus because this would complicate bus arbitration, and intermediate results must often be held. Instead, results are stored in the temporary Z register, which then outputs to the bus via the Z out signal.

20
New cards

What are the Memory Address Register (MAR) and Memory Data Register (MDR), and what is their role?

MAR holds the memory address needed for access. MDR (or MBR) temporarily holds data being transferred to memory (write) or retrieved from memory (read). They interface memory with the CPU registers.

21
New cards

Explain the step-by-step process of Stage 1: Instruction Fetch (IF), including control signals.

  • Transfer Address to MAR: PC out, MAR in.

  • Update PC: (simultaneous with step 1) the ALU calculates PC + 4, temporarily stored in Z, then Z out, PC in.

  • Initiate Read: CU issues the Memory Read signal.

  • Wait: the system waits for the Memory Function Complete (MFC) signal.

  • Store Instruction: the instruction is transferred via the bus and stored in the IR (IR in).

22
New cards

What is the function of the Control Unit (CU)?

The Control Unit is the central CPU component responsible for decoding the instruction stored in the IR. It determines the operation type and then generates all necessary control signals (e.g., R in, R out, Add) to activate appropriate hardware components.

23
New cards

Explain the register read process using the R in and R out control signals in a synchronous system.

Register reading takes two clock cycles. First, the CU activates the register's R in signal to prepare for output control. In the subsequent clock cycle, the R out signal is activated, which pushes the register's content onto the shared bus for transfer.

24
New cards

Using the instruction ADD R3, R1, R2, explain the detailed steps involved in the Instruction Decode and Execute stages.

Decode: CU identifies ADD, with R1 and R2 as sources and R3 as destination; it issues the ADD signal to the ALU and READ signals for R1 and R2.

Operand Fetching: R1 and R2 values are read out (R1 out, R2 out); R1 is routed to ALU input A, R2 to the temporary register Y.

Execution: the ALU performs the addition of the values from R1 and R2; the result is stored in the temporary register Z.

Write Back: Z out is activated, placing the result on the bus; R3 in is activated, storing the final result into R3.

25
New cards

Describe the complete sequence of steps and control signals required to update the Program Counter (PC) to PC + 4.

Signal ALU: CU signals the ALU to perform the PC + 4 operation.

Transfer PC: PC content transferred via bus to an ALU input (PC out).

Supply Constant 4: CU uses a multiplexer to select the constant value four as the second ALU input.

Calculate: ALU performs PC + 4 operation. Result stored in Z register.

Write Back: Z out activated, pushing new address onto bus. New address transferred to PC (PC in).

26
New cards

What is a Hardwired Control Unit (HCU)?

An HCU is a Control Unit fundamentally built using physical hardware components like logic gates, decoders, and multiplexers. Its instruction logic is fixed in the chip's physical design.

27
New cards

What is the main limitation and fixed nature of a Hardwired Control Unit?

An HCU is rigid: it can only handle a limited, fixed instruction set determined by the physical components installed. Any change, such as increasing register size, requires a physical redesign or rearrangement of the hardware components.

28
New cards

What are the key inputs used by a Hardwired Control Unit to generate final control signals?

The key inputs include the Instruction Register (IR), Timing Signals (T0, T1, P1, P2), External Signals (like Memory Function Complete or interrupts), and Condition Codes (Z, N, V, C flags from the ALU).

29
New cards

What is a Microprogrammed Control Unit (MCU)?

An MCU relies more on a software-like implementation, involving complex codes (micro-instructions) for instruction handling and decoding stored in a control memory.

30
New cards

How does a Microprogrammed Control Unit provide greater flexibility compared to an HCU?

An MCU provides high flexibility because changes, such as increasing the size of the register file (e.g., 32 to 64 registers), can be achieved by modifying the software/micro-code rather than rearranging physical hardware components.

31
New cards

What does RISC stand for, and what type of control unit does it typically use?

RISC stands for Reduced Instruction Set Computer. It is typically associated with using a Hardwired Control Unit design due to its focus on a small, fast, and simple instruction set.

32
New cards

What does CISC stand for, and what type of control unit does it typically use?

CISC stands for Complex Instruction Set Computer. It is typically associated with using a Microprogrammed Control Unit, which is necessary to manage its complex logic and large, variable instruction set.

33
New cards

What is the most critical difference between RISC and CISC architectures regarding instruction size?

RISC uses a fixed instruction size (e.g., always 32 bits), even if fields are empty. CISC uses a variable instruction size, where instructions can range widely in length (e.g., 12 bits to 64 bits).

34
New cards

Why is memory logically divided into Instruction Storage and Data Storage sections?

Memory is divided to allow simultaneous access. If unified, a data write would block the memory to prevent inconsistency, stopping instruction reading. Separation aids pipelining by allowing concurrent fetch and data operations.

35
New cards

Why must the Program Counter be incremented by four (PC + 4) in the Instruction Fetch stage?

In the system discussed, memory is byte addressable, meaning each address points to one byte. Since instructions are typically 32 bits (4 bytes), the next sequential instruction is always located 4 bytes ahead of the current address.

36
New cards

What is a Data Hazard, and when does it occur in a pipeline?

A Data Hazard occurs when a subsequent instruction (I2) needs data (an operand) that is generated by a prior instruction (I1) that has not yet completed its Write Back phase, meaning the data is not yet available in the register file.

37
New cards

Explain Operand Forwarding (Bypassing) as a solution for data hazards.

Operand Forwarding is a hardware solution that takes the required operand directly from the output of the Execution (EX) stage or an intermediate buffer. This bypasses the need to wait for the data to be written back to the register file, saving clock cycles.
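The saving from forwarding can be illustrated with a toy timing model (a Python sketch; the 4-stage F/D/E/W pipeline with one-cycle stages is an assumption, not a real datapath):

```python
def stall_cycles(gap, forwarding=False):
    """Stall cycles for a consumer that needs the result of the
    instruction `gap` slots ahead of it (1 = immediately prior).

    Toy 4-stage pipeline (F, D, E, W), producer starting at cycle 1:
    the result reaches the register file at the end of W (cycle 4),
    but with forwarding it is usable off the EX-stage output at the
    end of E (cycle 3).
    """
    ready = 3 if forwarding else 4   # cycle after which the value is usable
    natural_ex = 3 + gap             # consumer's Execute cycle with no stall
    return max(0, (ready + 1) - natural_ex)

print(stall_cycles(1))                   # 1 (must wait for write back)
print(stall_cycles(1, forwarding=True))  # 0 (bypass saves the stall)
```

In this model a back-to-back dependency stalls one cycle without forwarding and none with it; a dependency two or more slots apart never stalls.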

38
New cards

What is the software solution for data hazards when code reordering is impossible?

The software solution is NOP Insertion. No-Operation (NOP) instructions (represented by zero bits) are inserted as fillers to consume clock cycles and create a stall, providing the necessary delay for the dependent data to become available.
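NOP insertion can be sketched as a simple pass over an instruction list (a Python toy that only checks the immediately preceding instruction; a real compiler tracks longer dependency distances):

```python
NOP = ("NOP",)

def insert_nops(program, delay=1):
    """Insert `delay` NOPs whenever an instruction reads a register
    written by the instruction immediately before it (a RAW hazard).
    Instructions are (op, dest, src1, src2) tuples; toy model only."""
    out = []
    for ins in program:
        if out and out[-1] != NOP:
            prev_dest, srcs = out[-1][1], ins[2:]
            if prev_dest in srcs:
                out.extend([NOP] * delay)   # filler cycles for the stall
        out.append(ins)
    return out

prog = [("ADD", "R3", "R1", "R2"),   # writes R3
        ("SUB", "R5", "R3", "R4")]   # reads R3 -> hazard
print(insert_nops(prog))
```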

39
New cards

What is a Structural Hazard, and what is its underlying cause?

A structural hazard arises from hardware organization issues. It occurs when two different instructions try to use the same physical hardware resource simultaneously in the same clock cycle.

40
New cards

What is the classic example of a structural hazard involving memory?

The classic example is using a Unified Memory system for both instructions and data. A conflict occurs when one instruction needs to store data (write) to memory while the next instruction needs to fetch itself (read) from memory simultaneously.

41
New cards

How is the structural hazard caused by unified memory solved?

This hazard is solved by increasing the infrastructure: dividing the memory into two separate, dedicated units—Instruction Memory and Data Memory. This allows simultaneous instruction fetch and data access, avoiding stalls.

42
New cards

What is an Instruction Hazard (or Control Hazard)?

An Instruction Hazard occurs when the required instruction is unavailable for pipelining in the next clock cycle due to control flow changes (branches/jumps) or instruction fetch delays (cache misses).

43
New cards

What is an Unconditional Branch Hazard?

This hazard occurs when a JUMP instruction is recognized late (Decode/Execute stage), and by that time, the pipeline has already fetched one or more incorrect sequential instructions that must be discarded, causing a Branch Delay stall.

44
New cards

What is the primary mitigation strategy for an unconditional branch hazard?

The primary strategy is Early Recognition: modifying the Instruction Set Architecture (ISA) to reserve specific bits that identify a jump instruction immediately in the Fetch stage, allowing the processor to stop fetching incorrect instructions preemptively.

45
New cards

Define the Branch Delay (Branch Stall).

The Branch Delay is the number of clock cycles the pipeline stalls while waiting for the calculation of the target address of a jump or branch instruction, which is often calculated by the ALU in the Execution phase.

46
New cards

Why do conditional branches create a significant instruction hazard?

Conditional branches are hazardous because the system cannot decide whether to take the sequential path or the branch target until the condition (comparison outcome) is calculated by the ALU, often only after the third clock cycle.

47
New cards

What is the simplest way to manage a conditional branch hazard?

The simplest method is Pipeline Stall. When a branch instruction is encountered, the subsequent clock cycles are filled with NOP (No-Operation) instructions, ensuring the system waits without attempting speculative execution or modifying processor registers.

48
New cards

What is a Branch Prediction Mechanism, and what is its goal?

It is a mechanism used to minimize stalls by guessing, based on probability and execution history, whether a conditional branch will be taken (jump) or not taken (continue sequentially).

49
New cards

Differentiate between Static and Dynamic Branch Prediction.

Static prediction is a simple, fixed guess (e.g., 50/50 chance, often assuming sequential execution). Dynamic prediction observes and continuously updates past execution history to assign likelihood tags to instructions for a better prediction.

50
New cards

Why does speculative execution based on branch prediction only use temporary registers?

Speculative changes are made only to temporary registers to protect the actual processor registers. If the prediction is wrong, the instruction is preempted without needing to flush or discard changes from the permanent registers.

51
New cards

List and define the four prediction tags used in dynamic branch prediction.

Strongly Likely to be Taken (SLT): High certainty the branch will be taken. Likely to be Taken (LT): Branch may or may not be taken. Not Likely to be Taken (NLT): Most probably the branch won't be taken. Strongly Not Likely to be Taken (SNLT): High certainty the branch will not be taken.
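The four tags behave like a standard 2-bit saturating counter, which can be sketched as follows (Python; the state numbering and starting state are illustrative assumptions):

```python
class TwoBitPredictor:
    """Four-state dynamic predictor matching the tags above:
    0 = SNLT, 1 = NLT, 2 = LT, 3 = SLT. The tag names come from the
    cards; the counter itself is the standard 2-bit saturating scheme."""
    TAGS = ["SNLT", "NLT", "LT", "SLT"]

    def __init__(self):
        self.state = 1          # start at "Not Likely to be Taken"

    def predict(self):
        return self.state >= 2  # LT / SLT -> predict taken

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

p = TwoBitPredictor()
for outcome in [True, True, True]:   # a loop branch taken repeatedly
    p.update(outcome)
print(p.TAGS[p.state], p.predict())  # SLT True
```

The two-bit history means a single mispredicted outcome (e.g., a loop exit) does not immediately flip a "strongly" tagged prediction.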

52
New cards

What is "bit diminution" resulting from branch prediction hardware?

Bit diminution is the reduction in available bits within the fixed instruction format (e.g., 32 bits) due to reserving space for hardware flags, such as 2 bits for the JUMP marker and 2 bits for the branch prediction likelihood tag.
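The arithmetic behind the card, assuming the example widths of 2 bits each:

```python
INSTRUCTION_BITS = 32
JUMP_MARKER_BITS = 2       # identifies a jump early, in the Fetch stage
PREDICTION_TAG_BITS = 2    # SLT / LT / NLT / SNLT likelihood tag

usable = INSTRUCTION_BITS - JUMP_MARKER_BITS - PREDICTION_TAG_BITS
print(usable)  # 28 bits left for opcode, registers and offsets
```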

53
New cards

How does a Cache Miss cause an Instruction Hazard (Memory Delay)?

A cache miss occurs when the required instruction is absent in the fast cache memory. The system must then access the much slower Main Memory, which takes multiple clock cycles, causing the Fetch unit and the entire pipeline to stall.

54
New cards

What is Pre-fetching, and how does it mitigate cache miss delays?

Pre-fetching is a solution that uses dedicated hardware (a replicated Fetch Unit and Instruction Queue) to continuously fetch instructions from Main Memory into a buffer in the background, ensuring instructions are available immediately when the main pipeline needs them.

55
New cards

What additional hardware is necessary to implement Pre-fetching effectively?

Pre-fetching requires a specialized, replicated Fetch Unit and an Instruction Queue/Buffer. These components work independently of the main pipeline's PC and IR registers to stage instructions from the slower Main Memory.

56
New cards

What is the goal of Program Counter (PC) Relative Addressing, used in jump instructions?

Its goal is to calculate the final, exact target address by adding the current Program Counter (the base address) to the Offset value stored within the instruction itself, allowing the program to be relocated anywhere in memory.
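A minimal sketch of the calculation (Python; the addresses and offset values are illustrative):

```python
def branch_target(pc, offset):
    # Target = current PC (base address) + signed offset from the instruction.
    return pc + offset

# A jump stored with offset +16 lands 16 bytes ahead of the PC, and the
# same instruction still works if the program is loaded elsewhere.
print(hex(branch_target(0x1000, 16)))  # 0x1010
print(hex(branch_target(0x8000, 16)))  # 0x8010
```

Because the offset is relative, relocating the whole program shifts both base and target together, which is the point of the addressing mode.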

57
New cards

Describe the step-by-step control signals involved in the Execution phase of a Jump instruction.

Get Operands: PC out, IR offset field out.
Route to ALU: PC value to ALU input, Offset to temporary Y register (Y in).
Calculate: ALU performs Addition (PC + Offset). Result stored in Z.
Update PC: Z out, PC in.

58
New cards

Why are MAR and MDR typically not used during a standard Register Transfer operation like ADD R1, R2, R3?

They are not used because Register Transfer operations involve only the registers and the ALU, and do not require accessing the external memory unit, which is the primary interface role of MAR and MDR.

59
New cards

In synchronous operation, what is the "Single Output Activation" rule for a single-bus system?

The rule states that only one register's OUT signal (e.g., R1 out) can be active in a single clock cycle. Activating two simultaneously would lead to a bus clash and data corruption.

60
New cards

How is the instruction decoded by the Control Unit?

The Instruction Register sends the instruction fields (opcode, register fields) to the CU. The CU identifies the instruction format, the required ALU operation, and which source/destination registers are involved, generating corresponding control signals.

61
New cards

What are Condition Code Registers (flags), and why are they vital for control logic?

Condition Code Registers (Z, N, V, C) are special 1-bit registers associated with the ALU that store status flags (Zero, Negative, Overflow, Carry). They are vital because they provide conditional inputs needed for branch logic decisions in the CU.

62
New cards

Explain how the separation of instruction and data memory aids pipelining.

By having separate memories, the Instruction Fetch stage can access the Instruction Memory simultaneously with the Execute stage accessing the Data Memory (e.g., for a load or store operation), thereby eliminating structural hazards and maximizing throughput.

63
New cards

Why is the flexibility of CISC architecture costly?

The flexibility of the CISC architecture (variable instruction sizes, scalable fields) requires a highly complex design for the Microprogrammed Control Unit, increasing the overall cost and complexity of the processor implementation.

64
New cards

What is the purpose of the Encoder component in a Hardwired Control Unit block diagram?

The Encoder is the final logic component that generates the final, specific control signals (like PCIN or MAROUT). It takes the timing signals (when to act) and the decoded micro-instructions (what to do) as input.

65
New cards

What is the difference between a read hit and a read miss in cache memory?

A read hit means the required data is found in the fast cache and retrieved quickly, bypassing Main Memory. A read miss means the data is not in the cache, requiring the system to access the slower Main Memory, causing a stall.

66
New cards

What is Data Inconsistency, and what is its cause related to cache writing?

Data Inconsistency occurs when a write operation updates only the cache copy, leaving the Main Memory copy outdated (e.g., cache X=7, memory X=5). This necessitates specific Write Policies to maintain consistency.

67
New cards

Explain the Write Through cache policy.

Write Through ensures immediate consistency by simultaneously updating both the Cache copy and the Main Memory copy during any write operation. This may slightly increase required clock cycles due to dual writes.

68
New cards

Explain the Write Back cache policy.

Write Back makes changes only to the Cache initially. Main Memory is only updated when the corresponding block is evicted (replaced) from the cache or before system shutdown, performing local updates efficiently.
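The two write policies can be contrasted with a toy model (a Python sketch; the dict-backed memory and single-level cache are simplifications):

```python
class Cache:
    """Toy cache in front of a memory dict, illustrating the two
    write policies; a sketch, not a realistic cache model."""
    def __init__(self, memory, policy):
        self.memory, self.policy = memory, policy
        self.lines = {}      # addr -> cached value
        self.dirty = set()   # addrs modified but not yet in memory

    def write(self, addr, value):
        self.lines[addr] = value
        if self.policy == "write-through":
            self.memory[addr] = value   # both copies updated immediately
        else:                           # write-back: cache copy only
            self.dirty.add(addr)

    def evict(self, addr):
        # Write-back defers the memory update until eviction.
        if addr in self.dirty:
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)
        self.lines.pop(addr, None)

mem = {"X": 5}
wb = Cache(mem, "write-back")
wb.write("X", 7)
print(mem["X"])   # 5: inconsistent (cache holds 7) until eviction
wb.evict("X")
print(mem["X"])   # 7
```

This also reproduces the data inconsistency example from the earlier card (cache X=7, memory X=5) in the write-back case.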

69
New cards

In cache mapping, why is the Tag essential for uniquely identifying a memory block?

The Tag is necessary because, especially in Direct Mapped cache, multiple blocks from the large Main Memory map to the same single cache index (collision). The tag differentiates which specific block currently resides at that index.

70
New cards

What two conditions must be verified simultaneously for a Direct Mapped cache access to be a hit?

A hit requires verifying that the Valid Bit is True (meaning the entry holds valid data) AND that the Tag stored in that cache location matches the Tag bits from the incoming memory address.
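The two-condition hit check can be sketched as follows (Python; the 4-bit index and 2-bit offset widths are illustrative assumptions):

```python
def split_address(addr, index_bits=4, offset_bits=2):
    # Address layout [ tag | index | offset ]; widths are assumptions.
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

def is_hit(cache, addr):
    tag, index, _ = split_address(addr)
    valid, stored_tag = cache[index]     # cache: list of (valid, tag)
    return valid and stored_tag == tag   # both conditions must hold

cache = [(False, 0)] * 16
tag, index, _ = split_address(0x1A4)
cache[index] = (True, tag)               # load the block at its index
print(is_hit(cache, 0x1A4))              # True: valid AND tag match
print(is_hit(cache, 0x3A4))              # False: same index, different tag
```

The second lookup shows the collision case the Tag exists to resolve: 0x3A4 maps to the same index as 0x1A4 but carries a different tag.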

71
New cards

What is the purpose of using Set-Associative Cache mapping over Direct Mapped cache?

Set-associative mapping reduces conflicts: a memory block can be placed in any of several ways within its set rather than in exactly one cache line, so two frequently used blocks that would collide in a direct mapped cache can coexist.

72
New cards

What is the FIFO (First In, First Out) cache replacement policy?

FIFO dictates that when a cache set or block location is full, the block that entered that location earliest will be the first one to be removed (evicted) to make room for new data.

73
New cards

What is the LRU (Least Recently Used) cache replacement policy?

LRU dictates that when replacement is needed, the block that was accessed or used least recently is chosen as the candidate for eviction, under the assumption that recently used data is likely to be needed again soon.
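The FIFO policy from the previous card and LRU can be contrasted in a short simulation (a Python sketch over a single cache set with an illustrative access trace):

```python
from collections import OrderedDict

def simulate(accesses, capacity, policy):
    """Return the blocks evicted from a full cache set.
    'FIFO' evicts the oldest arrival; 'LRU' evicts the least
    recently accessed (a hit refreshes recency)."""
    cache, evicted = OrderedDict(), []
    for block in accesses:
        if block in cache:
            if policy == "LRU":
                cache.move_to_end(block)   # refresh recency on a hit
            continue                       # FIFO ignores hits
        if len(cache) == capacity:
            victim, _ = cache.popitem(last=False)  # front = eviction candidate
            evicted.append(victim)
        cache[block] = True
    return evicted

trace = ["A", "B", "A", "C"]       # set with room for 2 blocks
print(simulate(trace, 2, "FIFO"))  # ['A']: A arrived first
print(simulate(trace, 2, "LRU"))   # ['B']: A was re-used, B was not
```

The same trace produces different victims, which is exactly the behavioral difference between the two policies.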

74
New cards

Why is a Fully Associative Cache rarely implemented despite its flexibility?

Fully associative cache is rarely implemented because it requires extensive Parallel Search across all cache blocks to find a hit. This dramatically increases hardware complexity (comparators, multiplexers) and cost.

75
New cards

What characteristic of an inefficiently written program can defeat the purpose of a pipelined architecture?

An inefficiently written program containing a greater number of sequential, dependent statements (high data dependency) will suffer repeated data hazards and stalls, causing its performance to regress toward sequential execution speed.

76
New cards

What is a key difference between a Data Hazard and a Branch Hazard concerning instruction execution?

In a Data Hazard, all instructions will eventually execute (though delayed). In a Branch Hazard, the system doesn't know which instructions to execute next (sequential or jump target), and incorrect instructions must be discarded entirely.

77
New cards

Why is Operand Forwarding sometimes unable to resolve the entire data hazard delay?

Forwarding is limited by the distance of the dependency. If the required value is needed in a phase too early (e.g., in the Decode phase), even forwarding the result from the Execution phase may still be too late, forcing a short stall.

78
New cards

What is the definition of PC Relative Addressing?

PC Relative Addressing is the calculation method used for branches, where the exact target address is determined by summing the current Program Counter value (Base Address) and the relative Offset value contained in the instruction.

79
New cards

Explain how Register Control Signals like R in and R out function as gate-controlled mechanisms.

These signals are generated by the Control Unit using hardware logic (gates, decoders) to turn specific component gates ON (1) or OFF (0). They are not handled by software structures like if/else statements.

80
New cards

What are External Signals in the context of the Control Unit?

External Signals are inputs from other system components (like MFC, Read, Write, or Interrupts) that are necessary for the Control Unit to alter its flow or logic, especially when coordinating I/O or memory completion.

81
New cards

What hardware components are essential to the Write Back (W) stage of an instruction?

The Write Back stage primarily involves the temporary Z register (outputting the result), the bus (carrying the result), and the destination Register File (receiving the input via its R in signal).

82
New cards

What design consideration supports the four-step instruction cycle (F, D, E, W)?

The four-step cycle is optimal because fewer stages (like two) lead to component blocking and hardware conflicts, while more stages lead to significantly complex Control Unit design due to managing numerous control signals.

83
New cards

In the execution of a memory load operation like ADD R1, [R3], why is R1 fetched concurrently while waiting for R3's data?

Memory access is slow. To prevent the CPU from idling during the long wait for the MFC signal, the fast register operand R1 is concurrently fetched and stored in the temporary Y register, ensuring it is ready when the slow memory data arrives.

84
New cards

What is the function of the Timing Counter in a Hardwired Control Unit?

The Timing Counter takes the Clock Signal and feeds it into a decoder, generating a sequence of specific timing signals (e.g., T0, T1, P0, P1) used by the Encoder to activate different physical components at precise, fixed clock cycles.

85
New cards

Why is pipelining generally not efficient for instruction sets with very few bits (like 16 bits)?

A small instruction size compromises bit availability. Reserving necessary bits for complex pipelining mechanisms (like jump markers and prediction tags) would leave insufficient bits remaining to represent all required operations and register fields.

86
New cards

What is the key architectural insight derived from implementing solutions for pipeline hazards?

The key insight is that efficient pipelining requires dedicated additional hardware components and modifications to the existing architecture (like interstage buffers, separate instruction and data memories, and branch prediction mechanisms) rather than software changes alone.

87
New cards

Why is stalling (using NOPs) preferred over speculative fetching if the prediction is wrong?

If speculative fetching is wrong, the processor might change many actual processor registers, requiring a large overhead to flush them. Stalling avoids register tampering, making rectification simpler and faster.

88
New cards

What is the role of the multiplexer in the PC update mechanism?

The multiplexer selects the second input for the ALU. For PC update, the CU signals the multiplexer to select the constant value four, allowing the ALU to correctly calculate PC + 4, rather than taking a value from another register.

89
New cards

What is the Instruction Decoder's role in the Hardwired Control Unit, and what determines its size?

The Decoder takes the stripped opcode and translates it into a specific operation. Its size (e.g., a 3-to-8 decoder) is determined by the number of opcode bits in the instruction format.

90
New cards

What is the essential function of the Register File (RF)?

The Register File is a collection of general-purpose registers inside the CPU that holds operands and results for the ALU; individual registers are selected for reading and writing through their R out and R in control signals.

91
New cards

What does "byte addressable memory" mean?

Byte addressable memory means that each individual byte in the memory space has a unique, sequential address.

92
New cards

How are the Memory Address Register (MAR) and Memory Buffer Register (MBR) connected in a multiple bus system?

The MAR is connected to the dedicated Address Bus. The MBR is connected to the dedicated Data Bus.

93
New cards

What two actions occur simultaneously during the first part of the Instruction Fetch stage?

The two actions are: 1) Transferring the instruction address from the PC to the MAR, and 2) Incrementing the Program Counter (PC) by 4 (PC + 4) for the next sequential instruction.

94
New cards

When can two 'OUT' signals be activated in the same clock cycle in a single-bus architecture?

95
New cards

Define what an R-Type instruction is.

An R-Type (register-type) instruction is one whose operands are all registers (e.g., ADD R3, R1, R2). It involves only the register file and the ALU, with no memory operand.

96
New cards

What is the goal of converting an expression from Infix to Postfix notation?

The goal is to restructure the expression based on operator precedence, ensuring that the highest priority operators are evaluated first, typically involving solving expressions within parentheses first.
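The standard algorithm for this conversion is the shunting-yard method, sketched below (Python; restricted to binary left-associative operators and parentheses, an assumption beyond the card's text):

```python
def infix_to_postfix(tokens):
    """Shunting-yard sketch: operands go straight to the output,
    operators wait on a stack until a lower-precedence operator
    (or a closing parenthesis) forces them out."""
    prec = {"+": 1, "-": 1, "*": 2, "/": 2}
    out, ops = [], []
    for t in tokens:
        if t in prec:
            while ops and ops[-1] != "(" and prec[ops[-1]] >= prec[t]:
                out.append(ops.pop())
            ops.append(t)
        elif t == "(":
            ops.append(t)
        elif t == ")":
            while ops[-1] != "(":   # pop until the matching '('
                out.append(ops.pop())
            ops.pop()               # discard the '(' itself
        else:                       # operand
            out.append(t)
    while ops:                      # flush remaining operators
        out.append(ops.pop())
    return out

print(infix_to_postfix(list("A+B*C")))    # -> ['A', 'B', 'C', '*', '+']
print(infix_to_postfix(list("(A+B)*C")))  # -> ['A', 'B', '+', 'C', '*']
```

Note how the parentheses in the second example force the addition ahead of the multiplication, matching the card's point about precedence.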

97
New cards

What is the concept of Parallel Search in the context of Set-Associative Cache?

Parallel Search means the tags of all ways (blocks) within the indexed set are compared against the address tag simultaneously, using one comparator per way, so a hit or miss is determined in a single step.

98
New cards

What happens if a Dynamic Branch Prediction is proven to be correct?

If the prediction is confirmed correct, the changes that were temporarily stored in the temporary registers (due to speculative execution) are then "stamped" (finalized) onto the actual designated processor registers.

99
New cards

What does the condition code 'Z' (Zero) indicate?

The 'Z' condition code register is a status flag associated with the ALU, which is set (True) if the result of the most recent arithmetic or logical operation was zero.

100
New cards

How does the Control Unit use the timing signal P1 to activate multiple components simultaneously?

The CU calculates activation signals such as P1 * R1 and P1 * MAR. If P1 is active (P1=1), both components R1 and MAR become active simultaneously within that clock cycle.
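The P1 * R1 product is just a logical AND, which can be mimicked directly (a Python sketch; the signal names follow the card):

```python
def control_signals(p1, selected):
    """Each activation signal is the AND of the timing signal P1 with
    a component-select line (e.g., P1 * R1, P1 * MAR)."""
    return {name: p1 and sel for name, sel in selected.items()}

# When P1 = 1, every selected component fires in the same cycle;
# when P1 = 0, nothing fires regardless of the select lines.
print(control_signals(True,  {"R1": True, "MAR": True}))
print(control_signals(False, {"R1": True, "MAR": True}))
```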