Detailed Study Notes on MIPS Microarchitecture and Performance Analysis
7.1 INTRODUCTION
In this chapter:
Learn how to design a MIPS microprocessor with three different microarchitecture versions.
Understand the trade-offs between performance, cost, and complexity.
Microarchitecture is the arrangement of registers, ALUs, FSMs, and memory needed to implement a processor.
A specific architecture (like MIPS) can have multiple microarchitectures, each with distinct performance characteristics.
The chapter is heavily influenced by David Patterson and John Hennessy's MIPS designs, which are well illustrated and based on a simple commercial architecture.
7.1.1 Architectural State and Instruction Set
Computer architecture consists of:
Instruction set
Architectural state (includes the program counter and 32 registers).
Any MIPS microarchitecture must contain:
All of the architectural state, so that it can execute instructions and produce the new state.
Simplifying assumptions restrict the design to a subset of instructions:
R-type arithmetic/logic: add, sub, and, or, slt.
Memory instructions: lw, sw.
Branch instruction: beq.
After mastering these instructions, further additions (like addi, j) can be incorporated.
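The instruction subset above can be told apart by opcode and funct fields. The following is a minimal sketch, assuming the standard 32-bit MIPS field layout and encodings; the function name decode and the returned dictionary keys are hypothetical and chosen only for illustration.

```python
# Standard MIPS32 funct values for the R-type subset (opcode 0).
R_TYPE_FUNCT = {0x20: "add", 0x22: "sub", 0x24: "and", 0x25: "or", 0x2A: "slt"}
# Standard MIPS32 opcodes for the memory and branch subset.
OPCODES = {0x23: "lw", 0x2B: "sw", 0x04: "beq"}


def decode(word):
    """Split a 32-bit instruction word into fields (hypothetical helper)."""
    opcode = (word >> 26) & 0x3F
    funct = word & 0x3F
    if opcode == 0:
        name = R_TYPE_FUNCT.get(funct)
    else:
        name = OPCODES.get(opcode)
    if name is None:
        raise ValueError(f"instruction 0x{word:08X} is outside the subset")
    return {
        "name": name,
        "rs": (word >> 21) & 0x1F,   # first source register
        "rt": (word >> 16) & 0x1F,   # second source or destination (I-type)
        "rd": (word >> 11) & 0x1F,   # destination (R-type only)
        "imm": word & 0xFFFF,        # 16-bit immediate (I-type only)
    }


# 0x02328020 encodes add $s0, $s1, $s2 (rd=16, rs=17, rt=18).
fields = decode(0x02328020)
print(fields["name"], fields["rd"], fields["rs"], fields["rt"])
```

Adding addi or j later would mean adding entries to the opcode table, plus a 26-bit address field for j.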
7.1.2 Design Process
Microarchitectures consist of:
A datapath that operates on data words.
A control unit that gives signals to execute instructions.
Design steps to build a complex MIPS processor:
Begin with hardware including state elements (program counter and registers).
Integrate combinational logic between state elements for computing new states.
Split the memory system into separate instruction and data memories.
Figure 7.1 illustrates state elements with bus widths (e.g., 32-bit data and narrower address lines).
Synchronous sequential circuits: The processor state changes at clock edges (reset included).
7.1.3 MIPS Microarchitectures
Three microarchitectures for MIPS are designed:
Single-cycle Processor: Executes an entire instruction in one cycle.
Simplicity in control unit design.
Cycle time limited by the slowest instruction.
Multicycle Processor: Executes instructions in several shorter cycles.
Reduces hardware cost by reusing expensive blocks across the different steps of an instruction.
Pipelined Processor: Applies pipelining to the single-cycle design so that several instructions execute at once.
Adds complexity with dependency handling and non-architectural registers.
The chapter also covers performance techniques used to speed up modern microprocessors.
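To illustrate why the single-cycle cycle time is limited by the slowest instruction, here is a small sketch. The delay values are hypothetical and purely illustrative; they are not taken from the chapter, and the function names are assumptions.

```python
# Hypothetical critical-path delays in picoseconds (illustrative values only).
HYPOTHETICAL_DELAYS_PS = {"lw": 900, "sw": 800, "R-type": 700, "beq": 600}


def single_cycle_period_ps(delays_ps):
    """The clock period must cover the slowest instruction's path."""
    return max(delays_ps.values())


def slack_per_instruction_ps(delays_ps):
    """Time each instruction leaves unused within the fixed clock period."""
    period = single_cycle_period_ps(delays_ps)
    return {name: period - delay for name, delay in delays_ps.items()}


print(single_cycle_period_ps(HYPOTHETICAL_DELAYS_PS))
print(slack_per_instruction_ps(HYPOTHETICAL_DELAYS_PS))
```

Under these assumed delays, every instruction other than lw finishes early and idles for the rest of the cycle, which is the inefficiency that the multicycle design targets.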
7.2 PERFORMANCE ANALYSIS
Cost formulas depend on hardware needed and technology employed.
Performance metrics are commonly misrepresented in marketing (Intel vs. AMD analogy).
Real performance: measured by execution time on a program of interest.
CPU performance metrics:
Execution time:
\[ \text{Execution Time} = \frac{\text{\# Instructions} \times \text{CPI}}{\text{Clock Rate}} = \text{\# Instructions} \times \text{CPI} \times \text{Clock Cycle Time} \]
The number of instructions depends on architecture complexity and coding efficiency.
CPI (Clock Cycles per Instruction): Number of cycles needed to execute an average instruction; varies with the microarchitecture.
Memory wait times also affect CPI; they are discussed in Chapter 8.
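The execution time relation can be checked numerically. This is a minimal sketch; the function names and the sample numbers are hypothetical and chosen only for illustration.

```python
def execution_time_s(instruction_count, cpi, clock_period_s):
    """Execution time = instructions x cycles/instruction x seconds/cycle."""
    return instruction_count * cpi * clock_period_s


def execution_time_from_rate_s(instruction_count, cpi, clock_rate_hz):
    """Same quantity, expressed with clock rate (the inverse of the period)."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical program: 1e9 instructions, CPI of 2, 1 ns clock period (1 GHz).
print(execution_time_s(1e9, 2.0, 1e-9))
print(execution_time_from_rate_s(1e9, 2.0, 1e9))
```

Both forms give about 2 seconds for these inputs, since clock cycle time is the reciprocal of clock rate.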
7.3 SINGLE-CYCLE PROCESSOR
Construction starts by connecting the state elements with the combinational logic that each instruction requires.
Single-Cycle Datapath:
New connections are highlighted to distinguish them from hardware already in place.
Fetching instruction:
The PC drives the instruction memory's address input; the memory's output is the fetched instruction.
7.3.1 Datapath Development
Steps for the lw instruction:
Read base address from referenced register.
Sign-extend offset from instruction for memory operation.
Use ALU for computation and retrieve memory data.
Write memory data back to register file.
Increment the PC by 4 to point to the next instruction.
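The lw steps above can be sketched as a behavioral model. This is an illustration under stated assumptions, not the hardware itself: data memory is modeled as a dictionary keyed by word-aligned byte address, and the names sign_extend16 and execute_lw are hypothetical.

```python
def sign_extend16(imm16):
    """Interpret a 16-bit field as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def execute_lw(pc, regs, data_mem, rs, rt, imm16):
    """Return (next_pc, new_regs) after lw rt, imm16(rs)."""
    base = regs[rs]                           # read base address register
    offset = sign_extend16(imm16)             # sign-extend the offset
    address = (base + offset) & 0xFFFFFFFF    # ALU adds base and offset
    new_regs = list(regs)
    if rt != 0:                               # register $0 always reads as 0
        new_regs[rt] = data_mem[address]      # write memory data to register
    next_pc = (pc + 4) & 0xFFFFFFFF           # each instruction is 4 bytes
    return next_pc, new_regs


regs = [0] * 32
regs[9] = 0x1000
memory = {0x0FFC: 42}
# lw $10, -4($9): the immediate 0xFFFC sign-extends to -4.
next_pc, regs = execute_lw(0x00400000, regs, memory, rs=9, rt=10, imm16=0xFFFC)
print(hex(next_pc), regs[10])
```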
7.3.2 Control Unit Design
The control unit derives the control signals from the instruction's opcode and funct fields; these signals steer the datapath.
The control logic selects the ALU operation and enables register and memory writes as each instruction executes.
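One way to picture this decoding is a lookup table from opcode to control signals, followed by a second step that uses funct for R-type instructions. The sketch below is an assumption-laden illustration: the signal names (reg_write, alu_src, and so on) and the two-bit alu_op encoding follow common textbook usage and are not given in these notes. None marks a don't-care value.

```python
# Assumed main decoder table: opcode -> control signals (None = don't care).
MAIN_DECODER = {
    0x00: dict(reg_write=1, reg_dst=1, alu_src=0, branch=0,
               mem_write=0, mem_to_reg=0, alu_op=0b10),       # R-type
    0x23: dict(reg_write=1, reg_dst=0, alu_src=1, branch=0,
               mem_write=0, mem_to_reg=1, alu_op=0b00),       # lw
    0x2B: dict(reg_write=0, reg_dst=None, alu_src=1, branch=0,
               mem_write=1, mem_to_reg=None, alu_op=0b00),    # sw
    0x04: dict(reg_write=0, reg_dst=None, alu_src=0, branch=1,
               mem_write=0, mem_to_reg=None, alu_op=0b01),    # beq
}
FUNCT_TO_OPERATION = {0x20: "add", 0x22: "sub", 0x24: "and",
                      0x25: "or", 0x2A: "slt"}


def alu_operation(alu_op, funct):
    """Pick the ALU operation: add for lw/sw, subtract for beq, else funct."""
    if alu_op == 0b00:
        return "add"
    if alu_op == 0b01:
        return "sub"
    return FUNCT_TO_OPERATION[funct]


def control(opcode, funct):
    """Return all control signals for one instruction (hypothetical helper)."""
    signals = dict(MAIN_DECODER[opcode])
    signals["alu_operation"] = alu_operation(signals["alu_op"], funct)
    return signals


print(control(0x23, 0)["alu_operation"])      # lw computes an address
print(control(0x00, 0x2A)["alu_operation"])   # R-type slt
```

Splitting the decoder in two keeps the main table small: only R-type instructions need the funct field.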
7.4 MULTICYCLE PROCESSOR
A multicycle processor breaks each instruction into shorter steps, so simpler instructions can finish in fewer cycles than complex ones.
Enhancements:
A single ALU is reused for incrementing the PC, address calculations, and other arithmetic, in place of separate adders.
A single shared memory holds both instructions and data.
Control signals now depend on the current step of the instruction, and supporting new instructions requires extending the control logic.
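Because instructions take different numbers of cycles in a multicycle design, the average CPI depends on the instruction mix. The sketch below assumes per-instruction cycle counts typical of a textbook multicycle MIPS design (lw 5, sw 4, R-type 4, beq 3) and a hypothetical instruction mix; neither is stated in these notes.

```python
# Assumed cycle counts for a textbook-style multicycle design.
ASSUMED_CYCLES = {"lw": 5, "sw": 4, "R-type": 4, "beq": 3}
# Hypothetical instruction mix (fractions of executed instructions).
HYPOTHETICAL_MIX = {"lw": 0.25, "sw": 0.10, "R-type": 0.50, "beq": 0.15}


def average_cpi(cycles, mix):
    """Weighted average of cycles per instruction over the given mix."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("instruction mix fractions must sum to 1")
    return sum(cycles[name] * fraction for name, fraction in mix.items())


print(round(average_cpi(ASSUMED_CYCLES, HYPOTHETICAL_MIX), 2))
```

With these assumed numbers the average CPI is about 4.1, so the multicycle design only wins if its clock period is correspondingly shorter than the single-cycle period.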
7.5 PIPELINED PROCESSOR
Pipelining divides instruction execution into discrete stages so that several instructions execute concurrently, which improves throughput.
Performance is limited by hazards, which must be managed: data hazards, control hazards, and conflicts over resource availability.
Analysis of instruction flow through stages highlights performance improvements.
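The throughput gain can be estimated for the ideal case. This sketch assumes no hazards or stalls and compares against a hypothetical unpipelined machine that spends one equal-length step per stage on each instruction; the function names are illustrative.

```python
def pipeline_cycles(num_instructions, num_stages):
    """Cycles to finish all instructions in an ideal, stall-free pipeline."""
    if num_instructions == 0:
        return 0
    # The first instruction takes num_stages cycles; each later one adds 1.
    return num_stages + num_instructions - 1


def ideal_speedup(num_instructions, num_stages):
    """Speedup over running each instruction through all stages alone."""
    unpipelined_cycles = num_instructions * num_stages
    return unpipelined_cycles / pipeline_cycles(num_instructions, num_stages)


print(pipeline_cycles(100, 5))
print(ideal_speedup(100, 5))
```

As the instruction count grows, the ideal speedup approaches the number of stages; hazards keep real pipelines below that bound.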
Example Scenarios in Instruction Throughput
Static and dynamic techniques for handling branches, together with forwarding and stalling, are chosen to suit the characteristics of the pipeline.
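Forwarding and stall decisions reduce to register-number comparisons between pipeline stages. The following is a minimal sketch assuming a five-stage pipeline in which results can be forwarded from the Memory and Writeback stages to Execute; all function and parameter names are hypothetical.

```python
def forward_select(src_reg, mem_dest, mem_reg_write, wb_dest, wb_reg_write):
    """Choose where an Execute-stage source operand should come from."""
    # Prefer the Memory stage: it holds the most recent result.
    if src_reg != 0 and mem_reg_write and src_reg == mem_dest:
        return "MEM"
    if src_reg != 0 and wb_reg_write and src_reg == wb_dest:
        return "WB"
    return "REG"  # no hazard: use the value read from the register file


def load_use_stall(decode_rs, decode_rt, ex_dest, ex_is_load):
    """A load's data is not ready in time to forward, so stall one cycle."""
    return bool(ex_is_load and ex_dest != 0
                and ex_dest in (decode_rs, decode_rt))


# The instruction in Execute reads $8; the one ahead of it writes $8.
print(forward_select(8, mem_dest=8, mem_reg_write=True,
                     wb_dest=9, wb_reg_write=True))
# lw into $8 followed immediately by an instruction that reads $8.
print(load_use_stall(decode_rs=8, decode_rt=3, ex_dest=8, ex_is_load=True))
```

Register $0 is excluded from both checks because it always reads as zero, so a match on it is never a real dependency.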
7.9 REAL-WORLD PERSPECTIVE: IA-32 MICROARCHITECTURE
Describes the historical evolution of Intel's IA-32 microarchitectures, from the earliest microprocessors to advanced designs.
Modern designs have moved toward multi-core processors to balance performance and power consumption.
7.10 SUMMARY
Comparison of Microarchitectures: Summarizes the design process, applies principles from earlier chapters, and reflects on how the pieces integrate and what performance results.
Advances in manufacturing and design offer insights into the future of processor architecture.