Reduced Instruction Set Computers (RISC)
CISC vs. RISC Architectures
CISC (Complex Instruction Set Computer)
- Characterized by a large and complex set of instructions.
- Examples include IBM 370/168, VAX 11/780, and Intel 80486.
- Aimed to simplify compilers and improve performance.
- Advantages of smaller programs:
  - Less memory usage.
  - Improved performance due to fewer instruction bytes fetched.
  - Reduced page faults in a paging environment.
  - More instructions fit in cache.
RISC (Reduced Instruction Set Computer)
- Examples include SPARC and MIPS R4000.
- Focuses on simplifying the instruction set for better performance.
- Key characteristics:
  - One machine instruction per machine cycle.
  - Simple LOAD and STORE operations for memory access.
  - Register-to-register operations.
  - Simple addressing modes.
  - Fixed instruction length aligned on word boundaries.
  - Simple instruction formats.
RISC Architecture Implications
- Optimizing compilers can be more effective with primitive instructions.
- Easier to move functions out of loops, reorganize code, and maximize register utilization.
- Possible to compute parts of complex instructions at compile time.
- Control units built for simple instructions can execute them faster.
- Instruction pipelining is more effective.
- More responsive to interrupts because they are checked between elementary operations.
Instruction Execution Characteristics
- Operations performed determine processor functions and memory interaction.
- Operand types and frequency influence memory organization and addressing modes.
- Execution sequencing affects control and pipeline organization.
- Emphasis is placed on optimizing the performance of time-consuming HLL features.
Weighted Relative Dynamic Frequency of HLL Operations
| Operation | Pascal | C |
|---|---|---|
| ASSIGN | 45% | 38% |
| LOOP | 5% | 3% |
| CALL | 15% | 12% |
| IF | 29% | 43% |
| GOTO | – | 3% |
| OTHER | 6% | 1% |
Dynamic Percentage of Operands
| Operand | Pascal | C | Average |
|---|---|---|---|
| Integer constant | 16% | 23% | 20% |
| Scalar variable | 58% | 53% | 55% |
| Array/Structure | 26% | 24% | 25% |
Procedure Arguments and Local Scalar Variables
- Small Nonnumeric Programs (Compiler, Interpreter, and Typesetter), as a percentage of executed procedure calls:
  - More than 3 arguments: 0-7% (Pascal), 0-5% (C)
  - More than 5 arguments: 0-3% (Pascal), 0% (C)
  - More than 8 words of arguments and local scalars: 1-20% (Pascal), 0-6% (C)
  - More than 12 words of arguments and local scalars: 1-6% (Pascal), 0-3% (C)
Large Register File
- Software Solution: The compiler allocates registers to the variables that are used most heavily.
- Hardware Solution: Provide more registers so that more variables can be held in registers.
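The software approach can be illustrated with a deliberately naive sketch: count how often each variable is referenced and give the available registers to the most frequently used ones. The function name `allocate_registers`, the register names `r0`, `r1`, ... and the sample trace are hypothetical, and real compilers use far more sophisticated analyses than a simple frequency count.

```python
from collections import Counter


def allocate_registers(variable_uses, n_registers):
    """Map the n_registers most-referenced variables to register names.

    variable_uses is a list of variable names, one entry per reference.
    Variables that do not make the cut are assumed to stay in memory.
    """
    counts = Counter(variable_uses)
    hottest = [name for name, _ in counts.most_common(n_registers)]
    return {name: f"r{i}" for i, name in enumerate(hottest)}


# Hypothetical reference trace: i is used 3 times, total twice, tmp once.
trace = ["i", "total", "i", "tmp", "total", "i"]
assignment = allocate_registers(trace, n_registers=2)
print(assignment)
```

With two registers available, `i` and `total` are kept in registers and `tmp` is left in memory.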
Register Windows
- Overlapping register windows for parameter passing and local storage.
- Circular buffer organization of overlapped windows.
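The overlap and the circular organization can be sketched as follows. This is a toy model, not any real processor's layout: the class name, the window count, and the window sizes are assumptions, and window overflow (spilling the oldest window to memory) is not handled. The point it illustrates is that a caller's "out" registers are physically the same registers as the callee's "in" registers, so parameters are passed without copying.

```python
class WindowedRegisterFile:
    """Toy register file with overlapping windows in a circular buffer."""

    def __init__(self, n_windows=4, n_shared=2, n_locals=3):
        self.n_windows = n_windows
        self.n_shared = n_shared          # size of each overlap region
        self.n_locals = n_locals
        self.stride = n_shared + n_locals  # distance between window bases
        self.regs = [0] * (n_windows * self.stride)
        self.cwp = 0                       # current window pointer

    def _index(self, kind, i):
        # "out" of window w starts exactly where "in" of window w+1 starts.
        offset = {"in": 0, "local": self.n_shared, "out": self.stride}[kind]
        return (self.cwp * self.stride + offset + i) % len(self.regs)

    def read(self, kind, i):
        return self.regs[self._index(kind, i)]

    def write(self, kind, i, value):
        self.regs[self._index(kind, i)] = value

    def call(self):
        self.cwp = (self.cwp + 1) % self.n_windows

    def ret(self):
        self.cwp = (self.cwp - 1) % self.n_windows


rf = WindowedRegisterFile()
rf.write("out", 0, 42)        # caller places an argument
rf.call()
arg = rf.read("in", 0)        # callee sees it without any copy
rf.write("in", 0, arg + 1)    # callee leaves a return value
rf.ret()
result = rf.read("out", 0)    # caller reads the result: 43
```

Because the index is taken modulo the file size, the "out" registers of the last window wrap around onto the "in" registers of the first, which is what makes the buffer circular.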
Global Variables
- Global variables can be assigned memory locations by the compiler.
- Alternative: Use global registers fixed in number and available to all procedures.
Large Register File vs. Cache
| Large Register File | Cache |
|---|---|
| All local scalars | Recently-used local scalars |
| Individual variables | Blocks of memory |
| Compiler-assigned global variables | Recently-used global variables |
| Save/Restore based on depth | Save/Restore based on replacement algorithm |
| Register addressing | Memory addressing |
| Multiple operands per cycle | One operand per cycle |
Code Size Relative to RISC I
| Benchmark | RISC I | VAX-11/780 | M68000 | Z8002 | PDP-11/70 |
|---|---|---|---|---|---|
| 11 C Programs | 1.0 | 0.8 | 0.9 | 1.2 | 0.9 |
| 12 C Programs | 1.0 | 0.67 | 0.9 | 1.12 | 0.71 |
| 5 C Programs | 1.0 | | | | |
Pipelining
- Pipelining overlaps instruction execution to improve performance.
- Optimization techniques:
- Delayed Branch: Branch doesn't take effect until after the following instruction.
- Delayed Load: Target register is locked until load completes; useful work can be done while loading.
- Loop Unrolling: Replicates the loop body to reduce overhead and increase parallelism.
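Loop unrolling can be shown at source level with a small sketch. The function names and the unroll factor of 4 are assumptions chosen for illustration; in practice the transformation is applied by the compiler to machine code, where the saved loop-control instructions and the larger straight-line body are what matter.

```python
def sum_rolled(values):
    """Baseline: one addition plus loop overhead per element."""
    total = 0
    for i in range(len(values)):
        total += values[i]
    return total


def sum_unrolled_by_4(values):
    """Same result, with the loop body replicated four times."""
    n = len(values)
    limit = n - n % 4          # largest multiple of 4 not exceeding n
    total = 0
    i = 0
    while i < limit:           # one test and increment per four elements
        total += values[i]
        total += values[i + 1]
        total += values[i + 2]
        total += values[i + 3]
        i += 4
    while i < n:               # cleanup loop for the leftover elements
        total += values[i]
        i += 1
    return total


data = list(range(10))
assert sum_rolled(data) == sum_unrolled_by_4(data) == 45
```

The cleanup loop is needed whenever the trip count is not a multiple of the unroll factor.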
Normal vs. Delayed Branch
| Address | Normal Branch | Delayed Branch | Optimized Delayed Branch |
|---|---|---|---|
| 100 | LOAD X, rA | LOAD X, rA | LOAD X, rA |
| 101 | ADD 1, rA | ADD 1, rA | JUMP 105 |
| 102 | JUMP 105 | JUMP 106 | ADD 1, rA |
| 103 | ADD rA, rB | NOOP | ADD rA, rB |
| 104 | SUB rC, rB | ADD rA, rB | SUB rC, rB |
| 105 | STORE rA, Z | SUB rC, rB | STORE rA, Z |
| 106 | | STORE rA, Z | |
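The three columns of the table can be checked with a small interpreter. This is a sketch under stated assumptions: the `run` function, the tuple encoding of instructions, and the initial value of `X` are all hypothetical, and a delayed `JUMP` is modeled simply as taking effect after the next instruction executes.

```python
def run(program, delayed, x=10):
    """Execute a program from address 100; return (Z, instructions executed)."""
    regs = {"rA": 0, "rB": 0, "rC": 0}
    mem = {"X": x, "Z": None}

    def value(src):
        # A source operand is either a register name or an integer literal.
        return regs[src] if isinstance(src, str) else src

    pc, pending, executed = 100, None, 0
    while pc in program:
        op, *args = program[pc]
        executed += 1
        next_pc = pc + 1
        if pending is not None:        # this instruction is the delay slot
            next_pc, pending = pending, None
        if op == "LOAD":
            regs[args[1]] = mem[args[0]]
        elif op == "STORE":
            mem[args[1]] = regs[args[0]]
        elif op == "ADD":
            regs[args[1]] += value(args[0])
        elif op == "SUB":
            regs[args[1]] -= value(args[0])
        elif op == "JUMP":
            if delayed:
                pending = args[0]      # takes effect after the next instruction
            else:
                next_pc = args[0]
        pc = next_pc
    return mem["Z"], executed


NORMAL = {100: ("LOAD", "X", "rA"), 101: ("ADD", 1, "rA"), 102: ("JUMP", 105),
          103: ("ADD", "rA", "rB"), 104: ("SUB", "rC", "rB"),
          105: ("STORE", "rA", "Z")}
DELAYED = {100: ("LOAD", "X", "rA"), 101: ("ADD", 1, "rA"), 102: ("JUMP", 106),
           103: ("NOOP",), 104: ("ADD", "rA", "rB"), 105: ("SUB", "rC", "rB"),
           106: ("STORE", "rA", "Z")}
OPTIMIZED = {100: ("LOAD", "X", "rA"), 101: ("JUMP", 105), 102: ("ADD", 1, "rA"),
             103: ("ADD", "rA", "rB"), 104: ("SUB", "rC", "rB"),
             105: ("STORE", "rA", "Z")}

print(run(NORMAL, delayed=False))
print(run(DELAYED, delayed=True))
print(run(OPTIMIZED, delayed=True))
```

All three versions store the same value in `Z`. The plain delayed version executes one extra instruction (the NOOP in the delay slot), while the optimized version fills the slot with `ADD 1, rA` and executes the same number of instructions as the normal branch.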
MIPS R4000
- RISC chip set developed by MIPS Technologies Inc.
- Uses 64 bits for data paths, addresses, registers, and the ALU.
- Partitioned into CPU and coprocessor for memory management.
- Supports thirty-two 64-bit registers.
- Provides up to 128 KB of high-speed cache.
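- Three 32-bit instruction formats:
  - I-type (Immediate): Operation (6 bits), rs (5), rt (5), Immediate (16).
  - J-type (Jump): Operation (6 bits), Target (26).
  - R-type (Register): Operation (6 bits), rs (5), rt (5), rd (5), Shift (5), Function (6).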
- Superscalar: Replicates pipeline stages for parallel instruction processing.
- Super-pipelined: Uses more fine-grained pipeline stages for increased parallelism.
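The I-, J-, and R-type formats listed above can be made concrete by packing their fields into a 32-bit word. The helper names below are hypothetical. The field widths are the standard MIPS ones (6-bit operation, 5-bit register fields, 5-bit shift, 6-bit function, 16-bit immediate, 26-bit target), and the sample opcode and function values are the usual MIPS encodings for `add`, `addi`, and `j`, which are not given in these notes.

```python
def encode_r(op, rs, rt, rd, shift, funct):
    """R-type: op(6) rs(5) rt(5) rd(5) shift(5) funct(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shift << 6) | funct


def encode_i(op, rs, rt, imm):
    """I-type: op(6) rs(5) rt(5) immediate(16, two's complement)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_j(op, target):
    """J-type: op(6) target(26)."""
    return (op << 26) | (target & 0x3FFFFFF)


def opcode_of(word):
    """All three formats keep the operation in the top six bits."""
    return (word >> 26) & 0x3F


add_word = encode_r(0, rs=1, rt=2, rd=3, shift=0, funct=0x20)   # add  $3, $1, $2
addi_word = encode_i(8, rs=1, rt=2, imm=-1)                     # addi $2, $1, -1
jump_word = encode_j(2, target=0x100)                           # j to word 0x100

for word in (add_word, addi_word, jump_word):
    print(f"{word:08x}  op={opcode_of(word)}")
```

Because every format is 32 bits wide and keeps the operation field in the same position, the decoder can read the opcode before it knows which format it is looking at, which is part of what makes fixed formats easy to pipeline.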
R3000 Pipeline Stages
- IF (Instruction Fetch): Translate virtual address to physical address using TLB, send physical address to instruction cache.
- RD (Read): Return instruction from cache, decode instruction, read register file, calculate branch target address.
- ALU: Perform arithmetic/logical operation, decide branch, calculate data virtual address, translate data virtual address to physical.
- MEM (Memory): Send physical address to data cache.
- WB (Write Back): Write result to register file.
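How these five stages overlap can be sketched with an idealized timing chart. The function `pipeline_chart` is hypothetical, and the model assumes one instruction enters the pipeline every cycle with no stalls, so instruction i occupies stage s during cycle i + s.

```python
STAGES = ["IF", "RD", "ALU", "MEM", "WB"]


def pipeline_chart(n_instructions, stages=STAGES):
    """Return one row per instruction; each row lists the stage per cycle."""
    total_cycles = n_instructions + len(stages) - 1
    rows = []
    for i in range(n_instructions):
        row = ["."] * total_cycles          # "." marks an idle cycle
        for s, name in enumerate(stages):
            row[i + s] = name
        rows.append(row)
    return rows


chart = pipeline_chart(3)
for i, row in enumerate(chart):
    print(f"I{i}: " + " ".join(f"{cell:>3}" for cell in row))
```

Under these assumptions, n instructions finish in n + 4 cycles rather than the 5n cycles a non-overlapped design would need, so three instructions take 7 cycles instead of 15.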
R4000 Pipeline Stages
- Instruction fetch (two halves).
- Register file (decode, instruction cache tag check, operand fetch).
- Instruction execute (ALU operation, address calculation, branch operation).
- Data cache (two halves).
- Tag check (cache tag checks for loads and stores).
- Write back.
SPARC (Scalable Processor Architecture)
- Architecture defined by Sun Microsystems.
- Inspired by the Berkeley RISC I.
SPARC Register Window Layout
- Organized into register windows forming a circular stack.
SPARC Addressing Modes
| Instruction Type | Addressing Mode | Algorithm | SPARC Equivalent |
|---|---|---|---|
| Register-to-register | Immediate | operand = A | S2 |
| Load/store | Direct | EA = A | R0 + S2 |
| Register-to-register | Register | EA = R | RS1, SS2 |
| Load/store | Register Indirect | EA = (R) | RS1 + 0 |
| Load/store | Displacement | EA = (R) + A | RS1 + S2 |
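The load/store rows of the table all reduce to one computation, EA = RS1 + S2, with register R0 always reading as zero. A minimal sketch, assuming a hypothetical `effective_address` helper and taking S2 as an already-resolved integer (in the real architecture it is either a register's contents or an immediate):

```python
def effective_address(regs, rs1, s2):
    """EA = R[rs1] + S2, where register 0 always reads as zero."""
    base = 0 if rs1 == 0 else regs[rs1]
    return base + s2


regs = [0] * 32
regs[5] = 0x1000                                     # hypothetical base address

direct = effective_address(regs, 0, 0x40)            # R0 + S2      -> EA = A
indirect = effective_address(regs, 5, 0)             # RS1 + 0      -> EA = (R)
displacement = effective_address(regs, 5, 8)         # RS1 + S2     -> EA = (R) + A

print(hex(direct), hex(indirect), hex(displacement))
```

The single hardware mode thus synthesizes direct, register indirect, and displacement addressing purely through the choice of operands.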
Processor Organization for Pipelining
- Enhancements:
- Multiple reservation stations.
- Forwarding.
- Reorder buffer.
- Instruction dispatching:
- Issue from ID to reservation station.
- Dispatch from reservation station to FU.
- Data forwarding addresses read-after-write (RAW) delays.
- Reorder buffer supports out-of-order execution (OoOE) by holding completed results until they can be committed in program order.
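The role of the reorder buffer can be sketched in a few lines. This is a simplified model with hypothetical names: entries are allocated in program order, functional units may mark them complete in any order, and an entry retires only once every older entry has retired. Reservation stations and forwarding are not modeled.

```python
from collections import deque


class ReorderBuffer:
    """Toy reorder buffer: out-of-order completion, in-order commit."""

    def __init__(self):
        self.entries = deque()    # program order, oldest entry on the left
        self.committed = []       # names of retired instructions, in order

    def issue(self, name):
        entry = {"name": name, "done": False}
        self.entries.append(entry)
        return entry

    def complete(self, entry):
        entry["done"] = True
        # Retire from the head for as long as the oldest entry has finished.
        while self.entries and self.entries[0]["done"]:
            self.committed.append(self.entries.popleft()["name"])


rob = ReorderBuffer()
load = rob.issue("LOAD")
add = rob.issue("ADD")
sub = rob.issue("SUB")

rob.complete(sub)                  # finishes first, but must wait
rob.complete(add)                  # still blocked behind the LOAD
waiting = list(rob.committed)      # nothing has retired yet
rob.complete(load)                 # head finishes, all three retire in order
print(waiting, rob.committed)
```

Even though `SUB` and `ADD` finish before `LOAD`, the architectural state is updated in the original program order.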