Reduced Instruction Set Computers (RISC)

CISC vs. RISC Architectures

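Complex Instruction Set Computers (CISC) are characterized by a large and complex set of instructions.
Examples include the IBM 370/168, VAX-11/780, and Intel 80486.
The motivation for CISC was to simplify compilers and improve performance.
CISC was also expected to produce smaller programs, with these advantages: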
- Less memory usage.
- Improved performance due to fewer instruction bytes fetched.
- Reduced page faults in a paging environment.
- More instructions fit in cache.

Optimizing compilers can be more effective when the instruction set consists of simple, primitive instructions.
It is easier to move computations out of loops, reorganize code, and maximize register utilization.
It is possible to compute parts of complex instructions at compile time.
A control unit built for simple instructions can execute them faster.
Instruction pipelining is more effective.
The processor is more responsive to interrupts, because interrupts are checked between elementary operations.
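Moving computations out of loops is one of the optimizations listed above. A minimal sketch, using Python to stand in for compiler output (the function names and the expression `a * b + 3` are hypothetical), shows a loop before and after a loop-invariant expression is hoisted:

```python
# Hypothetical illustration of loop-invariant code motion.

def scale_naive(values: list[int], a: int, b: int) -> list[int]:
    out = []
    for v in values:
        factor = a * b + 3          # recomputed on every iteration
        out.append(v * factor)
    return out


def scale_hoisted(values: list[int], a: int, b: int) -> list[int]:
    factor = a * b + 3              # invariant: computed once, can live in a register
    out = []
    for v in values:
        out.append(v * factor)
    return out


if __name__ == "__main__":
    data = [1, 2, 3]
    print(scale_naive(data, 2, 5))    # [13, 26, 39]
    print(scale_hoisted(data, 2, 5))  # [13, 26, 39]
```

Both versions compute the same result; the hoisted one performs the multiply-add once instead of once per element.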

Operations performed determine processor functions and memory interaction.
Operand types and frequency influence memory organization and addressing modes.
Execution sequencing affects control and pipeline organization.
Emphasis is placed on optimizing the performance of the most time-consuming features of high-level language (HLL) programs.

Dynamic percentage of operands:
Operand	Pascal	C	Average
Integer constant	16%	23%	20%
Scalar variable	58%	53%	55%
Array/Structure	26%	24%	25%

Percentage of executed procedure calls in small nonnumeric programs (compiler, interpreter, and typesetter) with:
- More than 3 arguments: 0-7% (Pascal), 0-5% (C)
- More than 5 arguments: 0-3% (Pascal), 0% (C)
- More than 8 words of arguments and local scalars: 1-20% (Pascal), 0-6% (C)
- More than 12 words of arguments and local scalars: 1-6% (Pascal), 0-3% (C)

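Keeping the most frequently accessed operands in registers:
Software Solution: The compiler allocates registers to the variables that are used most heavily.
Hardware Solution: Provide more registers, so that more variables can be held in registers.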
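A minimal sketch of the software solution, assuming the compiler already has a count of how often each variable is referenced (the reference trace and the function name `allocate_registers` are hypothetical), greedily gives the available registers to the most-used variables:

```python
from collections import Counter


def allocate_registers(accesses: list[str], num_registers: int) -> dict[str, str]:
    """Greedy sketch: the most frequently referenced variables get registers.

    `accesses` is a hypothetical trace of variable references; variables
    that do not receive a register are left in memory.
    """
    counts = Counter(accesses)
    # most_common() sorts by count; equal counts keep first-seen order.
    ranked = [name for name, _ in counts.most_common()]
    return {
        name: (f"r{i}" if i < num_registers else "memory")
        for i, name in enumerate(ranked)
    }


if __name__ == "__main__":
    trace = ["i", "sum", "i", "a", "i", "sum", "tmp", "i", "sum"]
    print(allocate_registers(trace, 2))
    # {'i': 'r0', 'sum': 'r1', 'a': 'memory', 'tmp': 'memory'}
```

Production compilers usually refine this with liveness analysis (for example, graph-coloring allocators), so that variables whose lifetimes do not overlap can share one register.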

Global variables can be assigned memory locations by the compiler.
Alternative: Incorporate a fixed number of global registers that are available to all procedures.

Large Register File	Cache
All local scalars	Recently used local scalars
Individual variables	Blocks of memory
Compiler-assigned global variables	Recently used global variables
Save/restore based on procedure nesting depth	Save/restore based on cache replacement algorithm
Register addressing	Memory addressing
Multiple operands per cycle	One operand per cycle

Code size relative to RISC I:
Benchmark	RISC I	VAX-11/780	M68000	Z8002	PDP-11/70
11 C Programs	1.0	0.8	0.9	1.2	0.9
12 C Programs	1.0	0.67	0.9	1.12	0.71
5 C Programs	1.0

Pipelining overlaps the execution of successive instructions to improve performance.
Optimization techniques:
- Delayed Branch: A branch does not take effect until after the following instruction (the delay slot) has executed.
- Delayed Load: The target register of a load is locked until the load completes; execution continues until an instruction needs that register, so useful work can be done while loading.
- Loop Unrolling: Replicates the loop body several times to reduce loop overhead and increase instruction parallelism.
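The loop-unrolling transformation can be sketched as follows. Python itself is not pipelined, so this only illustrates the shape of the code a compiler might generate; the unroll factor of 4, the two accumulators, and the function names are illustrative choices, not taken from the notes:

```python
def sum_rolled(a: list[int]) -> int:
    total = 0
    for i in range(len(a)):          # one loop test and increment per element
        total += a[i]
    return total


def sum_unrolled_by_4(a: list[int]) -> int:
    # Two independent accumulators give the pipeline additions it can overlap.
    s0 = s1 = 0
    n = len(a)
    i = 0
    while i + 4 <= n:                # one loop test per four elements
        s0 += a[i] + a[i + 2]
        s1 += a[i + 1] + a[i + 3]
        i += 4
    while i < n:                     # remainder loop for the leftover elements
        s0 += a[i]
        i += 1
    return s0 + s1


if __name__ == "__main__":
    data = list(range(10))
    print(sum_rolled(data), sum_unrolled_by_4(data))  # 45 45
```

The remainder loop is needed whenever the trip count is not a multiple of the unroll factor.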

Address	Normal Branch	Delayed Branch	Optimized Delayed Branch
100	LOAD X, rA	LOAD X, rA	LOAD X, rA
101	ADD 1, rA	ADD 1, rA	JUMP 105
102	JUMP 105	JUMP 106	ADD 1, rA
103	ADD rA, rB	NOOP	ADD rA, rB
104	SUB rC, rB	ADD rA, rB	SUB rC, rB
105	STORE rA, Z	SUB rC, rB	STORE rA, Z
106		STORE rA, Z
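To check that the three columns of the table compute the same result, the sketch below interprets them with a tiny hypothetical machine. Assumptions, not from the notes: the destination is the last operand (so `ADD 1, rA` means rA ← rA + 1), registers start at zero, and memory location X holds 41. With `delayed=True`, a JUMP takes effect only after the next instruction executes.

```python
from typing import Optional

# Each instruction is (opcode, operands...); destination is the last operand (assumed).
Program = dict[int, tuple]

NORMAL: Program = {
    100: ("LOAD", "X", "rA"), 101: ("ADD", 1, "rA"), 102: ("JUMP", 105),
    103: ("ADD", "rA", "rB"), 104: ("SUB", "rC", "rB"), 105: ("STORE", "rA", "Z"),
}
DELAYED: Program = {
    100: ("LOAD", "X", "rA"), 101: ("ADD", 1, "rA"), 102: ("JUMP", 106),
    103: ("NOOP",), 104: ("ADD", "rA", "rB"), 105: ("SUB", "rC", "rB"),
    106: ("STORE", "rA", "Z"),
}
OPTIMIZED: Program = {
    100: ("LOAD", "X", "rA"), 101: ("JUMP", 105), 102: ("ADD", 1, "rA"),
    103: ("ADD", "rA", "rB"), 104: ("SUB", "rC", "rB"), 105: ("STORE", "rA", "Z"),
}


def run(program: Program, start: int, memory: dict[str, int],
        delayed: bool) -> tuple[dict[str, int], int]:
    """Execute until the PC leaves the program; return (memory, instructions executed)."""
    regs = {"rA": 0, "rB": 0, "rC": 0}
    mem = dict(memory)
    pc, executed = start, 0
    pending: Optional[int] = None          # branch target waiting out its delay slot
    while pc in program:
        op, *args = program[pc]
        branch_now, pending = pending, None
        next_pc = pc + 1
        if op == "LOAD":                   # LOAD addr, reg : reg <- mem[addr]
            addr, reg = args
            regs[reg] = mem[addr]
        elif op == "ADD":                  # ADD src, reg : reg <- reg + src
            src, reg = args
            regs[reg] += src if isinstance(src, int) else regs[src]
        elif op == "SUB":                  # SUB src, reg : reg <- reg - src
            src, reg = args
            regs[reg] -= regs[src]
        elif op == "STORE":                # STORE reg, addr : mem[addr] <- reg
            reg, addr = args
            mem[addr] = regs[reg]
        elif op == "JUMP":
            (target,) = args
            if delayed:
                pending = target           # applied after the next instruction
            else:
                next_pc = target
        elif op != "NOOP":
            raise ValueError(f"unknown opcode {op}")
        if branch_now is not None:
            next_pc = branch_now
        executed += 1
        pc = next_pc
    return mem, executed


if __name__ == "__main__":
    x = {"X": 41}
    print(run(NORMAL, 100, x, delayed=False))    # ({'X': 41, 'Z': 42}, 4)
    print(run(DELAYED, 100, x, delayed=True))    # ({'X': 41, 'Z': 42}, 5)
    print(run(OPTIMIZED, 100, x, delayed=True))  # ({'X': 41, 'Z': 42}, 4)
    print(run(OPTIMIZED, 100, x, delayed=False)) # ({'X': 41, 'Z': 41}, 3)
```

The delayed version spends one extra instruction on the NOOP; the optimized version fills the delay slot with useful work and executes as few instructions as the normal branch. The last line shows that the optimized ordering is only correct on a machine with delayed-branch semantics.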

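Superscalar: Replicates pipeline stages so that two or more instructions in the same stage can be processed simultaneously.
Super-pipelined: Divides the work into finer-grained stages, many of which need less than half a clock cycle; a doubled internal clock then allows two stages' worth of work per external clock cycle.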
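The difference can be quantified with an idealized timing model. This sketch assumes no hazards or stalls, a base pipeline of k stages that each take one base cycle, a superscalar machine that issues m instructions per base cycle, and a super-pipelined machine whose stages are split so a new instruction issues every 1/deg of a base cycle. The function names are hypothetical, and all times are in base cycles for N instructions.

```python
from fractions import Fraction


def scalar_cycles(n: int, k: int) -> int:
    # Base pipeline: the first instruction takes k cycles, then one completes per cycle.
    return k + n - 1


def superscalar_cycles(n: int, k: int, m: int) -> int:
    # Degree-m superscalar: ceil(n / m) issue groups, one group per base cycle.
    groups = -(-n // m)
    return k + groups - 1


def superpipelined_cycles(n: int, k: int, deg: int) -> Fraction:
    # Degree-deg superpipeline: one issue every 1/deg of a base cycle.
    return k + Fraction(n - 1, deg)


if __name__ == "__main__":
    n, k = 12, 4
    print(scalar_cycles(n, k))             # 15
    print(superscalar_cycles(n, k, 2))     # 9
    print(superpipelined_cycles(n, k, 2))  # 19/2
```

Under these assumptions both degree-2 designs approach twice the scalar throughput; the super-pipelined machine trails by half a base cycle here because it issues only one instruction at the start, while the superscalar machine issues two.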

IF (Instruction Fetch): Translate the instruction's virtual address to a physical address using the TLB, then send the physical address to the instruction cache.
RD (Read): Return the instruction from the cache, decode it, read the register file, and, for a branch, calculate the branch target address.
ALU: Perform the arithmetic/logical operation (register-to-register instructions), decide whether a branch is taken, or calculate the data virtual address and translate it to a physical address (memory references).
MEM (Memory): Send the physical address to the data cache; the data cache returns the data.
WB (Write Back): Write the result to the register file.
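The overlap of these five stages can be visualized with a small chart generator. This is a sketch of the ideal case only: it assumes every stage takes one clock cycle and no instruction ever stalls, and `pipeline_chart` is a hypothetical helper.

```python
STAGES = ["IF", "RD", "ALU", "MEM", "WB"]


def pipeline_chart(num_instructions: int) -> list[list[str]]:
    """Rows are instructions, columns are clock cycles; ideal flow with no stalls."""
    total_cycles = len(STAGES) + num_instructions - 1
    chart = []
    for i in range(num_instructions):
        row = ["."] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage          # instruction i occupies stage s in cycle i + s
        chart.append(row)
    return chart


if __name__ == "__main__":
    for n, row in enumerate(pipeline_chart(3), start=1):
        print(f"I{n}: " + " ".join(f"{cell:>3}" for cell in row))
```

Three instructions finish in 7 cycles instead of the 15 a non-pipelined five-stage machine would need; a stall would appear as a repeated stage in a row, pushing every later stage one column to the right.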

Instruction Type	Addressing Mode	Algorithm	SPARC Equivalent
Register-to-register	Immediate	operand = A	S2
Load/store	Direct	EA = A	R0 + S2
Register-to-register	Register	EA = R	RS1, RS2
Load/store	Register Indirect	EA = (R)	RS1 + 0
Load/store	Displacement	EA = (R) + A	RS1 + S2
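The table works because SPARC load/store instructions form an address either as RS1 + RS2 or as RS1 + a signed 13-bit immediate, and register r0 always reads as zero. The sketch below models only that arithmetic; the class and function names are hypothetical, and register windows and alignment checks are ignored.

```python
SIMM13_MIN, SIMM13_MAX = -4096, 4095      # range of a signed 13-bit immediate


class RegisterFile:
    """Toy integer register file; r0 always reads as zero, as on SPARC."""

    def __init__(self, count: int = 32) -> None:
        self._regs = [0] * count

    def read(self, n: int) -> int:
        return 0 if n == 0 else self._regs[n]

    def write(self, n: int, value: int) -> None:
        if n != 0:                        # writes to r0 are discarded
            self._regs[n] = value


def ea_reg_imm(rf: RegisterFile, rs1: int, simm13: int) -> int:
    """RS1 + S2, with S2 a signed 13-bit immediate."""
    if not SIMM13_MIN <= simm13 <= SIMM13_MAX:
        raise ValueError("immediate does not fit in 13 bits")
    return rf.read(rs1) + simm13


def ea_reg_reg(rf: RegisterFile, rs1: int, rs2: int) -> int:
    """RS1 + S2, with S2 a second register."""
    return rf.read(rs1) + rf.read(rs2)


# Other modes synthesized from RS1 + S2 (hypothetical helper names).
def direct(rf: RegisterFile, a: int) -> int:
    return ea_reg_imm(rf, 0, a)           # R0 + S2


def register_indirect(rf: RegisterFile, r: int) -> int:
    return ea_reg_imm(rf, r, 0)           # RS1 + 0


def displacement(rf: RegisterFile, r: int, a: int) -> int:
    return ea_reg_imm(rf, r, a)           # RS1 + S2


if __name__ == "__main__":
    rf = RegisterFile()
    rf.write(5, 1000)
    print(direct(rf, 200), register_indirect(rf, 5), displacement(rf, 5, -8))
    # 200 1000 992
```

Because the immediate is only 13 bits, synthesized direct addressing reaches just the lowest and highest 4 KB of the address space; other absolute addresses must first be built in a register.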

Enhancements:
- Multiple reservation stations.
- Forwarding.
- Reorder buffer.
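Instruction dispatching:
- Issue: from instruction decode (ID) to a reservation station.
- Dispatch: from the reservation station to a functional unit (FU) once the instruction's operands are available.
Data forwarding reduces stalls caused by read-after-write (RAW) hazards by passing a result directly to the waiting instruction instead of through the register file.
The reorder buffer supports out-of-order execution (OoOE): instructions may complete out of order, but their results are committed in program order.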
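A minimal sketch of in-order commit follows. It assumes each instruction writes one register and that functional units report results by tag; the class names and the usage sequence are hypothetical, and renaming, speculation, and exception handling are left out.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    tag: int                        # position in program order
    dest: str                       # architectural register updated at commit
    value: Optional[int] = None     # set when the functional unit finishes


class ReorderBuffer:
    def __init__(self) -> None:
        self.entries: deque = deque()
        self.registers: dict = {}   # committed (architectural) register state
        self._next_tag = 0

    def issue(self, dest: str) -> int:
        """Allocate an entry at the tail, in program order; return its tag."""
        tag = self._next_tag
        self._next_tag += 1
        self.entries.append(RobEntry(tag, dest))
        return tag

    def complete(self, tag: int, value: int) -> None:
        """A functional unit finishes; this may happen out of program order."""
        for entry in self.entries:
            if entry.tag == tag:
                entry.value = value
                return
        raise KeyError(tag)

    def commit(self) -> list:
        """Retire finished entries from the head only, preserving program order."""
        retired = []
        while self.entries and self.entries[0].value is not None:
            entry = self.entries.popleft()
            self.registers[entry.dest] = entry.value
            retired.append(entry.tag)
        return retired


if __name__ == "__main__":
    rob = ReorderBuffer()
    t0 = rob.issue("r1")            # e.g. a slow multiply
    t1 = rob.issue("r2")            # a fast add that does not depend on it
    rob.complete(t1, 7)             # the younger instruction finishes first
    print(rob.commit())             # [] : the head is not ready, nothing retires
    rob.complete(t0, 3)
    print(rob.commit())             # [0, 1]
    print(rob.registers)            # {'r1': 3, 'r2': 7}
```

Because architectural registers change only at commit, an interrupt taken before the head retires leaves the register state as if the younger, already-finished instruction had never executed.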