Architecture Basics Notes (CSCI 45)
Von Neumann Architecture (1945)
- John von Neumann was an absolute legend: mathematician, physicist, computer scientist, and engineer.
- Died at 53 from cancer, likely due to radiation exposure from working on the Manhattan Project.
- His architecture model is still the main one used today.
- The five components of von Neumann architecture:
- The processing unit executes program instructions.
- The control unit drives program instruction execution on the processing unit. Together, the processing and control units make up the CPU.
- The memory unit stores program data and instructions.
- The input unit(s) load program data and instructions onto the computer and initiate program execution.
- The output unit(s) store or receive program results.
System Buses
- A bus is a communication channel that transfers binary values between communication endpoints (e.g., CPU and memory).
- Types of buses:
- Control bus: sends control signals that request or notify other units of actions.
- Address bus: sends the memory address of a read or write request to the memory unit.
- Data bus: transfers data between units.
The Memory Hierarchy
- Idea: memory closer to the CPU is faster but smaller/expensive; memory farther away is larger/cheaper but slower.
- Core trend: faster access comes at a higher cost.
- Basic idea of levels:
- Registers: closest to CPU; extremely fast; expensive; small quantity.
- Caches: between CPU and main memory; faster than main memory; smaller than main memory.
- Main Memory (RAM): larger than caches; slower than caches.
- Secondary Storage (e.g., SSD, HDD): much larger; much slower.
- Remote/Network Storage: slowest; accessed over networks.
- A rough depiction of the memory hierarchy ordering by latency and capacity:
- Registers (on CPU) — lowest latency, smallest capacity.
- Caches (~MB) — low latency, small capacity.
- Main Memory (several GB on typical PCs, up to TBs on servers) — higher latency.
- Flash Disk / Traditional Disk — larger, much higher latency.
- Remote Secondary Storage (e.g., Internet) — highest latency, large capacity.
- Primary storage vs secondary storage terminology:
- Primary storage (RAM, caches, registers) is fast and close to the CPU.
- Secondary storage (SSD/HDD) is non-volatile but slower.
- Practical takeaway: as you move data farther from the CPU, the cost per byte falls, but latency to access data rises.
Persistent File Storage
- Question: Why not put everything in RAM?
- Explanation: persistent files are stored in secondary storage (e.g., hard disk) and stay there when power is off.
- Rule of thumb: farther storage from CPU is cheaper per bit, but data transfer to CPU is slower.
- Implication: system designers use layers of storage and caching to hide latency.
RAM
- RAM = main memory; also called primary storage.
- It marks the end of the primary storage region, with everything after RAM considered secondary storage.
- Programs you want to run are loaded from disk into RAM and then executed.
- RAM serves as the workspace where active data and instructions reside during execution.
Caching Intro
- Locality of memory access:
- Temporal Locality: programs tend to access the same data repeatedly over time.
- Spatial Locality: programs tend to access data nearby recently accessed data.
- Key takeaways:
- We gain speedups by storing commonly accessed memory closer to the CPU.
- We gain speedups by loading contiguous chunks of memory (not just single bytes) closer to the CPU.
- Caches are the memory between the CPU and main memory.
The Memory Hierarchy (cont.)
- A quick view of relative access times and capacities:
- Registers: access in ~1 cycle; tiny capacity; on-CPU storage.
- Caches: access in ~10 cycles; small capacity; typically on the CPU die, between the CPU and main memory.
- Main Memory (RAM): access in ~100 cycles; larger capacity.
- Local secondary storage, Flash/SSD: access in ~10^3 to ~10^5 cycles depending on technology.
- Local secondary storage, Disk (HDD): access in ~10^6 cycles; very large capacity.
- Remote Secondary Storage (e.g., Internet): access in even higher latency.
- Visualization of the progression from fast/expensive/small (near the CPU) to slow/cheap/large (far from the CPU) in terms of latency and capacity.
RAM (main memory) details
- RAM is also known as main memory.
- It marks the boundary between fast, expensive storage near the CPU and slower, cheaper storage further away.
- Programs are loaded from disk into RAM before execution.
What “N-bit” means
- A 32-bit processor uses 32-bit addresses and registers.
- Address width and register width usually correlate with memory capacity.
- Byte-addressed memory:
- With a 32-bit architecture, maximum addressable memory is 2^32 bytes = 4 GB.
- With 64-bit architectures, the address space expands to 2^64 bytes = 16 EB.
- These values illustrate why architectures evolved from 32-bit to 64-bit to accommodate more memory.
The Memory Hierarchy (revisited: recap)
- Memory maps addresses (binary) to byte values.
- Memory access patterns for multi-byte data: specify only the starting address; the instruction's width (e.g., ldr into a W vs. X register) tells the CPU to read 4 or 8 bytes.
ISAs (Instruction Set Architectures)
- Definition: An ISA is the encoding of instructions that the CPU understands; the language the CPU speaks.
- An ISA also defines:
- Supported data types.
- The registers available.
- The hardware support for managing main memory, etc.
- Practical use: view binary representations using tools (e.g., objdump -d prog) to see instruction encoding.
Accessing Memory in Assembly
- Memory is accessed via load/store instructions that operate on registers and memory addresses.
- Key concepts:
- Memory addresses are specified (64-bit in the example context).
- The CPU loads/stores values from/to memory using addressing modes.
The Giant Table that is memory (addressing and data)
- Memory maps addresses (64-bit numbers) to byte values (8-bit numbers).
- To load multi-byte data, specify the starting address; the CPU reads the required number of bytes (e.g., 4 or 8) from that address.
Fancy ldr & str (ARM-like addressing modes)
- Regular form (register indirect):
- ldr Xd, [Xn] — Xd = *Xn;
- Immediate offset:
- ldr Xd, [Xn, #4] — Xd = *(Xn + 4);
- Register offset:
- ldr Xd, [Xn, Xm] — Xd = *(Xn + Xm);
- Offset with write-back (exclamation point):
- ldr Xd, [Xn, #4]! — Xd = *(Xn + 4); Xn += 4;
- Examples of sizes:
- ldr Wd, [Xn] — loads a 32-bit value.
- ldr Xd, [Xn] — loads a 64-bit value.
ldrb & strb (byte access)
- Load/store a single byte; byte operations use the 32-bit W form of a register (e.g., ldrb Wd, [Xn] and strb Wd, [Xn]), not the X form.
- Example: ASCII math to convert the word "csci" to all caps using arithmetic on ASCII codes.
Lab Time
- Practical lab activities are scheduled (refer to the course outline) to reinforce the material.
Attendance
- Attendance is part of the course logistics (administrative item).
Types of Computer Architectures
- Course outline/outcomes include:
- Types of Computer Architectures
- The Memory Hierarchy
- Persistent File Storage
- System Buses
- DMA
- ISAs
Assigned Reading and Useful Resources
- Assigned Reading: Dive Into Systems, start Chapter 5; plan ~2 weeks to complete; it’s a chapter to read and not skim.
- Useful ARM resources:
- General Instructions (e.g., mov, add, bl, etc.):
- https://developer.arm.com/documentation/dui0801/l/A64-General-Instructions
- Data Transfer Instructions (e.g., ldr, str, etc.):
- https://developer.arm.com/documentation/dui0801/l/A64-Data-Transfer-Instructions