The CPU consists of:
ALU (Arithmetic Logic Unit)
Control Unit
Registers
Cache Memory
The ALU is responsible for the CPU's arithmetic and logical operations.
Input: Receives inputs from registers, control signals, and instructions from the control unit.
Calculation: Executes arithmetic (addition, subtraction, multiplication, division) or logical operations (bitwise AND, OR, XOR, NOT, and comparisons) using logic gates.
Output: Stores the result back in registers or uses it for branching (e.g., comparison results).
Status Flags: Generates status flags (carry, overflow, zero, negative) to indicate the outcome of the operation, used by the control unit for program control.
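The input→calculation→output→flags flow above can be sketched in code. This is a minimal illustration, assuming an 8-bit data path; the names (`alu_add`, `Flags`) are invented for this sketch, not a real API.

```python
# Minimal sketch of one ALU operation with status flags, assuming an
# 8-bit data path. Names here are illustrative, not a real API.
from dataclasses import dataclass

WIDTH = 8
MASK = (1 << WIDTH) - 1  # 0xFF for an 8-bit ALU

@dataclass
class Flags:
    carry: bool
    overflow: bool
    zero: bool
    negative: bool

def alu_add(a: int, b: int) -> tuple[int, Flags]:
    """Add two 8-bit values and derive the status flags."""
    raw = a + b
    result = raw & MASK                      # truncate to the data path width
    carry = raw > MASK                       # unsigned carry out of bit 7
    # Signed overflow: operands share a sign but the result's sign differs.
    sign = 1 << (WIDTH - 1)
    overflow = bool(~(a ^ b) & (a ^ result) & sign)
    return result, Flags(carry=carry,
                         overflow=overflow,
                         zero=result == 0,
                         negative=bool(result & sign))

result, flags = alu_add(0x80, 0x80)  # -128 + -128 in two's complement
print(result, flags)                 # 0, with carry, overflow, and zero set
```

The flags are exactly what the control unit consumes for conditional branches: a `zero` flag set by a compare instruction, for example, drives a "branch if equal".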
The Control Unit (CU) manages data flow and instruction decoding. Its responsibilities include:
Fetch and Decode Instructions: Retrieves instructions from memory and decodes them.
Generate Control Signals: Translates decoded instructions into control signals for other CPU components.
Direct Data Flow: Manages data flow between the CPU, memory, and I/O devices.
Coordinate Execution: Ensures CPU components work together correctly to execute instructions.
Manage Resources: Manages the allocation and use of CPU resources like registers and memory.
Timing and Sequencing: Provides timing control for instruction execution.
The Control Unit doesn’t perform operations itself (that’s the ALU’s job), but it dictates what, when, and how each part should operate.
Registers are the tiniest, fastest memory in the computer, located closest to the ALU. They temporarily hold data the CPU is working with, such as numbers, memory addresses, or instructions.
Different types of registers include:
Data Registers: Hold numbers the CPU is calculating.
Address Registers: Hold memory addresses.
Instruction Register (IR): Holds the current instruction the CPU is working on.
Program Counter (PC): Keeps track of the next instruction in memory.
Status Register: Holds flags (e.g., negative result, overflow).
Cache memory exists on the CPU itself and is used for reading and writing data. The CPU first checks the cache for data. If found (cache hit), it’s accessed quickly. If not found (cache miss), the data is fetched from main memory and stored in the cache for future use.
Check Cache: Determine if data is already present (Cache Hit/Cache Miss).
Cache Hit: Use the data immediately (very fast).
Cache Miss: Fetch from RAM, potentially store a copy in the cache (Update cache).
Allocation Policy (Read): Cache on read — bring data into the cache when it is read.
Write Data: Update the cache (and possibly RAM).
Allocation Policy (Write): Cache on write — bring data into the cache when it is written (write-allocate; depends on the CPU).
Write-Through: Data is written to cache and RAM simultaneously (simple but slower).
Write-Back: Data is written only to cache initially; RAM is updated later when the cache line is replaced (faster, but riskier if there’s a crash).
Manage Space: Remove old data if needed.
LRU (Least Recently Used): Remove the data that hasn’t been used for the longest time.
FIFO (First-In First-Out): Remove the oldest data first.
Random: Pick any block randomly (simple but not always smart).
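The hit/miss, allocate-on-read, and LRU-eviction steps above can be sketched with a tiny cache in front of a slower "RAM". The capacity and backing store here are illustrative assumptions, not a model of any real CPU.

```python
# Sketch of a tiny LRU cache in front of a slower "RAM" dict, showing
# cache hit, cache miss with allocate-on-read, and LRU eviction.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int, ram: dict):
        self.capacity = capacity
        self.ram = ram
        self.lines = OrderedDict()  # address -> value, oldest first
        self.hits = self.misses = 0

    def read(self, addr):
        if addr in self.lines:              # cache hit: fast path
            self.hits += 1
            self.lines.move_to_end(addr)    # mark as most recently used
            return self.lines[addr]
        self.misses += 1                    # cache miss: go to RAM
        value = self.ram[addr]
        self.lines[addr] = value            # allocate on read
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        return value

ram = {a: a * 10 for a in range(8)}
cache = LRUCache(capacity=2, ram=ram)
cache.read(0); cache.read(1); cache.read(0)  # second read of 0 is a hit
cache.read(2)                                # evicts address 1 (the LRU line)
print(cache.hits, cache.misses)              # 1 3
```

Real caches track fixed-size lines and use set-associative lookup in hardware, but the replacement logic follows the same idea.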
Feature | L1 Cache | L2 Cache | L3 Cache |
---|---|---|---|
Location | Integrated directly into the processor core. | Still on the CPU chip, but slightly farther from the core compared to L1. | On the CPU chip but shared across all cores. |
Speed | Extremely fast—same speed as the CPU core. | Slower than L1, but still faster than main memory (RAM). | Slower than L1 and L2, but still much faster than RAM. |
Size | Very small (typically 16KB–128KB). | Medium size (typically 128KB–1MB per core). | Large (2MB–64MB depending on CPU model). |
Purpose | Stores very frequently accessed data. | Acts as a backup to L1; holds data needed often. | Reduces access to RAM by storing less frequently accessed data for all cores. |
Type | L1d (data) and L1i (instructions). | Typically private to each core (e.g., on Intel architectures). | Usually shared among all cores. |
Latency | Very low (typically 1–4 CPU cycles). | Higher than L1 (around 10–20 cycles). | Higher than L2 (around 30–50 cycles). |
The CPU needs to link up all the main components (ALU, Control Unit, Registers, Cache, Memory Interface) to transfer data, instructions, and control signals.
Buses tie everything together, providing separate, clear, fast roads for information travel between components.
Data Bus: Moves data between CPU, memory, and I/O devices.
8-bit, 16-bit, 32-bit, 64-bit buses — wider = more data per transfer.
Address Bus: Sends the location (address) where data is stored or needs to go.
The width of the address bus determines the maximum addressable memory (e.g., 32-bit = ~4GB, 64-bit = ~18EB).
Control Bus: Sends timing and control signals (e.g., Read/Write signals, Clock signals, Interrupts, Reset commands).
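The relationship between address-bus width and maximum addressable memory is simple exponentiation, as this small sketch shows:

```python
# Addressable memory as a function of address-bus width: 2**width bytes.
def addressable_bytes(bus_width_bits: int) -> int:
    return 2 ** bus_width_bits

print(addressable_bytes(32) / 2**30)   # 4.0  -> 4 GiB for a 32-bit bus
print(addressable_bytes(64) / 2**60)   # 16.0 -> 16 EiB for a 64-bit bus
```

Each extra address line doubles the addressable space, which is why the jump from 32 to 64 bits is so dramatic.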
External expansion buses connect the CPU with external devices outside of the CPU and on the motherboard.
These connections include the memory bus, DMI, and CPU-connected PCIe lanes.
Memory: The memory bus connects RAM to the CPU with data, address, and control lines. Modern systems use integrated memory controllers in the CPU for direct RAM access.
DMI: The CPU connects to the PCH (Platform Controller Hub) for peripherals via the Direct Media Interface (DMI) bus (UMI for AMD).
CPU-Connected PCIe: Modern consumer CPUs typically provide a PCIe x16 link for the graphics card (GPU) and a PCIe x4 link for a high-performance NVMe SSD.
The Platform Controller Hub (PCH)—Intel’s modern chipset (formerly southbridge)—acts as the central hub for I/O and peripheral connectivity, interfacing with a variety of buses.
DMI: The PCH connects back to the CPU. The current DMI 4.0 standard uses a proprietary link equivalent to PCIe 4.0 x8 (8 lanes), giving roughly 2 GB/s per lane, or about 16 GB/s of total bandwidth in each direction. Bottlenecks are possible if the combined bandwidth requirements of PCH-attached devices exceed the DMI's capacity.
PCH-Connected PCIe: For lower-bandwidth expansion cards like Wi-Fi, sound, and secondary NVMe SSDs. Devices use PCIe x4 / x1. PCH-attached PCIe has higher latency and is constrained by DMI bandwidth.
USB Bus: Manages USB 2.0, 3.0, 3.2 Gen 1/2, and sometimes USB4 ports.
SATA Bus: Connects to SSDs, HDDs, and optical drives.
Legacy and security devices: e.g., the BIOS/UEFI firmware chip and the Trusted Platform Module (TPM), a security component on the motherboard.
PCIe is the current high-speed interface standard used to connect components like graphics cards, SSDs, network cards, and other peripherals to a computer’s motherboard.
It's used for attaching peripheral devices to the motherboard internally either as an integrated circuit or via an expansion card.
PCIe supports multiple lane configurations where more lanes = more bandwidth.
PCIe 5.0: ~4 GB/s per lane
PCIe 6.0 (newer): ~8 GB/s per lane, using PAM4 signaling
Typical Uses:
Graphics Cards (GPUs) – usually use x16 slots.
NVMe SSDs – commonly use x4 lanes for fast storage.
Wi-Fi cards, RAID controllers, capture cards – typically use x1 or x4 slots.
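Link bandwidth scales linearly with lane count, so the configurations above are easy to compute. This sketch uses the approximate per-lane figures quoted earlier for PCIe 5.0 and 6.0.

```python
# Approximate PCIe link bandwidth = per-lane rate x lane count, using the
# rough per-lane figures quoted above (~4 GB/s for 5.0, ~8 GB/s for 6.0).
PER_LANE_GBPS = {"5.0": 4, "6.0": 8}  # GB/s per lane, approximate

def link_bandwidth(generation: str, lanes: int) -> int:
    return PER_LANE_GBPS[generation] * lanes

print(link_bandwidth("5.0", 16))  # ~64 GB/s: a PCIe 5.0 x16 GPU slot
print(link_bandwidth("5.0", 4))   # ~16 GB/s: a PCIe 5.0 x4 NVMe SSD
print(link_bandwidth("6.0", 1))   # ~8 GB/s:  a PCIe 6.0 x1 expansion card
```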
USB is connected to the PCH (Platform Controller Hub).
USB is a specification for communication between devices and a host controller (e.g. a PC).
Peripherals – mice, keyboards, cameras, printers, flash drives, network adapters, and external hard drives
Also used on smartphones for data transfer and charging
Supports plug-and-play and hot-plugging (ability to add/remove devices while the computer is running).
SATA is connected to the PCH (Platform Controller Hub) and is used for connecting mass storage devices (e.g. SSDs, HDDs, optical drives).
Slower than NVMe SSDs
Cables are thinner and cheaper (less wires)
Data transfer rates of 1.5–6 Gbit/s (SATA revisions 1.0–3.0); SATA revision 3.2 (SATA Express) reaches 16 Gbit/s
One drive per link
Hot swapping is supported
CPU word size refers to the number of bits the CPU can process or transmit in one operation.
Key Aspects of CPU Word Size:
Bit Width: Common word sizes are 8-bit, 16-bit, 32-bit, and 64-bit.
Registers: The CPU's registers typically match the word size.
Memory Addressing: A 32-bit CPU can address up to 4 GB (2^{32} bytes) of memory. A 64-bit CPU can theoretically address up to 16 EiB (2^{64} bytes).
Data Bus Width: The word size influences the width of the data bus.
For modern CPUs which are 64-bit, the real benefits are visible when running 64-bit operating systems and 64-bit application software.
Advantages of a larger CPU word size:
Larger Addressable Memory Space: A 32-bit CPU can address up to 4 GB (2^{32}), while a 64-bit CPU can theoretically address 18.4 million TB (2^{64}).
Performance: Larger word sizes allow more data to be processed per cycle.
Compatibility: Backward compatible with 32 bit applications.
Efficiency: Tasks involving large numbers or large memory spaces benefit from wider word sizes.
Disadvantages of a larger CPU word size:
Increased Memory Usage: Pointers and addresses are larger (e.g., 64-bit vs 32-bit), which increases the size of data structures. Programs may consume more RAM.
More Power Consumption: Larger word size CPUs have wider data paths and bigger registers, leading to higher power usage.
Larger Binary Sizes: Compiled programs for 64-bit CPUs tend to be larger.
Backward Compatibility Issues: Older 32-bit software may not run natively on a 64-bit CPU without emulation.
Cost: Larger word size CPUs often require more silicon for manufacturing (higher cost).
Not Always Necessary: If an application doesn't need large memory addressing or wide data processing, a 64-bit CPU offers no real benefit.
The clock is a timing signal that sets the pace of instruction execution; clock speed is its frequency. One clock period is one cycle.
Unit of measurement is Hertz (Hz)
1 Hz = 1 cycle per second
1 MHz = 1 million cycles per second (Megahertz)
1 GHz = 1 billion cycles per second (Gigahertz)
One cycle is the smallest unit of time to a processor
Processor execution timing is measured in cycles
Each instruction may take one or more cycles to process
Superscalar processors can complete more than 1 instruction per clock cycle
Faster clock speed = faster processing (all else being equal)
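The points above combine into the classic performance equation: execution time = instruction count × cycles per instruction (CPI) ÷ clock rate. The workload numbers below are made up purely for illustration.

```python
# The classic CPU performance equation:
#   time = instructions x CPI / clock rate
# Workload figures here are invented for illustration.
def cpu_time_seconds(instructions: int, cpi: float, clock_hz: float) -> float:
    return instructions * cpi / clock_hz

base         = cpu_time_seconds(1_000_000_000, cpi=1.0, clock_hz=4.0e9)  # 0.25 s
faster_clock = cpu_time_seconds(1_000_000_000, cpi=1.0, clock_hz=5.0e9)  # 0.20 s
superscalar  = cpu_time_seconds(1_000_000_000, cpi=0.5, clock_hz=4.0e9)  # 0.125 s (IPC = 2)
print(base, faster_clock, superscalar)
```

Note that a superscalar design (CPI below 1) can speed up a workload as much as a higher clock can, which is why clock speed alone does not determine performance.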
Modern CPUs have a clock speed measured in GigaHertz (GHz) and typically operate in the 3-5 GHz range
Types:
Base Clock Speed: Default frequency (e.g., 3.2 GHz)
Boost/Turbo Frequency: Maximum speed under load (e.g., 5.5 GHz)
Some desktop CPUs can be ‘overclocked’, allowing the CPU to run faster than its rated speed.
These CPUs have an unlocked multiplier (e.g., Intel "K" series, AMD "X" series).
The CPU multiplier determines the CPU’s speed by multiplying the base clock speed (BCLK).
For example, a BCLK of 100MHz multiplied by a CPU multiplier of 45 would result in a CPU speed of 4.5GHz.
Unit | Symbol | Power of 10 | Multiplier |
---|---|---|---|
Kilo | K | 10^3 | x 1,000 |
Mega | M | 10^6 | x 1,000,000 |
Giga | G | 10^9 | x 1,000,000,000 |
Tera | T | 10^{12} | x 1,000,000,000,000 |
Peta | P | 10^{15} | x 1,000,000,000,000,000 |
Exa | E | 10^{18} | x 1,000,000,000,000,000,000 |
Zetta | Z | 10^{21} | x 1,000,000,000,000,000,000,000 |
*note – for memory calculation, we use 2^{10} (1024) as the prefix multiplier.
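The note above is the source of the familiar GB-vs-GiB discrepancy: decimal (SI) prefixes are powers of 10, while binary prefixes are powers of 2.

```python
# Decimal (SI) prefixes vs binary prefixes. Clock speeds and drive
# capacities use powers of 10; memory capacity uses powers of 2.
SI     = {"K": 10**3,  "M": 10**6,  "G": 10**9,  "T": 10**12}
BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}

print(SI["G"])                 # 1000000000
print(BINARY["Gi"])            # 1073741824
print(BINARY["Gi"] / SI["G"])  # ~1.074: why a "1 TB" drive shows < 1 TiB
```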
Refers to the number of physical cores in the CPU. More cores = better multitasking and parallel processing.
Each core can fetch, decode, execute, and retire its own stream of instructions.
Homogeneous cores: All cores are identical (e.g., AMD Ryzen 7 CPUs using AMD Zen 4 cores)
Heterogeneous cores: Mix of “Performance” (P-cores) and “Efficiency” (E-cores) (e.g., Intel Alder Lake / Raptor Lake - 6 P-cores + 8 E-cores = 14 cores).
CPUs are categorized into two major types of processor architectures: RISC (Reduced Instruction Set Computing) and CISC (Complex Instruction Set Computing).
Feature | CISC | RISC |
---|---|---|
Instruction Set | Large, complex, variable length instructions | Small, simple, fixed-length instructions |
Execution | Multiple clock cycles per instruction | Typically one clock cycle per instruction |
Complexity | More complex hardware, more instructions | Simpler hardware, fewer instructions |
Speed | Can execute complex operations with fewer lines of code | Generally faster execution of individual instructions |
Applications | Desktop PCs, servers, workstations | Mobile devices, embedded systems, power-efficient devices |
Examples | Intel / AMD x86/x86-64 CPUs | Apple M1 / M2 / M3 / M4 CPUs |
Feature | x86-64 | ARMv9 |
---|---|---|
Architecture Type | CISC (Complex Instruction Set Computing) | RISC (Reduced Instruction Set Computing) |
Instruction Set | Variable-length (1–15 bytes) | Fixed-length (typically 4 bytes) |
Power Efficiency | Higher power consumption | Optimized for power efficiency |
Performance Per Watt | Lower efficiency | Higher efficiency |
Registers | General-purpose (16 registers) | General-purpose (31 registers) |
Virtualization | Support: Intel VT-x, AMD-V | ARM Virtualization Extensions (EL2) |
Security Features | Intel CET, AMD SME/SEV | Memory Tagging Extension (MTE), Confidential Compute Architecture (CCA) |
Common Use Cases | Desktops, laptops, servers | Mobile devices, embedded systems, cloud computing, AI workloads |
Implementations | Intel Core, AMD Ryzen, Intel Xeon, AMD EPYC | Apple M-series, Qualcomm Snapdragon, NVIDIA Grace, AWS Graviton |
Core: A dual-core (or quad-core) processor integrates 2 (or 4) computational cores within a single central processing unit.
Thread: Each hardware thread is viewed by the OS as a logical CPU, which means a CPU with 4 threads can run 4 streams of instructions concurrently.
Bus Speed: The speed of a bus, the circuit that connects one part of the motherboard to another.
Core Speed: Clock speed of a processor's individual core, measured in cycles per second (Hz).
Cache: Stores program instructions or data on the CPU that are frequently referenced by software during operation.
Instruction set: Group of commands for a CPU in machine language.
Understanding execution at the CPU's most atomic hardware timing level requires defining three timing units: the clock cycle, the machine cycle, and the instruction cycle.
A Clock Cycle is the smallest unit of time in a CPU, defined by the CPU clock.
It represents one oscillation (one tick) of the clock signal.
The clock speed is measured in Hz, typically either megahertz (MHz) or gigahertz (GHz).
In a simple (non-pipelined) CPU, multiple clock cycles are needed to execute a basic operation.
Modern CPUs can execute multiple operations per clock cycle due to pipelining, superscalar execution, out-of-order execution, multithreading, multiple execution units, and speculative execution.
The machine cycle is the smallest unit of time in which the CPU does actual work; it involves operations at the hardware level. In one machine cycle the CPU accesses memory or an I/O device and completes one basic operation. The following 4 steps make up ONE machine cycle.
Fetch – The instruction is fetched from memory (RAM or Cache) into the instruction register (IR)
Decode – The instruction is translated into control signals for execution. Uses value in IR to determine what operation ALU should perform
Execute – The operation is performed by ALU (such as arithmetic or logic operations).
Store (or Write-Back) – The result of execution is written back to a register or memory.
The 3-stage Instruction Cycle (also called the Fetch-Decode-Execute Cycle) represents the time required to fetch, decode, and execute a complete instruction from a program. It consists of one or more machine cycles.
Fetch – The processor retrieves the instruction from memory.
Decode – The processor interprets the instruction to understand what action is required.
Execute – The processor carries out the operation.
Each step in the Instruction Cycle requires one or more Machine Cycles to complete. Each Machine Cycle consists of multiple Clock Cycles (depending on CPU efficiency).
Term | Definition | Relationship |
---|---|---|
Clock Cycle | Smallest unit of time in a CPU, one tick of the clock. | Multiple clock cycles form a machine cycle. |
Machine Cycle | Time to complete a basic operation (fetch, decode, execute, store.). | Represents a single operation, like a memory read or write, within the CPU. |
Instruction Cycle | Time to complete a full CPU instruction (fetch, decode, execute). | A series of machine cycles required to fully execute a single instruction from a program. |
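The fetch-decode-execute loop above can be sketched for a made-up machine with three toy opcodes, showing the Program Counter (PC), Instruction Register (IR), and a single accumulator register in action. The instruction set here is invented for illustration.

```python
# A minimal fetch-decode-execute loop for a made-up 3-instruction
# machine. The opcodes (LOAD, ADD, HALT) are invented for this sketch.
def run(program: list[tuple], acc: int = 0) -> int:
    pc = 0                               # Program Counter
    while pc < len(program):
        ir = program[pc]                 # Fetch: instruction from memory -> IR
        pc += 1                          # PC now points at the next instruction
        op, *args = ir                   # Decode: split opcode from operands
        if op == "LOAD":                 # Execute: one of three toy operations
            acc = args[0]
        elif op == "ADD":
            acc += args[0]
        elif op == "HALT":
            break
    return acc                           # Store: final accumulator value

result = run([("LOAD", 5), ("ADD", 7), ("HALT",)])
print(result)  # 12
```

Each loop iteration here stands in for one instruction cycle; in real hardware each iteration would itself take one or more machine cycles, and each machine cycle several clock cycles.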
Program: A software program sits on disk, waiting to be executed.
Process: When the program is executed, the OS runs an instance of it, loads it into memory, and creates a process. Multiple processes can run the same program independently (e.g., two open Word documents).
Thread: A thread is the unit of execution inside a process. Threads share memory and resources of the process, but each has its own execution context.
Processor Instruction Cycle (Fetch-Decode-Execute):
When a program (e.g. PowerPoint) is loaded into memory and executed by the processor, it becomes a process. A process is basically a program in execution; it consists of one or more threads, and a processor core executes one process at a time.
Each process has its own memory address space. One process cannot corrupt the memory space of another process.
New / Ready: The process is ready to run, but no CPU is currently available for it.
Running: The process is actively executing instructions on a processor. The operating system's scheduler manages which process is currently running
Waiting: The process is temporarily paused, awaiting a specific event or resource to become available.
Terminated: The process has completed execution or has been terminated by the operating system.
A thread is the smallest unit of execution that an operating system can schedule and allocate resources to. The operating system allocates processor time and manages the thread with a scheduler. A thread is a unit of execution within a process, sharing resources with other threads within the same process.
New (Created): The thread is created but not yet started. It is waiting to be scheduled by the OS.
Runnable (Ready): The thread is ready to run and waiting for CPU time. It may be in a queue waiting to be scheduled.
Running: The thread is currently being executed on a CPU core.
Blocked / Waiting / Sleeping: The thread is waiting for some resource (e.g., I/O, lock, or signal). It's not ready to run until the condition is resolved.
Terminated (Dead): The thread has finished execution or has been forcefully killed. Its resources are reclaimed by the OS.
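The lifecycle above maps directly onto Python's `threading` module: a thread is New after construction, Runnable/Running after `start()`, Blocked while it sleeps or waits, and Terminated after it finishes.

```python
# Observing thread states with Python's threading module: New after
# construction, Running after start(), Terminated after join().
import threading
import time

def worker(results: list):
    time.sleep(0.05)        # Blocked/Sleeping: waiting on a timer
    results.append("done")

results = []
t = threading.Thread(target=worker, args=(results,))  # New: created, not started
assert not t.is_alive()
t.start()                   # Runnable -> Running (scheduled by the OS)
assert t.is_alive()         # still sleeping inside worker()
t.join()                    # wait until the thread reaches Terminated
assert not t.is_alive() and results == ["done"]
print("thread lifecycle complete:", results)
```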
In a process with four threads of execution running on a single processor, only one thread executes at a time; on a multi-core or multi-processor system, the Operating System’s scheduler can take advantage of the multiple cores and run multiple threads in parallel.
Modern Windows Operating System (OS) includes both a process manager and a thread manager as part of its kernel-level components. Each thread is scheduled independently by the kernel’s thread-level scheduler.
Process Manager: Creates processes; each process in turn contains one or more threads. The CPU scheduler dispatches threads to specific CPU cores for execution.
Thread Manager: Creates threads within a process, manages their lifecycle, and decides which thread should run next using various scheduling algorithms.
Process manager - manages each process by describing the state and resource ownership of the process.
Creates and manages processes when a program is launched, and also terminates processes
Allocates resources (memory, CPU time, I/O) to each process.
Maintains a process table with each process’s metadata (PID, status, etc.).
Hands off control to the thread manager to create and manage the process’s threads.
Thread manager - Creates and manages threads within processes. It is part of the kernel.
Creating, terminating, suspending, and resuming threads
Assigning thread IDs and managing stacks
Managing thread states (Ready, Running, Waiting, etc.).
Thread scheduler - decides which thread to run next on a CPU using priority-based preemptive scheduling. Whenever a CPU would otherwise sit idle, the scheduler selects a thread from the ready queue. It uses thread priorities (0–31 in Windows) to determine execution order, enforces the thread quantum (time slice), and performs context switching between running threads, minimizing how long ready threads wait.
The scheduler assigns a priority number to every process or thread (in Windows, a higher number means higher priority)
The scheduler picks the highest-priority task from the ready queue
Preemption determines whether a process can be interrupted.
Algorithm | Description |
---|---|
Priority Scheduling | Assigns priorities to processes and executes higher priority processes first |
Round-Robin Scheduling | Allocates a fixed time slice to each process before moving to the next |
FCFS (First-Come, First-Served) | Schedules processes in the order they arrive |
SJN (Shortest Job Next) | Prioritizes processes with the shortest execution time; long jobs may be starved |
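Round-robin is the easiest of these to sketch: each process runs for a fixed quantum, then is preempted and sent to the back of the ready queue. The burst times below are invented for illustration.

```python
# Round-robin scheduling sketch: each process runs for a fixed time
# slice (quantum) before being preempted and requeued.
from collections import deque

def round_robin(burst_times: dict, quantum: int) -> list:
    ready = deque(burst_times.items())      # (pid, remaining time) pairs
    order = []                              # execution trace
    while ready:
        pid, remaining = ready.popleft()
        order.append(pid)
        remaining -= quantum                # run for one time slice
        if remaining > 0:
            ready.append((pid, remaining))  # preempted: back of the queue
    return order

print(round_robin({"A": 3, "B": 5, "C": 2}, quantum=2))
# ['A', 'B', 'C', 'A', 'B', 'B'] -- C finishes in one slice, B needs three
```

A smaller quantum gives smoother interactivity but more context-switch overhead, which is the core trade-off round-robin schedulers tune.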
Context switching occurs when the OS switches between multiple running processes, for example when a higher-priority process interrupts the currently running one.
Process context switching is the act of saving the state of a currently running process and restoring the state of another process so that the CPU can switch from executing one process to another. The context of a process includes all the data the CPU needs to resume that process later.
a) CPU registers (like Program Counter, Stack Pointer)
b) Process state
c) Memory mappings
d) Open files, etc.
A deadlock is a situation where two or more processes or threads are blocked indefinitely, waiting for each other to release a resource that they need to proceed.
Prevention methods:
Resource ordering: Enforce a specific order in which processes request resources, preventing circular wait.
Allow preemption: Permit the OS to forcibly take resources away from a process, breaking the ‘no preemption’ condition that deadlock requires.
Recovery methods:
Process termination: Terminate one or more processes involved in the deadlock to break the cycle.
Resource rollback: Roll back transactions or undo changes made by processes involved in the deadlock.
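Resource ordering, the first prevention method above, can be sketched with two locks: both transfer directions acquire the locks in the same global order, so a circular wait can never form. The scenario (two opposite transfers) is invented for illustration.

```python
# Deadlock prevention by resource ordering: both threads acquire the two
# locks in the same global order (sorted by id), so no circular wait can
# form even though they request the locks in opposite argument orders.
import threading

lock_a, lock_b = threading.Lock(), threading.Lock()

def transfer(first: threading.Lock, second: threading.Lock,
             log: list, name: str):
    # Always lock in a fixed global order, regardless of argument order.
    lo, hi = sorted((first, second), key=id)
    with lo, hi:
        log.append(name)

log = []
t1 = threading.Thread(target=transfer, args=(lock_a, lock_b, log, "a->b"))
t2 = threading.Thread(target=transfer, args=(lock_b, lock_a, log, "b->a"))
t1.start(); t2.start()
t1.join(); t2.join()        # both complete: no deadlock is possible
print(sorted(log))          # ['a->b', 'b->a']
```

Without the `sorted` step, t1 could hold `lock_a` while t2 holds `lock_b`, each waiting forever for the other's lock — the circular-wait condition.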
When two or more processes (or threads) access shared data or resources (e.g., files, memory, printers), the system must ensure that only one of them is inside a critical section at a time, ensuring data consistency and preventing issues like race conditions, deadlocks, and data corruption.
Synchronization helps to avoid conflicts by ensuring correct sequencing of operations and maintaining data integrity.
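A lock guarding a shared counter is the canonical critical-section example: the increment is a read-modify-write, and without the lock, concurrent increments can interleave and lose updates.

```python
# A lock making a read-modify-write critical section atomic. Without the
# lock, concurrent increments could interleave and lose updates.
import threading

counter = 0
lock = threading.Lock()

def increment(n: int):
    global counter
    for _ in range(n):
        with lock:          # only one thread in the critical section at a time
            counter += 1    # read-modify-write, now atomic

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000 -- never less, thanks to the lock
```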
As multiple processes concurrently run on the multi-core CPUs, sometimes, these processes need to communicate with each other.
The following are methods processes use to communicate with each other:
Pipes (named and unnamed): Simple data exchange between processes using special files in the OS file system
Message Queues: allows buffered, structured asynchronous messages between sender and receiver.
Shared Memory (with synchronization): Fastest IPC mechanism where processes share a common memory region
Sockets: Network method which allows communication between processes on the same or different machines.
Signals: Used for simple notifications to notify a process that an event like interrupting or terminating has occurred.
Semaphores: a counter method used as a synchronization tool, which allows only a maximum number of processes to access a resource concurrently.
Summary of key points:
CPU Architecture: The CPU consists of the ALU, Control Unit, Registers, and Cache Memory. The ALU performs calculations, the Control Unit manages data flow and instruction decoding, registers are the fastest memory holding data, and cache memory is used for reading and writing data.
Interconnection within a Processor: Components are linked via buses, including the Data Bus, Address Bus, and Control Bus.
External Expansion Buses: Connect the CPU with external devices through Memory, DMI, and CPU-Connected PCIe.
PCH (Platform Controller Hub): Acts as a central hub for I/O and peripheral connectivity.
PCIe (Peripheral Component Interconnect Express): A high-speed interface for connecting components like GPUs and SSDs.
USB (Universal Serial Bus): Used for communication between devices and a host controller, supporting plug-and-play and hot-plugging.
SATA (Serial Advanced Technology Attachment): Connects mass storage devices and supports hot swapping.
CPU Word Size: Refers to the number of bits the CPU can process in one operation (32-bit or 64-bit), influencing memory addressing and performance.
Clock Speed: The timing signal for instruction execution, measured in Hertz (Hz), with modern CPUs operating in the GHz range.
Core Count: The number of physical cores in the CPU, improving multitasking and parallel processing.
Instruction Set Architecture: Categorized into RISC (Reduced Instruction Set Computing) and CISC (Complex Instruction Set Computing).
CPU Instruction Cycle: Includes the Processor Clock Cycle, Machine Cycle (Fetch, Decode, Execute, Store), and Instruction Cycle (Fetch, Decode, Execute).
Process Management: Programs become processes when executed, consisting of one or more threads. Processes have their own memory address space, while threads share resources.
Process/Thread State Lifecycle: Includes states like New/Ready, Running, Waiting, and Terminated.
OS Scheduling: The OS uses process and thread managers, with a thread scheduler that uses priority-based preemptive scheduling.
Context Switching: Occurs when switching between multiple running processes, saving and restoring the state of processes.
Managing Deadlocks: Methods include resource ordering and process termination to prevent or recover from deadlocks.
Synchronization: Ensures that only one process accesses a critical section at a time to maintain data consistency.
Inter-Process Communication (IPC): Methods processes use to communicate include pipes, message queues, shared memory, sockets, signals, and semaphores