Chapter 8: Main Memory and Memory Management Systems

Memory Management Background and Hardware Structure

Program Execution Requirements: For a program to be executed, it must be brought from the disk into the system's main memory and placed within a process.
Direct CPU Access: The main memory and registers are the only storage units that the CPU can access directly. If data is not in these locations, it must be moved there before the CPU can operate on it.
Memory Unit Operation: The memory unit functions only by seeing a stream of addresses along with read requests, or addresses and data accompanied by write requests.
Access Speeds: * Registers: Built into the CPU; access occurs in $1$ CPU clock cycle or less. * Main Memory: Accessed via a memory bus; can take many cycles to complete, which may cause a CPU stall. * Cache: Sits between main memory and CPU registers to bridge the speed gap and accelerate access.
Memory Protection: Essential for ensuring correct system operation. Each process must have its own separate memory space.
Base and Limit Registers: Used to define the logical address space and determine legal address ranges for a process. * Base Register: Holds the smallest legal physical memory address. * Limit Register: Specifies the size of the range (the length of the memory segment). * Hardware Check: In user mode, the CPU must check every memory access to ensure it falls between the base and limit values ( $\text{base} \times \text{address} < \text{base} + \text{limit}$ ). Any violation results in a trap to the operating system (monitor-addressing error).

Address Binding and Mathematical Mappings

Input Queue: Programs residing on disk waiting to be brought into memory for execution.
Binding Stages: Addresses represented in source code are typically symbolic. They are bound to different formats at various stages: * Compile Time: If the memory location is known a priori, absolute code is generated. If the starting location changes, the code must be recompiled. * Load Time: If the memory location is unknown at compile time, the compiler must generate relocatable code. Final binding is delayed until the program is loaded. * Execution Time: If a process can be moved during execution from one memory segment to another, binding is delayed until runtime. This requires hardware support, such as base and limit registers.
Address Space Definitions: * Logical Address: An address generated by the CPU; also referred to as a virtual address. * Physical Address: The address as seen by the memory unit. * Logical Address Space: The set of all logical addresses generated by a program. * Physical Address Space: The set of all physical addresses corresponding to those logical addresses.
Binding Differences: Logical and physical addresses are identical in compile-time and load-time schemes. They differ in execution-time address-binding schemes.

Memory-Management Unit (MMU) and Dynamic Loading/Linking

MMU Definition: A hardware device that performs the runtime mapping from virtual to physical addresses.
Simple Relocation Scheme: The value in a relocation register (formerly the base register) is added to every address generated by a user process at the time it is sent to memory. * Example: If the relocation register is set to $14000$ , a logical address of $346$ is mapped to physical address $14346$ . * User Perspective: The user program deals with logical addresses and never interacts with real physical addresses.
Dynamic Loading: A routine is not loaded until it is called. * Benefits: Better memory utilization; unused routines are never loaded. Useful for handling large amounts of infrequently used code (e.g., error handling). * Implementation: Done through program design; no special OS support required, though the OS may provide libraries.
Dynamic Linking: * Static Linking: System libraries and program code are combined into a single binary image by the loader. * Dynamic Linking: Linking is postponed until execution time. * Stub: A small piece of code used to locate the memory-resident library routine. The stub replaces itself with the address of the routine and executes it. * Shared Libraries: Systems where multiple processes share a single copy of library code.

Swapping Mechanisms and Constraints

Concept: A process can be swapped temporarily out of memory to a backing store and later brought back for continued execution.
Backing Store: A fast disk large enough to accommodate copies of all memory images for all users; it must provides direct access to these images.
Roll Out, Roll In: A swapping variant for priority-based scheduling; lower-priority processes are swapped out to make room for higher-priority processes.
Context Switch Time: Includes the time to swap processes. Swapping can significantly increase context switch overhead. * Example Calculation: A $100,MB$ process swapping to a disk with a $50,MB/sec$ transfer rate results in a swap-out time of $2000,ms$ . If another process of the same size is swapped in, the total component time is $4000,ms$ ( $4$ seconds).
Swapping Constraints: * Pending I/O: A process cannot be swapped out if I/O is pending to its memory space. Alternatively, I/O can be double-buffered in kernel space, though this adds overhead. * Modern Systems: Standard swapping is typically disabled but may be triggered if free memory falls below a specific threshold.
Mobile Systems: Generally do not support standard swapping due to flash memory limitations (limited write cycles and poor throughput). * iOS: Asks apps to voluntarily relinquish memory; read-only data is thrown out and reloaded later. Failure to free memory leads to termination. * Android: Terminates apps to free memory but saves application state to flash for fast restarts.

Contiguous Memory Allocation and Fragmentation

Partitioning: Main memory is usually divided into two partitions: the resident Operating System (usually in low memory) and User Processes (usually in high memory).
Multiple-Partition Allocation: * Fixed-size Partitions: The degree of multiprogramming is limited by the number of partitions. * Variable-size Partitions: Sized to meet the needs of a specific process.
Hole: A block of available memory. The OS maintains information about allocated and free partitions.
Dynamic Storage-Allocation Algorithms: * First-fit: Allocate the first hole that is large enough. * Best-fit: Allocate the smallest hole that is large enough (requires searching the list unless ordered by size); produces the smallest leftover hole. * Worst-fit: Allocate the largest hole; produces the largest leftover hole.
Fragmentation Types: * External Fragmentation: Total memory space exists to satisfy a request, but it is not contiguous (scattered small holes). * Internal Fragmentation: Memory allocated to a process is slightly larger than requested; the unused portion is internal to the partition.
50-Percent Rule: First-fit analysis shows that for every $N$ allocated blocks, another $0.5N$ blocks are lost to fragmentation ( $1/3$ of memory may be unusable).
Compaction: Shuffling memory contents to place all free memory into one large block. Only possible if relocation is dynamic and done at execution time.

Segmentation and Paging Schemes

Segmentation: Memory-management scheme supporting the user view of memory. A program is viewed as a collection of logical units (segments) such as main program, functions, stack, and symbol tables. * Architecture: Logical addresses consist of a two-tuple: $\langle \text{segment-number}, \text{offset} \rangle$ . * Segment Table: Maps two-dimensional physical addresses; entries include a base (starting physical address) and a limit (segment length). * STBR and STLR: Segment-table base register and Segment-table length register are used to manage the table.
Paging: Divides physical memory into fixed-sized blocks called frames and logical memory into blocks of the same size called pages. * Advantages: Avoids external fragmentation and the problem of varying-sized memory chunks. * Address Translation: CPU-generated addresses are divided into: * Page Number (p): Index into the page table containing the base address of each page. * Page Offset (d): Combined with the base address to define the physical address. * Page Sizes: Powers of $2$ , typically between $512$ bytes and $16,MB$ . * Internal Fragmentation: Still exists in paging. If a process requires $72,766$ bytes and the page size is $2,048$ bytes, it needs $35$ pages plus $1,086$ bytes, leaving $962$ bytes of internal fragmentation in the last frame.

Page Table Implementation and Optimization

Hardware Implementation: The page table is kept in main memory. The PTBR (Page-table base register) points to it, and the PTLR (Page-table length register) indicates its size.
Two-Memory Access Problem: To access data, the system needs one access for the page table and one for the data itself. This is solved with hardware caches called TLBs (Translation Look-aside Buffers).
TLB Characteristics: * Small, fast associative memory ( $64$ to $1,024$ entries). * Store ASIDs (Address-space identifiers) to provide protection and avoid flushing on context switches. * On a TLB hit, the frame number is retrieved immediately. On a TLB miss, the value is loaded from the page table in memory.
Memory Protection: Bits are associated with each frame (e.g., read-only, read-write). A valid-invalid bit indicates if a page is in the process's logical address space.
Shared Pages: Reentrant (read-only) code can be shared among processes (e.g., compilers). One copy of the code is kept in physical memory, mapped by multiple page tables.

Advanced Page Table Structures

Hierarchical Paging: Breaking the page table into multiple levels (e.g., a two-level page table where you "page the page table"). * Example: In a $32\text{-bit}$ system with $1,KB$ pages, the page number ( $22$ bits) is split into a $12\text{-bit}$ outer page number and a $10\text{-bit}$ inner page offset ( $p1, p2$ ).
Hashed Page Tables: Common in address spaces greater than $32$ bits. The virtual page number is hashed into a table where entries contain the mapped page frame and a pointer to the next element (chaining).
Inverted Page Table: One entry for every physical page of memory rather than every logical page. Reduces memory required for tables but increases search time (mitigated by hashing and TLBs).

Specific Hardware Architectures

Oracle SPARC Solaris: Uses complex hashing with two tables (one for kernel, one for user processes). Each entry represents a contiguous area of mapped virtual memory. Utilizes a TSB (Translation Storage Buffer) as a cache for TLB entries.
Intel IA-32 Architecture: Supports both segmentation and paging. Each segment can be up to $4,GB$ . Uses a Local Descriptor Table (LDT) for private segments and a Global Descriptor Table (GDT) for shared segments. * PAE (Page Address Extension): 3-level paging scheme allowing $32\text{-bit}$ apps access to a $36\text{-bit}$ address space ( $64,GB$ of physical memory).
Intel x86-64: Uses $48\text{-bit}$ addressing in practice with a 4-level paging hierarchy. Supports page sizes of $4,KB$ , $2,MB$ , and $1,GB$ .
ARM Architecture: Dominant mobile platform. Supports section sizes of $1,MB$ and $16,MB$ , or pages of $4,KB$ and $16,KB$ . Features two levels of TLBs (micro TLBs for data/instruction and a main TLB).