Main Memory and Memory Management Strategies

Chapter 9: Main Memory Overview

The primary objective of memory management is to provide a detailed description of various ways of organizing memory hardware and memory-management techniques.
A specific focus is placed on architectures such as the Intel Pentium, which supports pure segmentation and segmentation with paging.

Background and Hardware Access

Program Execution Requirements: A program must be brought from the disk into memory and placed within a process for it to be run.
Direct Access: The CPU can only directly access main memory and registers.
Memory Unit Interface: The memory unit only observes a stream of addresses combined with read requests, or addresses combined with data and write requests.
Access Speed and Performance: - Registers: Access is completed within one CPU clock cycle or less. - Main Memory: Access can take many cycles, which may cause a CPU stall. - Cache: A cache sits between main memory and CPU registers to mitigate performance gaps.
Protection: Memory protection is required to ensure correct operation of the system.

Memory Protection Mechanisms

Process Address Space: It is essential to ensure that a process can only access those addresses within its specific address space.
Base and Limit Registers: Protection is provided by a pair of registers that define the logical address space of a process: - Base Register: Stores the smallest legal physical memory address. - Limit Register: Specifies the range or size of the logical address space.
Hardware Checks: The CPU must check every memory access generated in user mode to ensure it falls between the base and limit values for that user.
Privileged Instructions: The instructions used to load the base and limit registers are privileged, meaning they can only be executed by the operating system kernel.

Address Binding Stages

Input Queue: Programs residing on the disk, ready to be brought into memory for execution, form an input queue.
Address Binding: Addresses are represented in different ways during different stages of a program’s life: - Source Code: Addresses are usually symbolic. - Compiled Code: Addresses bind to relocatable addresses (e.g., "14 bytes from the beginning of this module"). - Linker or Loader: Binds relocatable addresses to absolute addresses (e.g., $74014$ ).
Binding Times: - Compile Time: If the memory location is known a priori, absolute code can be generated. Recompilation is required if the starting location changes. - Load Time: Relocatable code must be generated if the memory location is not known at compile time. - Execution Time: Binding is delayed until run time if the process can be moved during execution from one memory segment to another. This requires hardware support for address maps (e.g., base and limit registers).

Logical versus Physical Address Space

Logical Address: An address generated by the CPU; also referred to as a virtual address.
Physical Address: An address seen by the memory unit.
Logical Address Space: The set of all logical addresses generated by a program.
Physical Address Space: The set of all physical addresses generated by a program.
Binding Correspondence: Logical and physical addresses are the same in compile-time and load-time binding schemes. They differ in execution-time address-binding schemes.

Memory-Management Unit (MMU)

Definition: A hardware device that performs the run-time mapping from virtual to physical addresses.
Relocation Register Scheme: A simple generalization of the base-register scheme where the base register is called a relocation register.
Mapping Process: The value in the relocation register is added to every address generated by a user process at the time it is sent to memory.
User View: The user program deals exclusively with logical addresses; it never sees real physical addresses. Execution-time binding occurs when a memory reference is made.

Dynamic Loading and Dynamic Linking

Dynamic Loading: - Routines are not loaded until they are called. - Allows for better memory-space utilization because unused routines are never loaded. - Particularly useful when large amounts of code are needed for infrequently occurring cases. - All routines are kept on disk in relocatable load format. - No special OS support is required; it is implemented through program design, though the OS can provide libraries.
Dynamic Linking: - Static Linking: System libraries and program code are combined by the loader into a single binary program image. - Dynamic Linking: Linking is postponed until execution time. - Stubs: A small piece of code, a stub, is used to locate the appropriate memory-resident library routine. The stub replaces itself with the address of the routine and executes it. - Shared Libraries: The OS checks if the routine is in the process address space; if not, it adds it. Dynamic linking is widely used for shared libraries and patching system libraries. Versioning may be used to manage updates.

Contiguous Memory Allocation

Layout: Main memory is usually divided into two partitions: - Resident Operating System: Usually held in low memory with the interrupt vector. - User Processes: Held in high memory.
Method: Each process is contained in a single contiguous section of memory.
Relocation and Protection: Relocation registers protect user processes from each other and from modifying the OS. The base register contains the value of the smallest physical address, and the limit register contains the range of logical addresses (logical\,address < limit\,register).

Variable Partition Allocation

Mechanism: The OS maintains information about allocated partitions and free partitions (holes).
Hole: A block of available memory. Holes of various sizes are scattered throughout memory.
Allocation: When a process arrives, it is allocated memory from a hole large enough to accommodate it.
Deallocation: When a process exits, it frees its partition, and adjacent free partitions are combined into a larger hole.
Dynamic Storage-Allocation Problem Strategies: - First-fit: Allocate the first hole that is big enough. - Best-fit: Allocate the smallest hole that is big enough. Requires searching the entire list unless it is ordered by size. Produces the smallest leftover hole. - Worst-fit: Allocate the largest hole. Requires searching the entire list. Produces the largest leftover hole. - Performance: First-fit and best-fit are generally better than worst-fit in terms of speed and storage utilization.

Fragmentation

External Fragmentation: Total memory space exists to satisfy a request, but it is not contiguous.
Internal Fragmentation: Allocated memory may be slightly larger than the requested memory. The unused portion inside the partition represents internal fragmentation.
50-percent Rule: Analysis of first-fit reveals that for $N$ allocated blocks, $0.5N$ blocks are lost to fragmentation, meaning $1/3$ of memory may be unusable.
Compaction: A method to reduce external fragmentation by shuffling memory contents to place all free memory together. - Possible only if relocation is dynamic and done at execution time. - Requires addressing the I/O problem (latching jobs involved in I/O or performing I/O only into OS buffers).

Paging Fundamentals

Concept: Physical address space of a process can be noncontiguous. Physical memory is allocated whenever available, avoiding external fragmentation and varying-sized memory chunks.
Frames: Physical memory is divided into fixed-sized blocks called frames. Frame sizes are powers of $2$ , typically between $512\,bytes$ and $16\,Mbytes$ .
Pages: Logical memory is divided into blocks of the same size called pages.
Implementation: The OS keeps track of all free frames. To run a program of $N$ pages, $N$ free frames must be found. A page table is set up to translate logical to physical addresses.
Internal Fragmentation in Paging: Still occurs because a process size might not be a multiple of the page size.
Fragmentation Example Calculation: - Page size = $2,048\,bytes$ . - Process size = $72,766\,bytes$ . - $35\,pages + 1,086\,bytes$ . - Internal fragmentation = $2,048 - 1,086 = 962\,bytes$ . - Worst-case fragmentation = $1\,frame - 1\,byte$ . - Average fragmentation = $\frac{1}{2}\,frame\,size$ .

Paging Hardware and Address Translation

Address Structure: The address generated by the CPU is divided into two parts: - Page Number ( $p$ ): Used as an index into a page table containing the base address of each page in physical memory. - Page Offset ( $d$ ): Combined with the base address to define the physical memory address.
Mathematical Model: For a logical address space of size $2^m$ and page size of $2^n$ , the page number is $m - n$ bits and the offset is $n$ bits.
Page Table Implementation: - Page-table base register (PTBR): Points to the page table. - Page-table length register (PTLR): Indicates the size of the page table.
Performance Problem: In this scheme, every data/instruction access requires two memory accesses (one for the page table, one for the data).

Translation Look-Aside Buffer (TLB)

Definition: A special fast-lookup hardware cache (associative memory) used to solve the two-memory access problem.
Functionality: - Parallel search for the page number ( $p$ ). - If $p$ is in the associative register (TLB hit), the frame number is retrieved immediately. - If it is a TLB miss, the value is loaded into the TLB from the page table in memory.
Address-Space Identifiers (ASIDs): Some TLBs store ASIDs to uniquely identify each process, providing protection and avoiding the need to flush the TLB at every context switch.
Size: Typically small, between $64$ and $1,024$ entries.
Effective Access Time (EAT) Calculations: - Let Hit Ratio = percentage of times page number is found in TLB. - Example: $80\%$ hit ratio, $10\,ns$ memory access time. - $EAT = 0.80 \times 10 + 0.20 \times 20 = 12\,ns$ ( $20\%$ slowdown). - Example: $99\%$ hit ratio, $10\,ns$ memory access time. - $EAT = 0.99 \times 10 + 0.01 \times 20 = 10.1\,ns$ ( $1\%$ slowdown).

Memory Protection in Paging

Protection Bits: Associated with each frame to indicate access rights (read-only, read-write, execute-only).
Valid-Invalid Bit: - Valid: Indicates the page is in the process’s logical address space. - Invalid: Indicates the page is not in the process’s logical address space.
Traps: Any violation of these protections results in a trap to the kernel.

Shared Pages

Shared Code: One copy of read-only (reentrant) code can be shared among processes (e.g., compilers, window systems, libc).
Private Code and Data: Each process keeps a separate copy of its private data and code, which can appear anywhere in the logical address space.

Page Table Structures

Hierarchical Paging: Breaks up the logical address space into multiple page tables. A common example is two-level paging where the page table itself is paged. - In a 32-bit machine with a $4\,KB$ page size ( $2^{12}$ ), a 2-level scheme might use a 10-bit page number ( $p_1$ ), a 10-bit page offset ( $p_2$ ), and a 12-bit displacement ( $d$ ).
Hashed Page Tables: Common in address spaces greater than 32 bits. The virtual page number is hashed into a table where each entry contains a chain of elements (virtual page number, mapped frame, pointer to next). - Clustered Page Tables: A variation for 64-bit addresses where each entry refers to multiple pages (e.g., 16), useful for sparse address spaces.
Inverted Page Table: Tracks all physical pages instead of logical pages. One entry for each real page of memory containing the virtual address and the process ID (PID). - Reduces memory needed for tables but increases search time (can be mitigated by hash tables and TLBs).

Architecture Specific Examples

Oracle SPARC Solaris (64-bit OS): - Uses two hash tables: one for the kernel and one for user processes. - TTE (Translation Table Entry) stores information. - TSB (Translation Storage Buffer) acts as a cache for TTEs. - CPU hardware walks the TSB on a TLB miss before interrupting the kernel to search hash tables.
Intel IA-32 Architecture: - Supports both segmentation and segmentation with paging. - Segment limits: up to $16,384$ ( $16\,K$ ) segments per process (8K private in LDT, 8K shared in GDT). - Paging units form the MMU; page sizes can be $4\,KB$ or $4\,MB$ . - PAE (Page Address Extension): 3-level paging scheme allowing 32-bit apps to access up to $64\,GB$ of physical memory (36-bit address space).
Intel x86-64 Architecture: - Implements 48-bit addressing with 4 levels of paging hierarchy. - Page sizes: $4\,KB$ , $2\,MB$ , and $1\,GB$ .
ARM Architecture: - Dominant mobile platform chip. - Page sizes: $4\,KB$ and $16\,KB$ ; sections: $1\,MB$ and $16\,MB$ . - Two levels of TLBs: two micro TLBs (instruction/data) and one main TLB.

Swapping

Definition: Moving a process temporarily out of memory to a backing store (fast disk) and bringing it back for continued execution.
Standard Swapping: Allows the total physical memory space of processes to exceed actual physical memory.
Backing Store: Must be large enough to accommodate copies of all memory images.
Roll out, roll in: A priority-based variant where lower-priority processes are swapped out for higher-priority ones.
Context Switch Time: Swapping significantly increases context switch time. - Example: $100\,MB$ process, $50\,MB/s$ transfer rate results in a $2,000\,ms$ swap-out time and total swap component of $4,000\,ms$ ( $4\,seconds$ ).
Swapping on Mobile Systems: - Generally not supported due to flash memory limitations (limited write cycles, poor throughput). - iOS: Apps voluntarily relinquish memory; read-only data is thrown out and reloaded from flash. - Android: Terminates apps to free memory but saves application state to flash for rapid restart.