Von Neumann Architecture
Stored-program concept: main memory stores both programs and data; the ALU operates on binary data; the control unit interprets instructions; and input/output equipment is operated by the control unit.
Architecture vs Organization
Architecture is the set of attributes visible to the programmer (instruction set, number of bits used for data representation, I/O mechanisms, addressing techniques), while organization is how those features are implemented (control signals, interfaces, memory technology).
Instruction Cycle
Two main steps, Fetch and Execute: the PC holds the address of the next instruction; the processor fetches the instruction from the memory location pointed to by the PC, increments the PC, and loads the instruction into the IR; it then interprets the instruction and performs the required actions.
Memory Hierarchy
From fastest to slowest: registers, L1 cache, L2 cache, main memory, disk cache, disk, optical, tape, with a trade-off among speed, cost, and capacity.
Locality of Reference
During execution of a program, memory references tend to cluster (for example, in loops), which makes caching effective.
Cache Memory
Small amount of fast memory between main memory and the CPU that stores frequently accessed data, with tags identifying which block of main memory occupies each cache slot.
Direct Mapping
Each block of main memory maps to exactly one cache line; the address is split into tag, line, and word fields, with cache line = block number mod number of cache lines.
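A minimal sketch in Python of the address split above, assuming (for illustration only) 16-byte blocks and 256 cache lines:

    WORD_BITS = 4   # 16 bytes per block (assumed)
    LINE_BITS = 8   # 256 cache lines (assumed)

    def split_address(addr):
        word = addr & ((1 << WORD_BITS) - 1)
        line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)
        tag = addr >> (WORD_BITS + LINE_BITS)
        return tag, line, word

    print(split_address(0x1A2B3C))  # (418, 179, 12), i.e. tag 0x1A2, line 0xB3, word 0xC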
Associative Mapping
A main memory block can load into any line of the cache; the memory address is interpreted as tag and word, and every line's tag is examined for a match, allowing full flexibility but requiring a complex parallel search.
Set Associative Mapping
Cache divided into sets, each containing multiple lines; a given block maps to any line within one set, combining the benefits of direct and associative mapping (e.g., 2-way means 2 lines per set).
Cache Write Policies
Write-through: all writes go to both cache and main memory, generating heavy traffic. Write-back: updates are made initially in cache only, with an update bit set, and written to main memory only when the block is replaced.
Cache Performance
Hit ratio is the fraction of memory accesses found in the cache; average access time = cache access time + miss rate × main memory access time. Larger blocks increase the hit ratio initially.
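A minimal sketch of the average-access-time formula, with the timings and hit ratio assumed for illustration:

    cache_time = 1.0    # ns (assumed)
    memory_time = 50.0  # ns (assumed)
    hit_ratio = 0.95    # assumed

    avg = cache_time + (1 - hit_ratio) * memory_time
    print(avg)  # 3.5 ns: close to cache speed despite slow main memory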
Pipelining
Overlapping execution of instructions through stages (fetch, decode, calculate operands, fetch operands, execute, write result), allowing multiple instructions to be in different stages simultaneously.
Five Stage Pipeline
Instruction Fetch: instruction fetched from memory. Instruction Decode: instruction decoded and registers read. Execute: ALU operation performed. Memory Access: memory operands accessed. Write Back: result written to a register.
Pipeline Hazards
Three types: structural, when two instructions need the same resource; data, when an instruction depends on the result of a previous instruction; and control, when a branch changes the instruction flow, requiring a pipeline flush.
Data Hazards
Read After Write (RAW), or true dependency: an instruction reads a location modified by a previous instruction. Write After Read (WAR), or antidependency: an instruction writes a location read by a previous instruction. Write After Write (WAW), or output dependency: two instructions write the same location.
Hazard Resolution
Structural hazards are solved by duplicating resources; data hazards by forwarding or stalling; control hazards by branch prediction, delayed branches, or prefetching both paths.
Branch Prediction
Predict never taken assumes the jump will not happen; predict always taken assumes it will; predict by opcode uses the instruction type; a taken/not-taken switch uses previous history; delayed branch executes the instruction after the branch before the branch takes effect.
Addressing Modes
Immediate: operand is part of the instruction. Direct: address field contains the address of the operand. Indirect: address field points to the address of the operand. Register: operand is in a register. Register indirect: register contains the address of the operand. Displacement: EA = A + R.
Immediate Addressing
Operand is part of the instruction, as in ADD 5; no memory reference is needed to fetch data, making it fast but limited in range.
Direct Addressing
Address field contains the address of the operand (EA = A), requiring a single memory reference but offering a limited address space.
Indirect Addressing
Memory cell pointed to by the address field contains the address of the operand (EA = (A)), allowing a large address space but requiring multiple memory accesses.
Register Addressing
Operand is held in the register named in the address field (EA = R); no memory access and very fast execution, but a very limited address space.
Register Indirect Addressing
Operand is in the memory cell pointed to by the contents of a register (EA = (R)), allowing a large address space with one fewer memory access than indirect addressing.
Displacement Addressing
Effective address EA = A + R. Used for relative addressing (R is the PC), base-register addressing (R holds a pointer to the base), and indexed addressing (A is the base and R the displacement).
Twos Complement
Representation where the leftmost bit is the sign bit; negation is done by inverting all bits and adding 1. Benefits: a single representation of zero and simple arithmetic. Range: -2^(n-1) to +2^(n-1) - 1.
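A minimal sketch of two's-complement negation, assuming an 8-bit width for illustration:

    BITS = 8
    MASK = (1 << BITS) - 1

    def negate(x):
        return (~x + 1) & MASK  # invert all bits, add 1, keep the low 8 bits

    print(bin(negate(5)))           # 0b11111011, the 8-bit pattern for -5
    print(negate(5) - (1 << BITS))  # -5 when reinterpreted as signed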
Instruction Format
Layout of bits in an instruction, including opcode and operands; affected by memory size, memory organization, bus structure, CPU complexity, and CPU speed, requiring a trade-off between a powerful instruction repertoire and saving space.
Instruction Types
Data processing (arithmetic and logical operations), data storage (main memory access), data movement (I/O operations), and program flow control (branch and jump).
RISC Characteristics
Large number of general-purpose registers (or compiler-optimized register use); limited, simple instruction set; emphasis on optimizing the instruction pipeline; one instruction per cycle; register-to-register operations; few, simple addressing modes; hardwired design; fixed instruction format.
RISC vs CISC
RISC has simple instructions executed in one cycle, many registers, hardwired control, and is optimized for pipelining; CISC has complex instructions taking multiple cycles, fewer registers, microprogrammed control, and variable-length instructions, aiming to close the semantic gap.
Register Window
Technique using multiple small sets of registers, with procedure calls switching to a different set. Three areas: parameter registers, local registers, and temporary registers; the temporary registers of one set overlap the parameter registers of the next, allowing parameter passing without moving data.
Superscalar
Common instructions can be initiated and executed independently on multiple parallel pipelines, allowing more than one instruction per clock cycle; applicable to both RISC and CISC, but usually RISC.
Instruction Level Parallelism
Instructions in a sequence are independent, so their execution can be overlapped; governed by data dependencies and procedural dependencies. Machine parallelism is the ability to exploit this, determined by the number of parallel pipelines.
Out of Order Execution
Decouples the decode pipeline from the execution pipeline, allowing fetch and decode to continue until the pipeline is full; when a functional unit becomes available, an instruction is executed even if earlier instructions are stalled.
Register Renaming
Registers are allocated dynamically rather than specifically named, eliminating output dependencies and antidependencies and allowing instructions to execute out of order without conflicts, as in R3b = R3a + R5a followed by R3c = R5a + 1.
Reorder Buffer
Temporary storage for results that commits to the register file in program order, ensuring a correct architectural state even with out-of-order execution.
Virtual Memory
Demand paging: not all pages of a process need be in memory; pages are brought in as required via the page-fault mechanism when a required page is absent, allowing processes larger than physical memory.
Page Table
Data structure that maps virtual addresses to physical addresses; each process has its own page table containing page frame numbers and control bits, checked by the MMU on each memory reference.
Translation Lookaside Buffer TLB
Special cache for page table entries that speeds up virtual-to-physical address translation by avoiding a main memory access to the page table; typical size is 32 to 1024 entries, using associative lookup.
Thrashing
Condition where the operating system spends nearly all its time swapping pages, with little real work done (the disk light is on constantly); caused by too many processes in too little memory; solved by better page replacement, reducing the number of processes, or adding memory.
Interrupt Cycle
Added to the instruction cycle: the processor checks for an interrupt; if one is pending, it suspends execution, saves the context, sets the PC to the interrupt handler address, processes the interrupt, then restores the context and continues.
DMA Direct Memory Access
Additional hardware module that takes over the bus for I/O transfers: the CPU tells the DMA controller whether to read or write, the device address, the starting memory address, and the amount of data; the CPU then continues other work while the DMA controller handles the transfer, sending an interrupt when finished.
Memory Interleaving
Collection of DRAM chips grouped into memory banks that service requests independently, so K banks can service K requests simultaneously, improving throughput.
Amdahls Law
Potential speedup = 1 / ((1 - f) + f/N), where f is the fraction of code that is parallelizable and N is the number of processors; shows diminishing returns as N increases, and if f is small, parallel processors have little effect.
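A minimal sketch of the formula with assumed values f = 0.9 and N = 8:

    def speedup(f, n):
        return 1.0 / ((1.0 - f) + f / n)

    print(speedup(0.9, 8))     # ~4.7: well below the ideal 8x
    print(speedup(0.9, 1000))  # ~9.9: capped near 1/(1-f) = 10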
Moore's Law
Number of transistors on a chip doubles every 18 months (stated by Gordon Moore, cofounder of Intel); the cost of a chip remains almost unchanged, and higher packing density means shorter electrical paths, giving higher performance.
DRAM vs SRAM
DRAM stores bits as charge in capacitors: needs refreshing, simpler construction, smaller per bit, less expensive, slower; used for main memory. SRAM stores bits as on/off switches: no refresh needed, more complex, larger per bit, more expensive, faster; used for cache. Both are volatile.
Cache Line Size
Larger blocks increase the hit ratio initially due to the locality principle, but the hit ratio falls once blocks become too large, because the probability of using the newly fetched information drops below the probability of reusing the replaced information; typical size is 8 to 64 bytes.
Multilevel Cache
Use both on-chip and off-chip cache: L1 on chip is faster and frees the bus, while L2 (off chip, or on chip in static RAM) is much faster than DRAM and often uses a separate data path; modern systems also have L3.
Floating Point Representation
Format is a sign bit, then exponent, then mantissa, representing ±significand × 2^exponent; the mantissa is stored in two's complement and the exponent in excess (biased) notation, normalized so the leading bit of the mantissa is 1.
IEEE 754
Standard for floating-point storage with 32-bit and 64-bit formats having 8-bit and 11-bit exponents respectively, plus extended formats for intermediate results; defines special values such as infinity and NaN.
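A minimal sketch decomposing a single-precision value into its sign, exponent, and fraction fields with Python's struct module:

    import struct

    def fields(x):
        bits = struct.unpack(">I", struct.pack(">f", x))[0]
        sign = bits >> 31
        exponent = (bits >> 23) & 0xFF  # biased by 127
        fraction = bits & 0x7FFFFF      # 23 bits, implicit leading 1
        return sign, exponent, fraction

    print(fields(-6.5))  # (1, 129, 5242880): -1.625 * 2^(129 - 127)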
Booth's Algorithm
Method for multiplying signed binary numbers that handles negative numbers correctly: examine pairs of bits in the multiplier and perform an add, a subtract, or no operation based on the bit pattern, giving efficient signed multiplication.
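A minimal sketch of the add/subtract/shift loop, assuming an 8-bit word; operands and result are signed two's-complement values:

    def booth_multiply(m, q, bits=8):
        mask = (1 << bits) - 1
        a, q, q_1, m = 0, q & mask, 0, m & mask
        for _ in range(bits):
            pair = ((q & 1) << 1) | q_1
            if pair == 0b01:        # 01: add multiplicand
                a = (a + m) & mask
            elif pair == 0b10:      # 10: subtract multiplicand
                a = (a - m) & mask
            # arithmetic shift right of the combined A, Q, Q-1 register
            q_1 = q & 1
            q = ((q >> 1) | ((a & 1) << (bits - 1))) & mask
            a = ((a >> 1) | ((a >> (bits - 1)) << (bits - 1))) & mask
        product = (a << bits) | q   # 2n-bit product
        return product - (1 << 2 * bits) if product >> (2 * bits - 1) else product

    print(booth_multiply(7, -3))    # -21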
Endianness
Byte ordering: big endian stores the most significant byte at the lowest address (IBM 370, Motorola 680x0, most RISC, and the Internet), while little endian stores the least significant byte at the lowest address (Pentium/x86 and VAX).
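A minimal sketch showing the same 32-bit value packed both ways with Python's struct module:

    import struct

    value = 0x12345678
    print(struct.pack(">I", value).hex())  # big endian:    12345678
    print(struct.pack("<I", value).hex())  # little endian: 78563412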
Prefetch
Fetching the next instruction during execution of the current one, since fetch accesses main memory while execution usually does not; allows overlap but does not double performance, because fetch is usually shorter than execution and branches mean prefetched instructions may not be needed.
Control Unit
Hardware segment that accepts an operation code and issues control signals; coordinates the fetch-decode-execute cycle, interprets instructions from memory, manages data flow between CPU components, and controls timing.
Bus Structure
Communication pathway connecting two or more devices, usually broadcast. Three types: the data bus carries data, the address bus identifies source or destination, and the control bus carries control and timing information; bus width determines performance.
IO Module Function
Provides an interface to the CPU and memory and an interface to peripherals, performing control and timing, CPU communication, device communication, data buffering, and error detection; may hide device properties from the CPU or reveal them.
Programmed IO
CPU has direct control over I/O: sensing status, issuing read/write commands, and transferring data. The CPU waits for the I/O module to complete each operation, checking status bits periodically and wasting CPU time.
Interrupt Driven IO
I/O module interrupts the CPU when ready, overcoming CPU waiting: the CPU issues a read command, the I/O module gets data from the peripheral while the CPU does other work, the I/O module interrupts the CPU, then the CPU requests the data and the I/O module transfers it.
Synchronous vs Asynchronous Bus
Synchronous: events are determined by clock signals, the control bus includes a clock line, and a single 1-to-0 transition defines a bus cycle. Asynchronous: a handshaking protocol with request and acknowledge signals allows devices of different speeds.
Memory Mapped IO
Devices and memory share one address space, so I/O looks like memory reads and writes: no special commands, and the full set of memory-access commands is available. Isolated I/O uses separate address spaces, needing I/O-select or memory-select lines and special commands.
Replacement Algorithms
Direct mapping allows no choice: each block maps to one line. Associative and set-associative mapping use LRU (least recently used), FIFO (first in, first out), LFU (least frequently used), or random replacement; LRU is the most common, implemented in hardware.
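A minimal sketch of LRU replacement for a single cache set, using the standard-library OrderedDict and an assumed capacity of 4 lines:

    from collections import OrderedDict

    class LRUSet:
        def __init__(self, lines=4):
            self.lines = lines
            self.tags = OrderedDict()  # oldest entry first

        def access(self, tag):
            if tag in self.tags:
                self.tags.move_to_end(tag)     # hit: now most recently used
                return True
            if len(self.tags) >= self.lines:
                self.tags.popitem(last=False)  # miss: evict least recent
            self.tags[tag] = None
            return False

    s = LRUSet()
    print([s.access(t) for t in [1, 2, 3, 4, 1, 5, 2]])
    # [False, False, False, False, True, False, False]: 5 evicts 2, so 2 misses again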
Victim Cache
Small fully associative cache of 4 to 16 lines between L1 and the next memory level that remembers what was discarded from L1, lowering the miss penalty by reusing already-fetched data at little cost.
Loop Unrolling
Optimization that replicates the loop body multiple times so the loop iterates fewer times, reducing loop overhead, increasing instruction-level parallelism, and improving register, data cache, or TLB locality.
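A minimal sketch of 4-way unrolling; the trip count is assumed to be a multiple of 4 for brevity:

    def sum_rolled(a):
        total = 0
        for i in range(len(a)):          # one loop test per element
            total += a[i]
        return total

    def sum_unrolled(a):
        total = 0
        for i in range(0, len(a), 4):    # one loop test per 4 elements
            total += a[i] + a[i + 1] + a[i + 2] + a[i + 3]
        return total

    data = list(range(16))
    print(sum_rolled(data) == sum_unrolled(data))  # True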
Delayed Load
The register that is the target of a load is locked by the processor, which continues executing the instruction stream until that register is required, then idles until the load completes; rearranging instructions allows useful work to be done while loading.
Scoreboarding
Bookkeeping technique that allows an instruction to execute whenever it is not dependent on previous instructions and no structural hazards exist, by tracking which registers are being written.
Speculation
Executing instructions before knowing they are needed, such as after a branch; if the prediction is wrong, the effects must be undone, using techniques such as a reorder buffer to maintain correct state.
Superscalar Pipeline
Has multiple parallel pipelines for different instruction types (integer ALU, floating point, load/store), allowing multiple instructions to execute simultaneously, with sophisticated instruction scheduling.
Pentium Pipeline
The original Pentium has two pipelines, the U pipe and the V pipe, allowing two instructions per clock under ideal conditions, with dynamic branch prediction and separate code and data caches.
ARM Cortex A8
Has dual in-order issue and a 13-stage pipeline, keeping power low; a separate SIMD unit has a 10-stage pipeline; uses a two-level global-history branch predictor with BTB and GHB plus a return stack.
Cortex A8 Stages
F0 address generation; F1 fetch from L1 instruction cache; F3 place in instruction queue; D0 decompress Thumb; D1 instruction decode; D2 pending/replay queue; D3 instruction scheduling; D4 final decode; E0 register file access; E1 barrel shifter; E2 ALU; E3 saturation; E4 control flow; E5 writeback.
Semantic Gap
Difference between high-level language statements and machine instructions; CISC attempted to close it by implementing complex HLL operations in hardware, while RISC focuses on simple, frequently used operations.
Compiler Optimization
Techniques include register allocation using graph coloring, instruction scheduling to avoid hazards, loop unrolling, software pipelining, and dead-code elimination.
Graph Coloring
Register allocation method where nodes are symbolic registers, edges connect registers that are live simultaneously, and n colors are assigned for n real registers; uncolored nodes are placed in memory.
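A minimal sketch of a greedy coloring pass; the interference graph and the two available registers are assumed for illustration:

    interference = {          # symbolic register -> registers live with it
        "a": {"b", "c", "d"},
        "b": {"a", "c"},
        "c": {"a", "b"},
        "d": {"a"},
    }
    REGISTERS = ["r0", "r1"]  # n colors for n real registers

    assignment = {}
    for node in interference:
        used = {assignment.get(n) for n in interference[node]}
        free = [r for r in REGISTERS if r not in used]
        assignment[node] = free[0] if free else "memory"  # spill if uncolored

    print(assignment)  # {'a': 'r0', 'b': 'r1', 'c': 'memory', 'd': 'r1'}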
Stack Addressing
Operand is implicitly on top of the stack, as in an ADD operation that pops the top two items, adds them, and pushes the result; used in zero-address instructions.
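A minimal sketch of a zero-address ADD on an explicit stack:

    stack = []
    stack.append(3)   # PUSH 3
    stack.append(4)   # PUSH 4
    stack.append(stack.pop() + stack.pop())  # ADD: pop two, push the sum
    print(stack)      # [7]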
Postindex Addressing
Memory address is the base register value; the offset is then added or subtracted and the result written back to the base register, which thus acts as an index.
Preindex Addressing
Memory address is formed as base plus offset; that address is also written back to the base register, so the base register value is incremented or decremented by the offset.
SIMD
Single Instruction, Multiple Data: one instruction operates on multiple data elements simultaneously; used for multimedia and scientific applications with packed data types, such as four 32-bit values in a 128-bit register.
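A minimal sketch emulating a packed add of four 32-bit lanes held in one 128-bit Python integer (lane values assumed small enough not to overflow a lane):

    LANE = 32
    MASK = (1 << LANE) - 1

    def pack(vals):
        out = 0
        for i, v in enumerate(vals):
            out |= (v & MASK) << (i * LANE)
        return out

    def packed_add(x, y):  # one "instruction", four lane-wise adds
        return [((x >> (i * LANE)) + (y >> (i * LANE))) & MASK for i in range(4)]

    print(packed_add(pack([1, 2, 3, 4]), pack([10, 20, 30, 40])))  # [11, 22, 33, 44]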
Memory Access Time
Access time is the time between presenting an address and getting valid data; memory cycle time is access time plus recovery time before the next access; transfer rate is the rate at which data can be moved.
Winchester Hard Disk
Sealed unit with one or more platters where the heads fly on a boundary layer of air as the disk spins; the very small head-to-disk gap makes it the fastest external storage, now commonly 250 GB and larger.
RAID
Redundant Array of Independent Disks, with six common levels, in which data is distributed across physical drives that appear as a single logical drive: RAID 0 is striping, RAID 1 is mirroring, and RAID 5, striping with distributed parity, is the most common.
CD ROM
Constant linear velocity, with data stored as pits in polycarbonate coated with reflective aluminum and read by laser; provides 650 MB over 70 minutes (originally designed for audio), with speeds quoted as multiples of 1.2 meters per second.
Process State
Five-state model: New, Ready, Running, Blocked, and Exit, with the scheduler moving processes between states. The PCB contains identifier, state, priority, program counter, memory pointers, context data, I/O status, and accounting information.
Context Switch
Saving the state of the current process and loading the state of the next: registers, program counter, and other CPU state are saved to the PCB, then loaded from the next process's PCB; managed by the operating system.
User Mode vs Supervisor Mode
User mode restricts privileged instructions and memory access, while supervisor (kernel) mode allows privileged instructions; used by the operating system for I/O, system calls, and hardware access, with a mode bit in the PSW.
Segmentation
Memory management in which a program is divided into logical segments (code, data, stack), each with a base address and a limit, providing protection and sharing; a segment table maps logical to physical addresses.
Paging vs Segmentation
Paging has fixed-size pages, invisible to the programmer, and simplifies memory management; segmentation has variable-size segments, visible to the programmer and reflecting program structure; some systems combine both.
Inverted Page Table
One entry per page frame instead of per virtual page, each containing a process ID and virtual page number; reduces table size for large address spaces but requires an associative search or a hash.
Virtual Address Translation
Virtual address is split into page number and offset; the page number indexes the page table to get a frame number, and the frame number concatenated with the offset gives the physical address, checked by the MMU, with the TLB speeding up translation.
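A minimal sketch of the translation, assuming 4 KB pages and a toy page table:

    PAGE_BITS = 12                    # 4 KB pages (assumed)
    page_table = {0: 5, 1: 2, 2: 7}   # virtual page -> physical frame (toy)

    def translate(vaddr):
        page = vaddr >> PAGE_BITS
        offset = vaddr & ((1 << PAGE_BITS) - 1)
        frame = page_table[page]      # a missing key would mean a page fault
        return (frame << PAGE_BITS) | offset

    print(hex(translate(0x1ABC)))     # page 1 -> frame 2: 0x2abc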