Von Neumann Architecture
Stored-program concept: main memory stores both programs and data; the ALU operates on binary data; the control unit interprets instructions; and input/output equipment is operated by the control unit.
Architecture vs Organization
Architecture is the set of attributes visible to the programmer (instruction set, number of bits used for data representation, I/O mechanisms, addressing techniques), while organization is how those features are implemented (control signals, interfaces, memory technology).
Instruction Cycle
Two main steps, Fetch and Execute: the PC holds the address of the next instruction; the processor fetches the instruction from the memory location pointed to by the PC, increments the PC, and loads the instruction into the IR; it then interprets the instruction and performs the required actions.
Memory Hierarchy
From fastest to slowest: registers, L1 cache, L2 cache, main memory, disk cache, disk, optical, tape, with a trade-off among speed, cost, and capacity.
Locality of Reference
During execution of a program, memory references tend to cluster (for example, in loops), which makes caching effective.
Cache Memory
Small amount of fast memory between main memory and the CPU that stores frequently accessed data, with tags identifying which block of main memory occupies each cache slot.
Direct Mapping
Each block of main memory maps to exactly one cache line; the address is split into tag, line, and word fields, with cache line = block number mod number of cache lines.
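A minimal sketch in Python of the address split above, assuming (for illustration only) 16-byte blocks and 256 cache lines:

    WORD_BITS = 4   # 16 bytes per block (assumed)
    LINE_BITS = 8   # 256 cache lines (assumed)

    def split_address(addr):
        word = addr & ((1 << WORD_BITS) - 1)
        line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)
        tag = addr >> (WORD_BITS + LINE_BITS)
        return tag, line, word

    print(split_address(0x1A2B3C))  # (418, 179, 12), i.e. tag 0x1A2, line 0xB3, word 0xC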
Associative Mapping
A main memory block can load into any line of the cache; the memory address is interpreted as tag and word, and every line's tag is examined for a match, allowing full flexibility but requiring a complex parallel search.
Set Associative Mapping
Cache divided into sets, each containing multiple lines; a given block maps to any line within one set, combining the benefits of direct and associative mapping (e.g., 2-way means 2 lines per set).
Cache Write Policies
Write-through: all writes go to both cache and main memory, generating heavy traffic. Write-back: updates are made initially in cache only, with an update bit set, and written to main memory only when the block is replaced.
Cache Performance
Hit ratio is the fraction of memory accesses found in the cache; average access time = cache access time + miss rate × main memory access time. Larger blocks increase the hit ratio initially.
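A minimal sketch of the average-access-time formula, with the timings and hit ratio assumed for illustration:

    cache_time = 1.0    # ns (assumed)
    memory_time = 50.0  # ns (assumed)
    hit_ratio = 0.95    # assumed

    avg = cache_time + (1 - hit_ratio) * memory_time
    print(avg)  # 3.5 ns: close to cache speed despite slow main memory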
Pipelining
Overlapping execution of instructions through stages (fetch, decode, calculate operands, fetch operands, execute, write result), allowing multiple instructions to be in different stages simultaneously.
Five Stage Pipeline
Instruction Fetch: instruction fetched from memory. Instruction Decode: instruction decoded and registers read. Execute: ALU operation performed. Memory Access: memory operands accessed. Write Back: result written to a register.
Pipeline Hazards
Three types: structural, when two instructions need the same resource; data, when an instruction depends on the result of a previous instruction; and control, when a branch changes the instruction flow, requiring a pipeline flush.
Data Hazards
Read After Write (RAW), or true dependency: an instruction reads a location modified by a previous instruction. Write After Read (WAR), or antidependency: an instruction writes a location read by a previous instruction. Write After Write (WAW), or output dependency: two instructions write the same location.
Hazard Resolution
Structural hazards are solved by duplicating resources; data hazards by forwarding or stalling; control hazards by branch prediction, delayed branches, or prefetching both paths.
Branch Prediction
Predict never taken assumes the jump will not happen; predict always taken assumes it will; predict by opcode uses the instruction type; a taken/not-taken switch uses previous history; delayed branch executes the instruction after the branch before the branch takes effect.
Addressing Modes
Immediate: operand is part of the instruction. Direct: address field contains the address of the operand. Indirect: address field points to the address of the operand. Register: operand is in a register. Register indirect: register contains the address of the operand. Displacement: EA = A + R.
Immediate Addressing
Operand is part of the instruction, as in ADD 5; no memory reference is needed to fetch data, making it fast but limited in range.
Direct Addressing
Address field contains the address of the operand (EA = A), requiring a single memory reference but offering a limited address space.
Indirect Addressing
Memory cell pointed to by the address field contains the address of the operand (EA = (A)), allowing a large address space but requiring multiple memory accesses.
Register Addressing
Operand is held in the register named in the address field (EA = R); no memory access and very fast execution, but a very limited address space.
Register Indirect Addressing
Operand is in the memory cell pointed to by the contents of a register (EA = (R)), allowing a large address space with one fewer memory access than indirect addressing.
Displacement Addressing
Effective address EA = A + R. Used for relative addressing (R is the PC), base-register addressing (R holds a pointer to the base), and indexed addressing (A is the base and R the displacement).
Twos Complement
Representation where the leftmost bit is the sign bit; negation is done by inverting all bits and adding 1. Benefits: a single representation of zero and simple arithmetic. Range: -2^(n-1) to +2^(n-1) - 1.
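A minimal sketch of two's-complement negation, assuming an 8-bit width for illustration:

    BITS = 8
    MASK = (1 << BITS) - 1

    def negate(x):
        return (~x + 1) & MASK  # invert all bits, add 1, keep the low 8 bits

    print(bin(negate(5)))           # 0b11111011, the 8-bit pattern for -5
    print(negate(5) - (1 << BITS))  # -5 when reinterpreted as signed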
Instruction Format
Layout of bits in an instruction, including opcode and operands; affected by memory size, memory organization, bus structure, CPU complexity, and CPU speed, requiring a trade-off between a powerful instruction repertoire and saving space.
Instruction Types
Data processing (arithmetic and logical operations), data storage (main memory access), data movement (I/O operations), and program flow control (branch and jump).
RISC Characteristics
Large number of general-purpose registers (or compiler-optimized register use); limited, simple instruction set; emphasis on optimizing the instruction pipeline; one instruction per cycle; register-to-register operations; few, simple addressing modes; hardwired design; fixed instruction format.
RISC vs CISC
RISC has simple instructions executed in one cycle, many registers, hardwired control, and is optimized for pipelining; CISC has complex instructions taking multiple cycles, fewer registers, microprogrammed control, and variable-length instructions, aiming to close the semantic gap.
Register Window
Technique using multiple small sets of registers, with procedure calls switching to a different set. Three areas: parameter registers, local registers, and temporary registers; the temporary registers of one set overlap the parameter registers of the next, allowing parameter passing without moving data.
Superscalar
Common instructions can be initiated and executed independently on multiple parallel pipelines, allowing more than one instruction per clock cycle; applicable to both RISC and CISC, but usually RISC.
Instruction Level Parallelism
Instructions in a sequence are independent, so their execution can be overlapped; governed by data dependencies and procedural dependencies. Machine parallelism is the ability to exploit this, determined by the number of parallel pipelines.
Out of Order Execution
Decouples the decode pipeline from the execution pipeline, allowing fetch and decode to continue until the pipeline is full; when a functional unit becomes available, an instruction is executed even if earlier instructions are stalled.
Register Renaming
Registers are allocated dynamically rather than specifically named, eliminating output dependencies and antidependencies and allowing instructions to execute out of order without conflicts, as in R3b = R3a + R5a followed by R3c = R5a + 1.
Reorder Buffer
Temporary storage for results that commits to the register file in program order, ensuring a correct architectural state even with out-of-order execution.
Virtual Memory
Demand paging: not all pages of a process need be in memory; pages are brought in as required via the page-fault mechanism when a required page is absent, allowing processes larger than physical memory.
Page Table
Data structure that maps virtual addresses to physical addresses; each process has its own page table containing page frame numbers and control bits, checked by the MMU on each memory reference.
Translation Lookaside Buffer TLB
Special cache for page table entries that speeds up virtual-to-physical address translation by avoiding a main memory access to the page table; typical size is 32 to 1024 entries, using associative lookup.
Thrashing
Condition where the operating system spends nearly all its time swapping pages, with little real work done (the disk light is on constantly); caused by too many processes in too little memory; solved by better page replacement, reducing the number of processes, or adding memory.
Interrupt Cycle
Added to the instruction cycle: the processor checks for an interrupt; if one is pending, it suspends execution, saves the context, sets the PC to the interrupt handler address, processes the interrupt, then restores the context and continues.
DMA Direct Memory Access
Additional hardware module that takes over the bus for I/O transfers: the CPU tells the DMA controller whether to read or write, the device address, the starting memory address, and the amount of data; the CPU then continues other work while the DMA controller handles the transfer, sending an interrupt when finished.
Memory Interleaving
Collection of DRAM chips grouped into memory banks that service requests independently, so K banks can service K requests simultaneously, improving throughput.
Amdahls Law
Potential speedup = 1 / ((1 - f) + f/N), where f is the fraction of code that is parallelizable and N is the number of processors; shows diminishing returns as N increases, and if f is small, parallel processors have little effect.
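A minimal sketch of the formula with assumed values f = 0.9 and N = 8:

    def speedup(f, n):
        return 1.0 / ((1.0 - f) + f / n)

    print(speedup(0.9, 8))     # ~4.7: well below the ideal 8x
    print(speedup(0.9, 1000))  # ~9.9: capped near 1/(1-f) = 10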
Moore's Law
Number of transistors on a chip doubles every 18 months (stated by Gordon Moore, cofounder of Intel); the cost of a chip remains almost unchanged, and higher packing density means shorter electrical paths, giving higher performance.
DRAM vs SRAM
DRAM stores bits as charge in capacitors: needs refreshing, simpler construction, smaller per bit, less expensive, slower; used for main memory. SRAM stores bits as on/off switches: no refresh needed, more complex, larger per bit, more expensive, faster; used for cache. Both are volatile.
Cache Line Size
Larger blocks increase the hit ratio initially due to the locality principle, but the hit ratio falls once blocks become too large, because the probability of using the newly fetched information drops below the probability of reusing the replaced information; typical size is 8 to 64 bytes.
Multilevel Cache
Use both on-chip and off-chip cache: L1 on chip is faster and frees the bus, while L2 (off chip, or on chip in static RAM) is much faster than DRAM and often uses a separate data path; modern systems also have L3.
Floating Point Representation
Format is a sign bit, then exponent, then mantissa, representing ±significand × 2^exponent; the mantissa is stored in two's complement and the exponent in excess (biased) notation, normalized so the leading bit of the mantissa is 1.
IEEE 754
Standard for floating-point storage with 32-bit and 64-bit formats having 8-bit and 11-bit exponents respectively, plus extended formats for intermediate results; defines special values such as infinity and NaN.
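A minimal sketch decomposing a single-precision value into its sign, exponent, and fraction fields with Python's struct module:

    import struct

    def fields(x):
        bits = struct.unpack(">I", struct.pack(">f", x))[0]
        sign = bits >> 31
        exponent = (bits >> 23) & 0xFF  # biased by 127
        fraction = bits & 0x7FFFFF      # 23 bits, implicit leading 1
        return sign, exponent, fraction

    print(fields(-6.5))  # (1, 129, 5242880): -1.625 * 2^(129 - 127)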
Booth's Algorithm
Method for multiplying signed binary numbers that handles negative numbers correctly: examine pairs of bits in the multiplier and perform an add, a subtract, or no operation based on the bit pattern, giving efficient signed multiplication.
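A minimal sketch of the add/subtract/shift loop, assuming an 8-bit word; operands and result are signed two's-complement values:

    def booth_multiply(m, q, bits=8):
        mask = (1 << bits) - 1
        a, q, q_1, m = 0, q & mask, 0, m & mask
        for _ in range(bits):
            pair = ((q & 1) << 1) | q_1
            if pair == 0b01:        # 01: add multiplicand
                a = (a + m) & mask
            elif pair == 0b10:      # 10: subtract multiplicand
                a = (a - m) & mask
            # arithmetic shift right of the combined A, Q, Q-1 register
            q_1 = q & 1
            q = ((q >> 1) | ((a & 1) << (bits - 1))) & mask
            a = ((a >> 1) | ((a >> (bits - 1)) << (bits - 1))) & mask
        product = (a << bits) | q   # 2n-bit product
        return product - (1 << 2 * bits) if product >> (2 * bits - 1) else product

    print(booth_multiply(7, -3))    # -21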
Endianness
Byte ordering: big endian stores the most significant byte at the lowest address (IBM 370, Motorola 680x0, most RISC, and the Internet), while little endian stores the least significant byte at the lowest address (Pentium/x86 and VAX).
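A minimal sketch showing the same 32-bit value packed both ways with Python's struct module:

    import struct

    value = 0x12345678
    print(struct.pack(">I", value).hex())  # big endian:    12345678
    print(struct.pack("<I", value).hex())  # little endian: 78563412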
Prefetch
Fetching the next instruction during execution of the current one, since fetch accesses main memory while execution usually does not; allows overlap but does not double performance, because fetch is usually shorter than execution and branches mean prefetched instructions may not be needed.
Control Unit
Hardware segment that accepts an operation code and issues control signals; coordinates the fetch-decode-execute cycle, interprets instructions from memory, manages data flow between CPU components, and controls timing.
Bus Structure
Communication pathway connecting two or more devices, usually broadcast. Three types: the data bus carries data, the address bus identifies source or destination, and the control bus carries control and timing information; bus width determines performance.
IO Module Function
Provides an interface to the CPU and memory and an interface to peripherals, performing control and timing, CPU communication, device communication, data buffering, and error detection; may hide device properties from the CPU or reveal them.
Programmed IO
CPU has direct control over I/O: sensing status, issuing read/write commands, and transferring data. The CPU waits for the I/O module to complete each operation, checking status bits periodically and wasting CPU time.
Interrupt Driven IO
I/O module interrupts the CPU when ready, overcoming CPU waiting: the CPU issues a read command, the I/O module gets data from the peripheral while the CPU does other work, the I/O module interrupts the CPU, then the CPU requests the data and the I/O module transfers it.
Synchronous vs Asynchronous Bus
Synchronous: events are determined by clock signals, the control bus includes a clock line, and a single 1-to-0 transition defines a bus cycle. Asynchronous: a handshaking protocol with request and acknowledge signals allows devices of different speeds.
Memory Mapped IO
Devices and memory share one address space, so I/O looks like memory reads and writes: no special commands, and the full set of memory-access commands is available. Isolated I/O uses separate address spaces, needing I/O-select or memory-select lines and special commands.
Replacement Algorithms
Direct mapping allows no choice: each block maps to one line. Associative and set-associative mapping use LRU (least recently used), FIFO (first in, first out), LFU (least frequently used), or random replacement; LRU is the most common, implemented in hardware.
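A minimal sketch of LRU replacement for a single cache set, using the standard-library OrderedDict and an assumed capacity of 4 lines:

    from collections import OrderedDict

    class LRUSet:
        def __init__(self, lines=4):
            self.lines = lines
            self.tags = OrderedDict()  # oldest entry first

        def access(self, tag):
            if tag in self.tags:
                self.tags.move_to_end(tag)     # hit: now most recently used
                return True
            if len(self.tags) >= self.lines:
                self.tags.popitem(last=False)  # miss: evict least recent
            self.tags[tag] = None
            return False

    s = LRUSet()
    print([s.access(t) for t in [1, 2, 3, 4, 1, 5, 2]])
    # [False, False, False, False, True, False, False]: 5 evicts 2, so 2 misses again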
Victim Cache
Small fully associative cache of 4 to 16 lines between L1 and the next memory level that remembers what was discarded from L1, lowering the miss penalty by reusing already-fetched data at little cost.
Loop Unrolling
Optimization that replicates the loop body multiple times so the loop iterates fewer times, reducing loop overhead, increasing instruction-level parallelism, and improving register, data cache, or TLB locality.
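A minimal sketch of 4-way unrolling; the trip count is assumed to be a multiple of 4 for brevity:

    def sum_rolled(a):
        total = 0
        for i in range(len(a)):          # one loop test per element
            total += a[i]
        return total

    def sum_unrolled(a):
        total = 0
        for i in range(0, len(a), 4):    # one loop test per 4 elements
            total += a[i] + a[i + 1] + a[i + 2] + a[i + 3]
        return total

    data = list(range(16))
    print(sum_rolled(data) == sum_unrolled(data))  # True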
Delayed Load
The register that is the target of a load is locked by the processor, which continues executing the instruction stream until that register is required, then idles until the load completes; rearranging instructions allows useful work to be done while loading.
Scoreboarding
Bookkeeping technique that allows an instruction to execute whenever it is not dependent on previous instructions and no structural hazards exist, by tracking which registers are being written.
Speculation
Executing instructions before knowing they are needed, such as after a branch; if the prediction is wrong, the effects must be undone, using techniques such as a reorder buffer to maintain correct state.
Superscalar Pipeline
Has multiple parallel pipelines for different instruction types (integer ALU, floating point, load/store), allowing multiple instructions to execute simultaneously, with sophisticated instruction scheduling.
Pentium Pipeline
The original Pentium has two pipelines, the U pipe and the V pipe, allowing two instructions per clock under ideal conditions, with dynamic branch prediction and separate code and data caches.
ARM Cortex A8
Has dual in-order issue and a 13-stage pipeline, keeping power low; a separate SIMD unit has a 10-stage pipeline; uses a two-level global-history branch predictor with BTB and GHB plus a return stack.
Cortex A8 Stages
F0 address generation; F1 fetch from L1 instruction cache; F3 place in instruction queue; D0 decompress Thumb; D1 instruction decode; D2 pending/replay queue; D3 instruction scheduling; D4 final decode; E0 register file access; E1 barrel shifter; E2 ALU; E3 saturation; E4 control flow; E5 writeback.
Semantic Gap
Difference between high-level language statements and machine instructions; CISC attempted to close it by implementing complex HLL operations in hardware, while RISC focuses on simple, frequently used operations.
Compiler Optimization
Techniques include register allocation using graph coloring, instruction scheduling to avoid hazards, loop unrolling, software pipelining, and dead-code elimination.
Graph Coloring
Register allocation method where nodes are symbolic registers, edges connect registers that are live simultaneously, and n colors are assigned for n real registers; uncolored nodes are placed in memory.
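A minimal sketch of a greedy coloring pass; the interference graph and the two available registers are assumed for illustration:

    interference = {          # symbolic register -> registers live with it
        "a": {"b", "c", "d"},
        "b": {"a", "c"},
        "c": {"a", "b"},
        "d": {"a"},
    }
    REGISTERS = ["r0", "r1"]  # n colors for n real registers

    assignment = {}
    for node in interference:
        used = {assignment.get(n) for n in interference[node]}
        free = [r for r in REGISTERS if r not in used]
        assignment[node] = free[0] if free else "memory"  # spill if uncolored

    print(assignment)  # {'a': 'r0', 'b': 'r1', 'c': 'memory', 'd': 'r1'}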
Stack Addressing
Operand is implicitly on top of the stack, as in an ADD operation that pops the top two items, adds them, and pushes the result; used in zero-address instructions.
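A minimal sketch of a zero-address ADD on an explicit stack:

    stack = []
    stack.append(3)   # PUSH 3
    stack.append(4)   # PUSH 4
    stack.append(stack.pop() + stack.pop())  # ADD: pop two, push the sum
    print(stack)      # [7]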
Postindex Addressing
Memory address is the base register value; the offset is then added or subtracted and the result written back to the base register, which thus acts as an index.
Preindex Addressing
Memory address is formed as base plus offset; that address is also written back to the base register, so the base register value is incremented or decremented by the offset.
SIMD
Single Instruction, Multiple Data: one instruction operates on multiple data elements simultaneously; used for multimedia and scientific applications with packed data types, such as four 32-bit values in a 128-bit register.
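A minimal sketch emulating a packed add of four 32-bit lanes held in one 128-bit Python integer (lane values assumed small enough not to overflow a lane):

    LANE = 32
    MASK = (1 << LANE) - 1

    def pack(vals):
        out = 0
        for i, v in enumerate(vals):
            out |= (v & MASK) << (i * LANE)
        return out

    def packed_add(x, y):  # one "instruction", four lane-wise adds
        return [((x >> (i * LANE)) + (y >> (i * LANE))) & MASK for i in range(4)]

    print(packed_add(pack([1, 2, 3, 4]), pack([10, 20, 30, 40])))  # [11, 22, 33, 44]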
Memory Access Time
Access time is the time between presenting an address and getting valid data; memory cycle time is access time plus recovery time before the next access; transfer rate is the rate at which data can be moved.
Winchester Hard Disk
Sealed unit with one or more platters where the heads fly on a boundary layer of air as the disk spins; the very small head-to-disk gap makes it the fastest external storage, now commonly 250 GB and larger.
RAID
Redundant Array of Independent Disks, with six common levels, in which data is distributed across physical drives that appear as a single logical drive: RAID 0 is striping, RAID 1 is mirroring, and RAID 5, striping with distributed parity, is the most common.
CD ROM
Constant linear velocity, with data stored as pits in polycarbonate coated with reflective aluminum and read by laser; provides 650 MB over 70 minutes (originally designed for audio), with speeds quoted as multiples of 1.2 meters per second.
Process State
Five-state model: New, Ready, Running, Blocked, and Exit, with the scheduler moving processes between states. The PCB contains identifier, state, priority, program counter, memory pointers, context data, I/O status, and accounting information.
Context Switch
Saving the state of the current process and loading the state of the next: registers, program counter, and other CPU state are saved to the PCB, then loaded from the next process's PCB; managed by the operating system.
User Mode vs Supervisor Mode
User mode restricts privileged instructions and memory access, while supervisor (kernel) mode allows privileged instructions; used by the operating system for I/O, system calls, and hardware access, with a mode bit in the PSW.
Segmentation
Memory management in which a program is divided into logical segments (code, data, stack), each with a base address and a limit, providing protection and sharing; a segment table maps logical to physical addresses.
Paging vs Segmentation
Paging has fixed-size pages, invisible to the programmer, and simplifies memory management; segmentation has variable-size segments, visible to the programmer and reflecting program structure; some systems combine both.
Inverted Page Table
One entry per page frame instead of per virtual page, each containing a process ID and virtual page number; reduces table size for large address spaces but requires an associative search or a hash.
Virtual Address Translation
Virtual address is split into page number and offset; the page number indexes the page table to get a frame number, and the frame number concatenated with the offset gives the physical address, checked by the MMU, with the TLB speeding up translation.
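A minimal sketch of the translation, assuming 4 KB pages and a toy page table:

    PAGE_BITS = 12                    # 4 KB pages (assumed)
    page_table = {0: 5, 1: 2, 2: 7}   # virtual page -> physical frame (toy)

    def translate(vaddr):
        page = vaddr >> PAGE_BITS
        offset = vaddr & ((1 << PAGE_BITS) - 1)
        frame = page_table[page]      # a missing key would mean a page fault
        return (frame << PAGE_BITS) | offset

    print(hex(translate(0x1ABC)))     # page 1 -> frame 2: 0x2abc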