Page 1
SRM INSTITUTE OF SCIENCE & TECHNOLOGY
Deemed to be University under the UGC Act, 1956
21CSS201T Computer Organization and Architecture, Unit V
Page 2: Contents
Parallelism: Need, types, applications, and challenges
Architecture of Parallel Systems - Flynn’s classification
ARM Processor: The thumb instruction set
Processor and CPU cores, Instruction Encoding format
Memory load and Store instruction
Basics of I/O operations
Case study: ARM 5 and ARM 7 Architecture
Page 3: Parallelism
Definition: Executing two or more operations simultaneously.
Purpose: Improve computer performance through parallel processing.
Parallel Computer: Set of processors working together on a problem.
ALUs in CPU can work concurrently for increased throughput.
Multiple processors can operate simultaneously.
Page 4: Goals of Parallelism
Increase Computational Speed: Reduces time to solve problems.
Increase Throughput: More processing in a given timeframe.
Improve Performance: Optimize computer performance at a given clock speed.
Handle Larger Problems: Solve complex computations beyond a single CPU's memory.
Page 5: Applications of Parallelism
Numeric Weather Prediction
Socio-economics
Finite Element Analysis
Artificial Intelligence and Automation
Genetic Engineering
Weapon Research and Defence
Medical Applications
Remote Sensing Applications
Page 6: Applications in Various Fields
Fields such as:
Aerospace
Automotive
Benchmarking
Biology
CFD
Database
Defense
Energy
Environment
Finance
Geophysics
Hardware
Information Services
Life Sciences
Medical
Media Research
Software
Telecommunications
Weather and Climate Research
Retail
Logistics Services
Page 7: Types of Parallelism
Hardware Parallelism
Increases processing speed via architecture:
Processor Parallelism: Multiple CPUs, nodes, cores, threads.
Memory Parallelism: Shared/distributed memory, hybrid models (PRAM).
Software Parallelism
Depends on control and data dependencies of programs, algorithm efficiency, programming style, and compiler optimization.
Page 8: Hardware Parallelism
Characterized by number of instruction issues per machine cycle:
k-issue Processor: Issues k instructions per cycle.
Conventional processors: One instruction per cycle.
A multiprocessor system built from n k-issue processors can handle up to nk instruction threads simultaneously.
Page 9: Software Parallelism
Determined by program flow graph; reveals patterns of executable operations.
Varies during execution and limits sustained processor performance.
Page 10: Example - Detection of Parallelism
Consider simple high-level language statements:
P1: C = D × E
P2: M = G + C
P3: A = B + C
The slide checks statements P1-P5 for parallelism; only P1-P3 are listed here.
Page 11: Sequential Execution Example
Executes in five steps.
Each high-level statement represents a single instruction.
Page 12: Parallel Execution Example
Reduces execution to three steps if multiple adders are available.
Identifies possible execution pairs without resource conflicts.
Page 13: Mismatch in Software & Hardware Parallelism
Example demonstrating cycles in software parallelism.
Analysis shows varying degrees across cycles.
Page 14: Parallel Execution Using Two-Issue Processor
Executes one load/store and one arithmetic operation simultaneously.
Analysis shows a total of seven cycles needed for execution.
Page 15: Example of Dual-Processor System
Uses single-issue processors to execute instructions.
Six processor cycles are required for 12 instructions, a count that includes the instructions needed for interprocessor communication.
Page 16: Software Parallelism Types
Instruction Level Parallelism
Task-Level Parallelism
Data Parallelism
Transaction Level Parallelism
Page 17: Instruction Level Parallelism (ILP)
Measures simultaneous operation capability.
Allows overlapping and reordering of instruction execution.
Page 18: ILP Example
Operations x = a + b, y = c - d, z = x * y.
x and y can be computed simultaneously; z depends on both and must wait for them.
Page 19: Performance under ILP
Assuming each operation takes one unit of time, the three operations complete in two units.
ILP factor calculated as 3/2 = 1.5.
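The 3/2 figure can be reproduced with a small sketch. The `deps` dictionary and the function name `ilp_factor` are illustrative assumptions, not part of the slides; every operation is assumed to take one time unit, as on the slide.

```python
def ilp_factor(deps):
    """Return (operations, time_steps, ILP factor) for a dependency map.

    deps maps each operation to the operations whose results it needs.
    Every operation is assumed to take exactly one time unit.
    """
    finish = {}  # operation -> time step in which it completes

    def step(op):
        if op not in finish:
            # An operation runs one step after its latest dependency finishes.
            finish[op] = 1 + max((step(d) for d in deps[op]), default=0)
        return finish[op]

    steps = max(step(op) for op in deps)
    return len(deps), steps, len(deps) / steps


# x = a + b, y = c - d, z = x * y
deps = {"x": [], "y": [], "z": ["x", "y"]}
print(*ilp_factor(deps))  # 3 2 1.5
```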
Page 20: Data-Level Parallelism (DLP)
Parallelizes work by distributing the data across multiple processors (nodes), which process their portions concurrently.
Page 21: DLP Example
Summing array elements through parallel processing on 4 processors; reduces overall time required for operations.
Page 22: DLP in Adding Elements of Array
Timed execution differences depicted for parallel vs sequential methods.
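A minimal sketch of the array-sum decomposition, assuming four workers as in the example. The name `parallel_sum` is illustrative; threads are used only to show the split-and-combine structure, since CPython threads do not give a real speedup for this kind of work.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(values, workers=4):
    """Sum `values` by giving each worker one contiguous chunk."""
    chunk = -(-len(values) // workers) or 1  # ceiling division, at least 1
    chunks = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunks))  # one partial sum per chunk
    return sum(partial_sums)  # final sequential combine step


print(parallel_sum(list(range(1, 17))))  # 136
```

With 16 elements and 4 workers, each worker adds 4 elements, and a short final step combines the 4 partial sums.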
Page 23: DLP in Matrix Multiplication
Speeds up computational efforts in matrix operations.
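One common way to expose data parallelism in matrix multiplication is to compute each row of the result independently. The sketch below assumes that row-wise partitioning; the slides do not say which partitioning they use, and the function names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul_row(row, b):
    """Multiply one row (1 x n) by matrix b (n x p), giving one result row."""
    return [sum(x * y for x, y in zip(row, col)) for col in zip(*b)]


def parallel_matmul(a, b, workers=4):
    """Each row of a * b depends only on one row of a, so rows run independently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: matmul_row(row, b), a))


print(parallel_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```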
Page 24: Locality of Data References
Essential for evaluating DLP performance; linked to memory access behaviors and cache size.
Page 25: Flynn’s Classification
Proposed by Michael J. Flynn in 1966; taxonomy based on instruction and data processing capabilities.
Page 26: Classification Dimensions
Instruction stream: sequence of executed instructions.
Data stream: sequence of data used in instructions.
Classification divides computer architecture into categories based on instruction/data handling.
Page 27: Flynn's Classification Overview
Four categories:
SISD: Single Instruction Single Data
SIMD: Single Instruction Multiple Data
MISD: Multiple Instruction Single Data
MIMD: Multiple Instruction Multiple Data
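The four categories follow mechanically from the two stream counts. A small sketch, with the function name `flynn_class` chosen only for illustration:

```python
def flynn_class(instruction_streams, data_streams):
    """Map counts of instruction and data streams to a Flynn category."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("a machine needs at least one stream of each kind")
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"


print(flynn_class(1, 1))  # SISD
print(flynn_class(1, 8))  # SIMD
print(flynn_class(4, 1))  # MISD
print(flynn_class(4, 4))  # MIMD
```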
Page 28: SISD Characteristics
Single instruction at one time; deterministic execution; only one operand set.
Page 29: SIMD Characteristics
Single instruction executed across multiple data elements simultaneously.
Page 30: MISD Characteristics
Single data stream processed by multiple independent instruction units.
Page 31: MIMD Characteristics
Independent instruction streams operating on potentially diverse data streams; execution modes vary as per task specifics.
Page 32: ARM Architecture Overview
ARM (Advanced RISC Machine) features include:
A 16-bit instruction set, Thumb.
On-chip debugging support, enhanced multiplier, embedded ICE hardware.
Page 33: ARM Features
RISC 32-bit architecture.
High performance, low power, regular register file architecture.
Supports pipelining and simple addressing modes.
Page 34: Detailed ARM Features
Conditional instruction execution, control over ALU & shifter operations, multiple load/store instructions.
Page 35: Thumb Instruction Set
16-bit compressed instruction set; provides better code density at some cost in performance.
Differences outlined between ARM and Thumb instruction use.
Page 36: ARM vs Thumb Performance Analysis
General performance differences outlined between ARM code and Thumb code across 32-bit and 16-bit memory environments.
Page 37: ARM Core Dataflow Model
Shows the ARM core as a set of functional units connected by data buses.
ALU functionality and register communication dynamics explained.
Page 38: Instruction Transfer Mechanism
Load/store architecture; interaction between memory and registers.
Specific characteristics regarding ARM's instruction types and register inputs.
Page 39: ALU Functionality Overview
Description of data processing execution in an ARM system, including ALU pre-processing capabilities.
Page 40: ARM Registers Overview
Details on general-purpose registers and task-specific registers in user mode.
Page 41: Processor Modes Confirmation
Overview of ARM processor modes and their access rights.
Page 42: Understanding Processor Modes
Assignments and purposes of varying processor modes, highlighting privileges.
Page 43: Detailed Overview of Processor Modes
Explains how various modes operate and their triggers, from abort modes to user modes.
Page 44: Single-Core Computer Architecture
Basic components of a single-core CPU described with relevant architecture.
Page 45: Reiterated Single-Core CPU Design
Consistent components detailed, emphasizing individual units.
Page 46: Multi-Core Architecture
Description of multiple processor cores organization on a single die.
Page 47: Multi-Core CPU Overview
Details about multiple cores fitting on a single processor socket, an arrangement known as a chip multiprocessor (CMP).
Page 48: Multi-Core Operation Dynamics
Parallel running of threads on multi-core structures outlined.
Page 49: Time-Slicing within Cores
Explains how threads are managed when a core is time-sliced among them.
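Time slicing on one core can be sketched as a round-robin loop. The round-robin policy, the fixed quantum, and the name `round_robin` are assumptions made for illustration; the slides do not specify a scheduling policy.

```python
from collections import deque


def round_robin(threads, quantum):
    """Return the order in which one core runs its threads.

    threads maps a thread name to its remaining work in time units.
    quantum is the (positive) length of one time slice.
    """
    queue = deque(threads.items())
    trace = []
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)
        trace.append((name, run))
        if remaining > run:
            # Unfinished threads go to the back of the queue.
            queue.append((name, remaining - run))
    return trace


print(round_robin({"T1": 3, "T2": 5}, quantum=2))
# [('T1', 2), ('T2', 2), ('T1', 1), ('T2', 2), ('T2', 1)]
```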
Page 50: Instruction Encoding Overview
ISA defines instruction formats; encoding created via Opcode and operand fields.
Page 51: MIPS Instruction Encoding
Regularizes instruction encoding under RISC characteristics.
Page 52: I-Type (Immediate) Format
Encoding of immediate instructions in SPIM; examples provided for load operations.
Page 53: Specific I-Type Format Demonstrated
Example demonstrating encoding through Opcode structure in instruction execution.
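As a hedged illustration of the I-type layout (6-bit opcode, 5-bit rs, 5-bit rt, 16-bit immediate), the sketch below encodes `lw $t0, 32($s3)`. The helper name `encode_i_type` is an assumption; the choice of instruction is ours and may differ from the one on the slide.

```python
def encode_i_type(opcode, rs, rt, imm):
    """Pack a MIPS I-type instruction: opcode(6) | rs(5) | rt(5) | imm(16)."""
    if not -0x8000 <= imm <= 0xFFFF:
        raise ValueError("immediate does not fit in 16 bits")
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


LW = 0x23  # opcode of lw

# lw $t0, 32($s3): base register $s3 is 19, destination $t0 is 8
word = encode_i_type(LW, rs=19, rt=8, imm=32)
print(f"{word:#010x}")  # 0x8e680020
```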
Page 54: Sign Extension Explanation
Overview of sign extension relevance to ALU operations and compatibility.
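Sign extension copies the top bit of a narrow field into all the new upper bits so that the value keeps its sign. A minimal sketch, with `sign_extend` as an illustrative name:

```python
def sign_extend(value, bits=16):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


print(sign_extend(0x0020))                      # 32
print(sign_extend(0xFFFC))                      # -4
print(f"{sign_extend(0xFFFC) & 0xFFFFFFFF:#x}")  # 0xfffffffc
```

The last line shows the 32-bit pattern the ALU would see: the upper 16 bits are filled with copies of the sign bit.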
Page 55: R-Type (Register) Format
Detailed breakdown of encoding for arithmetic, logical, and compare instructions.
Page 56: R-Type Example
Showcases R-Type instruction encoding for basic subtraction operation.
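The R-type layout (opcode, rs, rt, rd, shamt, funct) can be checked with a short sketch that encodes `sub $t0, $s1, $s2`. The helper name and the choice of registers are assumptions for illustration and may not match the slide's operands.

```python
def encode_r_type(rs, rt, rd, shamt, funct, opcode=0):
    """Pack a MIPS R-type instruction:
    opcode(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


SUB_FUNCT = 0x22  # funct field of sub; the opcode field is 0 for R-type

# sub $t0, $s1, $s2: rd = $t0 (8), rs = $s1 (17), rt = $s2 (18)
word = encode_r_type(rs=17, rt=18, rd=8, shamt=0, funct=SUB_FUNCT)
print(f"{word:#010x}")  # 0x02324022
```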
Page 57: Load and Store Instructions
Description of operation directives for loading and storing information in registers/memory.
Page 58: Addressing in Load/Store Instructions
Explanation of how addresses are computed within Load/Store commands.
Page 59: Examples of Load/Store Instructions
Syntax samples for various load/store operations, including specifics on addressing behaviors.
Page 60: Load-Store Formats Defined
Detailed encoding schemes explained for load/store instruction formats.
Page 61: Loading Constants into Registers
Discussion of pseudo-instructions that load constants into registers within the limits of immediate encoding.
Page 62: Loading Large Constants
Processes and examples provided for handling large constants in ARM architecture efficiently.
Page 63: Memory Addressing in Assembly
Methods the assembler uses to place required addresses in registers.
Page 64: Address Generation Methodologies
Discusses how the assembler generates addresses for data access and loading.
Page 65: Memory Mapped vs I/O Mapped I/O
Differentiation between memory-mapped I/O and I/O mapped I/O; key contrasts established.
Page 66: Program Controlled I/O
Methodology explanation for program-controlled I/O, polling ideas, CPU register interactions.
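Polling can be sketched as a loop that keeps reading a status register until a ready bit is set, then reads the data register. `FakeDevice`, its `READY` bit position, and its delay behavior are entirely hypothetical stand-ins for real hardware registers.

```python
class FakeDevice:
    """Hypothetical device: status bit 0 is set when a data byte is ready."""

    READY = 0x01

    def __init__(self, data, delay):
        self._data = list(data)
        self._delay = delay        # status reads needed before each byte is ready
        self._countdown = delay

    def read_status(self):
        if not self._data:
            return 0
        if self._countdown > 0:
            self._countdown -= 1
            return 0
        return self.READY

    def read_data(self):
        self._countdown = self._delay
        return self._data.pop(0)


def polled_read(device, count):
    """Read `count` bytes by busy-waiting on the status register.

    count must not exceed the bytes the device will deliver.
    """
    received, polls = [], 0
    while len(received) < count:
        polls += 1
        if device.read_status() & FakeDevice.READY:
            received.append(device.read_data())
    return received, polls


print(polled_read(FakeDevice(b"OK", delay=2), 2))  # ([79, 75], 6)
```

The poll count shows the cost of the method: the CPU spends four of its six status reads doing no useful work.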
Page 67: Typical Program Controlled Instructions
Commonly used instruction mnemonics for program-controlled I/O.
Page 68: Case Study: ARM 5 and ARM 7 Architecture
Overview of architecture case study context related to ARM 5 and ARM 7.
Page 69: Data Sizes and Instruction Sets
Defines ARM as a 32-bit architecture; explains the terms byte, halfword, and word.
Page 70: ARM Processor Modes Overview
List of seven ARM operating modes, their roles, and privilege distinctions discussed.
Page 71: ARM Register Set Overview
Details on register access and visibility within varied ARM processor modes.
Page 72: Register Organization Summary
Comprehensive summary of active registers across user and privileged modes.
Page 73: ARM Register Counts and Description
Details of ARM's 37 registers and their specific functions or governance during mode changes.
Page 74: Program Status Registers
Description of condition codes and their relation to ALU operation results discussed with details.
Page 75: Program Counter Functionality
Specifies behaviors of Program Counter under different operational states of ARM (ARM, Thumb, Jazelle).
Page 76: ARM Architecture Development Overview
Framework of ARM architecture development across various iterations detailed with improvements.
Page 77: ARM Conditional Execution
Discusses conditional execution in instructions to enhance code flow efficiency.
Page 78: Condition Codes Explained
Breakdown of the condition codes and the flag states under which each one allows an instruction to execute.
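The relation between the N, Z, C, V flags and the condition codes can be sketched as follows. The function names are illustrative; `cmp_flags` models the flags a 32-bit `CMP a, b` (that is, a - b) would set.

```python
def cmp_flags(a, b):
    """Return (N, Z, C, V) as set by a 32-bit CMP a, b."""
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    result = (a - b) & 0xFFFFFFFF
    n = result >> 31                      # sign bit of the result
    z = int(result == 0)
    c = int(a >= b)                       # carry set means no borrow
    v = ((a ^ b) & (a ^ result)) >> 31    # signed overflow on subtraction
    return n, z, c, v


def condition_passed(cond, n, z, c, v):
    """Evaluate an ARM condition mnemonic against the flags."""
    table = {
        "EQ": z == 1, "NE": z == 0,
        "CS": c == 1, "CC": c == 0,
        "MI": n == 1, "PL": n == 0,
        "VS": v == 1, "VC": v == 0,
        "HI": c == 1 and z == 0, "LS": c == 0 or z == 1,
        "GE": n == v, "LT": n != v,
        "GT": z == 0 and n == v, "LE": z == 1 or n != v,
        "AL": True,
    }
    return table[cond]


flags = cmp_flags(3, 5)
print(flags)                           # (1, 0, 0, 0)
print(condition_passed("LT", *flags))  # True
print(condition_passed("EQ", *flags))  # False
```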
Page 79: Examples of Conditional Execution in Code
Example sequences depicting conditional checks and relevant operations.
Page 80: ARM Branching Instructions
Discusses branching mechanisms including relative addressing within ARM instructions.
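In ARM state, a branch stores a signed 24-bit word offset relative to the address of the branch plus 8 (the value the PC reads as, because of the pipeline). A sketch of that arithmetic, with illustrative function names:

```python
def encode_branch(instr_addr, target, link=False, cond=0xE):
    """Encode an ARM B/BL instruction (cond 0xE means 'always')."""
    offset = target - (instr_addr + 8)   # PC reads as instruction address + 8
    if offset % 4:
        raise ValueError("target must be word aligned")
    imm24 = offset >> 2
    if not -(1 << 23) <= imm24 < (1 << 23):
        raise ValueError("target out of range for a 24-bit offset")
    return (cond << 28) | (0b101 << 25) | (int(link) << 24) | (imm24 & 0xFFFFFF)


def branch_target(instr_addr, word):
    """Recover the destination address from an encoded B/BL."""
    imm24 = word & 0xFFFFFF
    if imm24 & 0x800000:                 # sign-extend the 24-bit field
        imm24 -= 1 << 24
    return instr_addr + 8 + (imm24 << 2)


word = encode_branch(0x8000, 0x8000)     # a branch to itself
print(f"{word:#010x}")                   # 0xeafffffe
print(hex(branch_target(0x8000, word)))  # 0x8000
```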
Page 81: Data Processing Instructions
Enumeration of data processing instructions available, their structure, syntax, and operational limits.
Page 82: The Barrel Shifter
Describes operations available through the barrel shifter functioning with the ALU for varied data manipulations.
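The four basic shift types can be modelled on 32-bit values as below. This is a simplified sketch: it assumes shift amounts from 0 to 31 and ignores the carry-out and the special encodings (such as RRX) that the real barrel shifter also provides.

```python
MASK = 0xFFFFFFFF


def lsl(value, n):
    """Logical shift left: zeros enter from the right."""
    return (value << n) & MASK


def lsr(value, n):
    """Logical shift right: zeros enter from the left."""
    return (value & MASK) >> n


def asr(value, n):
    """Arithmetic shift right: the sign bit is replicated."""
    value &= MASK
    if value & 0x80000000:
        value -= 1 << 32
    return (value >> n) & MASK


def ror(value, n):
    """Rotate right: bits leaving on the right re-enter on the left."""
    n %= 32
    value &= MASK
    return ((value >> n) | (value << (32 - n))) & MASK


r1 = 7
# Equivalent of ADD r0, r1, r1, LSL #2, which computes r1 * 5 in one instruction
print((r1 + lsl(r1, 2)) & MASK)   # 35
print(hex(asr(0x80000000, 31)))   # 0xffffffff
print(hex(ror(0xF1, 4)))          # 0x1000000f
```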
Page 83: Immediate Constants Handling
Methods surrounding immediate constants processing within ARM assembly outlined with examples.
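In the classic ARM data-processing encoding, an immediate is an 8-bit value rotated right by an even number of bit positions. The sketch below checks whether a constant fits that form; the name `encode_arm_immediate` is illustrative.

```python
def encode_arm_immediate(value):
    """Return (rot, imm8) such that value == imm8 rotated right by 2 * rot,
    or None if the constant cannot be encoded as an ARM immediate."""
    value &= 0xFFFFFFFF
    for rot in range(16):
        shift = 2 * rot
        # Rotating left by `shift` undoes a rotate right by `shift`.
        imm8 = ((value << shift) | (value >> (32 - shift))) & 0xFFFFFFFF
        if imm8 <= 0xFF:
            return rot, imm8
    return None


print(encode_arm_immediate(0xFF))        # (0, 255)
print(encode_arm_immediate(0xFF000000))  # (4, 255)
print(encode_arm_immediate(0x104))       # (15, 65)
print(encode_arm_immediate(0x101))       # None
```

Constants that return `None` are the ones that need a different approach, such as a literal-pool load or a multi-instruction sequence.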
Page 84: Pseudo-instructions for Constant Loading
Discusses how constants are effectively loaded into registers using assembler capabilities.
Page 85: Multiply Instruction Syntax
Syntax data presented for various multiplication instructions within ARM architecture.
Page 86: Single Register Data Transfer Operations
Identification of various load/store operations within single register contexts.
Page 87: Addressing Mechanisms in Load/Store
Explains how addresses accessed by load/store instructions are specified referencing base registers plus offsets.
Page 88: LDM/STM Operation Syntax
Illustrates the increment and decrement addressing modes of the LDM/STM block transfer instructions.
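The four LDM/STM addressing modes (IA, IB, DA, DB) differ only in where the block starts relative to the base register. A sketch of the address arithmetic for word transfers, with an illustrative function name:

```python
def block_transfer_addresses(base, reg_count, mode):
    """Return (addresses, new_base) for LDM/STM with `reg_count` registers.

    The lowest-numbered register always uses the lowest address.
    new_base is the value written back when the '!' suffix is used.
    """
    size = 4 * reg_count
    start = {
        "IA": base,             # increment after
        "IB": base + 4,         # increment before
        "DA": base - size + 4,  # decrement after
        "DB": base - size,      # decrement before
    }[mode]
    addresses = [start + 4 * i for i in range(reg_count)]
    new_base = base + size if mode[0] == "I" else base - size
    return addresses, new_base


addrs, new_base = block_transfer_addresses(0x1000, 3, "DB")
print([hex(a) for a in addrs], hex(new_base))
# ['0xff4', '0xff8', '0xffc'] 0xff4
```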
Page 89: Pre and Post Indexed Addressing
Explains how pre-indexed and post-indexed addressing determine the access address and whether the base register is updated.
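The difference between the three forms is when the offset is applied and whether the base register is written back. A toy model, assuming a dictionary for memory and for the register file (both hypothetical) and ignoring alignment and the case where Rd equals Rn:

```python
def ldr(memory, regs, rd, rn, offset, mode):
    """Model LDR with three addressing forms:

    "offset": LDR Rd, [Rn, #offset]    base unchanged
    "pre":    LDR Rd, [Rn, #offset]!   base updated, then used
    "post":   LDR Rd, [Rn], #offset    base used, then updated
    """
    base = regs[rn]
    address = base if mode == "post" else base + offset
    regs[rd] = memory[address]
    if mode in ("pre", "post"):
        regs[rn] = base + offset  # write-back to the base register


memory = {0x100: 11, 0x104: 22}

regs = {"r0": 0, "r1": 0x100}
ldr(memory, regs, "r0", "r1", 4, "offset")
print(regs["r0"], hex(regs["r1"]))  # 22 0x100

regs = {"r0": 0, "r1": 0x100}
ldr(memory, regs, "r0", "r1", 4, "pre")
print(regs["r0"], hex(regs["r1"]))  # 22 0x104

regs = {"r0": 0, "r1": 0x100}
ldr(memory, regs, "r0", "r1", 4, "post")
print(regs["r0"], hex(regs["r1"]))  # 11 0x104
```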
Page 90: Software Interrupt (SWI) Mechanics
Discusses the software interrupt mechanism, its syntax, and operational handling in ARM design.
Page 91: PSR Transfer Instructions Overview
Syntax descriptions for transferring CPSR/SPSR contents to general-purpose registers detailed.
Page 92: ARM Branching and Subroutines
Describes implementation of branching and subroutine calling mechanisms within ARM assembly.
Page 93: Thumb Architecture Description
Explains the 16-bit Thumb architecture, benefits, and performance characteristics in code execution contexts.
Page 94: Example ARM-based System
Illustrates a typical ARM-based system including RAM, ROM, and core interrelations and components.
Page 95: AMBA Bridge Overview
Description of AMBA (Advanced Microcontroller Bus Architecture) and peripheral bus management framework.