Unit V

SRM INSTITUTE OF SCIENCE & TECHNOLOGY

Page 1

  • Deemed to be University under the UGC Act, 1956

  • 21CSS201T

  • Computer Organization and Architecture, Unit 5

Page 2: Contents

  • Parallelism: Need, types, applications, and challenges

  • Architecture of Parallel Systems - Flynn’s classification

  • ARM Processor: The thumb instruction set

  • Processor and CPU cores, Instruction Encoding format

  • Memory load and Store instruction

  • Basics of I/O operations

  • Case study: ARM 5 and ARM 7 Architecture

Page 3: Parallelism

  • Definition: Executing two or more operations simultaneously.

  • Purpose: Improve computer performance through parallel processing.

  • Parallel Computer: Set of processors working together on a problem.

  • Multiple ALUs within a CPU can work concurrently for increased throughput.

  • Multiple processors can operate simultaneously.

Page 4: Goals of Parallelism

  • Increase Computational Speed: Reduces time to solve problems.

  • Increase Throughput: More processing in a given timeframe.

  • Improve Performance: Optimize computer performance at a given clock speed.

  • Handle Larger Problems: Solve complex computations beyond a single CPU's memory.

Page 5: Applications of Parallelism

  • Numeric Weather Prediction

  • Socio-economics

  • Finite Element Analysis

  • Artificial Intelligence and Automation

  • Genetic Engineering

  • Weapon Research and Defence

  • Medical Applications

  • Remote Sensing Applications

Page 6: Applications in Various Fields

  • Fields such as:

    • Aerospace

    • Automotive

    • Benchmarking

    • Biology

    • CFD

    • Database

    • Defense

    • Energy

    • Environment

    • Finance

    • Geophysics

    • Hardware

    • Information Services

    • Life Sciences

    • Medical

    • Media Research

    • Software

    • Telecommunications

    • Weather and Climate Research

    • Retail

    • Logistics Services

Page 7: Types of Parallelism

  1. Hardware Parallelism

    • Increases processing speed via architecture:

      • Processor Parallelism: Multiple CPUs, nodes, cores, threads.

      • Memory Parallelism: Shared/distributed memory, hybrid models (PRAM).

  2. Software Parallelism

    • Depends on control and data dependencies of programs, algorithm efficiency, programming style, and compiler optimization.

Page 8: Hardware Parallelism

  • Characterized by the number of instruction issues per machine cycle:

    • k-issue processor: issues k instructions per cycle.

    • Conventional processors: one instruction per cycle.

    • A system with n k-issue processors can handle nk threads simultaneously.

Page 9: Software Parallelism

  • Determined by program flow graph; reveals patterns of executable operations.

  • Varies during execution and limits sustained processor performance.

Page 10: Example - Detection of Parallelism

  • Consider simple high-level language statements:

    • P1: C = D × E

    • P2: M = G + C

    • P3: A = B + C

    • Parallelism detection initiated for instructions P1–P5.
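
Using the first three statements shown, the detection step can be sketched in Python (an illustration, not from the slides): each statement's read and write sets are compared, and two statements may run in parallel only if neither depends on the other.

```python
# Sketch: detecting parallelism among P1: C = D * E, P2: M = G + C,
# P3: A = B + C by comparing read/write sets.
stmts = {
    "P1": {"writes": {"C"}, "reads": {"D", "E"}},
    "P2": {"writes": {"M"}, "reads": {"G", "C"}},
    "P3": {"writes": {"A"}, "reads": {"B", "C"}},
}

def depends(later, earlier):
    """True if `later` must wait for `earlier` (flow, anti, or output dependence)."""
    a, b = stmts[earlier], stmts[later]
    return bool(b["reads"] & a["writes"]       # flow (read-after-write)
                or b["writes"] & a["reads"]    # anti (write-after-read)
                or b["writes"] & a["writes"])  # output (write-after-write)

# P2 and P3 both read C, which P1 writes, so both wait for P1;
# P2 and P3 are mutually independent and can execute in parallel.
```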

Page 11: Sequential Execution Example

  • Executes in five steps.

  • Each high-level statement represents a single instruction.

Page 12: Parallel Execution Example

  • Reduces execution to three steps if multiple adders are available.

  • Identifies possible execution pairs without resource conflicts.

Page 13: Mismatch in Software & Hardware Parallelism

  • Example demonstrating cycles in software parallelism.

  • Analysis shows varying degrees across cycles.

Page 14: Parallel Execution Using Two-Issue Processor

  • Executes one load/store and one arithmetic operation simultaneously.

  • Analysis shows a total of seven cycles needed for execution.

Page 15: Example of Dual-Processor System

  • Uses single-issue processors to execute instructions.

  • Six processor cycles required for 12 instructions, considering communication necessities.

Page 16: Software Parallelism Types

  1. Instruction Level Parallelism

  2. Task-Level Parallelism

  3. Data Parallelism

  4. Transaction Level Parallelism

Page 17: Instruction Level Parallelism (ILP)

  • Measures simultaneous operation capability.

  • Allows overlapping and reordering of instruction execution.

Page 18: ILP Example

  • Operations: x = a + b, y = c − d, z = x × y.

  • x and y can be computed simultaneously, while z depends on both.

Page 19: Performance under ILP

  • Assuming each operation completes in one unit of time, the three operations finish in 2 units.

  • ILP factor: 3 operations / 2 time units = 1.5.
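
The calculation can be reproduced with a small scheduling sketch (an illustration, not from the slides): in each time unit, every operation whose inputs are ready executes, and the ILP factor is operations divided by time units.

```python
# Sketch: computing the ILP factor for x = a + b, y = c - d, z = x * y.
# Each op takes one time unit; independent ops share a step.
ops = {"x": set(), "y": set(), "z": {"x", "y"}}  # op -> ops it depends on

done, steps = set(), 0
while len(done) < len(ops):
    ready = {op for op, deps in ops.items() if op not in done and deps <= done}
    done |= ready       # all ready ops execute in the same time unit
    steps += 1

ilp = len(ops) / steps  # 3 operations in 2 steps -> ILP = 1.5
```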

Page 20: Data-Level Parallelism (DLP)

  • Parallelization across multiple processors; distributes data concurrently across nodes.

Page 21: DLP Example

  • Summing array elements through parallel processing on 4 processors; reduces overall time required for operations.
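
A minimal sketch of the idea, simulated sequentially in Python (the four "processors" are hypothetical; a real system would run the partial sums concurrently):

```python
# Sketch: data-level parallelism for summing an array on 4 processors.
# Each processor gets a contiguous chunk and computes a partial sum
# independently; the partials are then combined in a final reduction.
data = list(range(16))                               # 16 elements, 4 per processor
chunks = [data[i * 4:(i + 1) * 4] for i in range(4)]
partials = [sum(chunk) for chunk in chunks]          # independent partial sums
total = sum(partials)                                # final reduction step
```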

Page 22: DLP in Adding Elements of Array

  • Timed execution differences depicted for parallel vs sequential methods.

Page 23: DLP in Matrix Multiplication

  • Speeds up computational efforts in matrix operations.

Page 24: Locality of Data References

  • Essential for evaluating DLP performance; linked to memory access behaviors and cache size.

Page 25: Flynn’s Classification

  • Proposed by Michael J. Flynn in 1966; taxonomy based on instruction and data processing capabilities.

Page 26: Classification Dimensions

  • Instruction stream: sequence of executed instructions.

  • Data stream: sequence of data used in instructions.

  • Classification divides computer architecture into categories based on instruction/data handling.

Page 27: Flynn's Classification Overview

  • Four categories:

    • SISD: Single Instruction Single Data

    • SIMD: Single Instruction Multiple Data

    • MISD: Multiple Instruction Single Data

    • MIMD: Multiple Instruction Multiple Data

Page 28: SISD Characteristics

  • Single instruction at one time; deterministic execution; only one operand set.

Page 29: SIMD Characteristics

  • Single instruction executed across multiple data elements simultaneously.

Page 30: MISD Characteristics

  • Single data stream processed by multiple independent instruction units.

Page 31: MIMD Characteristics

  • Independent instruction streams operating on potentially diverse data streams; execution modes vary as per task specifics.

Page 32: ARM Architecture Overview

  • Advanced RISC Machine. Features include:

    • 16-bit instruction set: Thumb

    • On-chip debugging support, enhanced multiplier, embedded ICE hardware.

Page 33: ARM Features

  • RISC 32-bit architecture.

  • High performance, low power, regular register file architecture.

  • Supports pipelining and simple addressing modes.

Page 34: Detailed ARM Features

  • Conditional instruction execution, control over ALU & shifter operations, multiple load/store instructions.

Page 35: Thumb Instruction Set

  • 16-bit compressed instruction set; provides better code density at some performance cost.

  • Differences outlined between ARM and Thumb instruction use.

Page 36: ARM vs Thumb Performance Analysis

  • General performance differences outlined between ARM code and Thumb code across 32-bit and 16-bit memory environments.

Page 37: ARM Core Dataflow Model

  • Shows the ARM core as a set of functional units connected by data buses.

  • ALU functionality and register communication dynamics explained.

Page 38: Instruction Transfer Mechanism

  • Load/store architecture; interaction between memory and registers.

  • Specific characteristics regarding ARM's instruction types and register inputs.

Page 39: ALU Functionality Overview

  • Description of data processing execution in an ARM system, including ALU pre-processing capabilities.

Page 40: ARM Registers Overview

  • Details on general-purpose registers and task-specific registers in user mode.

Page 41: Processor Modes

  • Overview of ARM processor modes and their access rights.

Page 42: Understanding Processor Modes

  • Assignments and purposes of varying processor modes, highlighting privileges.

Page 43: Detailed Overview of Processor Modes

  • Explains how various modes operate and their triggers, from abort modes to user modes.

Page 44: Single-Core Computer Architecture

  • Basic components of a single-core CPU described with relevant architecture.

Page 45: Reiterated Single-Core CPU Design

  • Consistent components detailed, emphasizing individual units.

Page 46: Multi-Core Architecture

  • Description of multiple processor cores organization on a single die.

Page 47: Multi-Core CPU Overview

  • Details about cores fitting on a single processor socket; such a design is known as a chip multiprocessor (CMP).

Page 48: Multi-Core Operation Dynamics

  • Parallel running of threads on multi-core structures outlined.

Page 49: Time-Slicing within Cores

  • Thread management within a core explained for time-sliced execution.

Page 50: Instruction Encoding Overview

  • ISA defines instruction formats; encodings are built from opcode and operand fields.

Page 51: MIPS Instruction Encoding

  • Regularizes instruction encoding under RISC characteristics.

Page 52: I-Type (Immediate) Format

  • Encoding of immediate instructions in SPIM; examples provided for load operations.

Page 53: Specific I-Type Format Demonstrated

  • Example demonstrating encoding through Opcode structure in instruction execution.
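
The I-type layout can be sketched as a Python encoder (an illustration consistent with the MIPS format the slides describe: 6-bit opcode, 5-bit rs, 5-bit rt, 16-bit immediate):

```python
# Sketch: MIPS I-type encoding -- opcode (6) | rs (5) | rt (5) | imm (16).
def encode_i_type(opcode, rs, rt, imm):
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

# lw $t0, 4($sp): opcode 0x23 (lw), rs = 29 ($sp), rt = 8 ($t0), imm = 4
word = encode_i_type(0x23, 29, 8, 4)   # -> 0x8FA80004
```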

Page 54: Sign Extension Explanation

  • Overview of sign extension relevance to ALU operations and compatibility.
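  
A one-function sketch of the operation (an illustration): the 16-bit immediate's sign bit is replicated into the upper bits so the ALU sees the correct 32-bit two's-complement value.

```python
# Sketch: sign-extending a 16-bit immediate to a full-width integer,
# as required before it reaches the ALU.
def sign_extend16(value):
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

# 0xFFFC is -4 as a 16-bit two's-complement value
```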

Page 55: R-Type (Register) Format

  • Detailed breakdown of encoding for arithmetic, logical, and compare instructions.

Page 56: R-Type Example

  • Showcases R-Type instruction encoding for basic subtraction operation.
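
The subtraction example can be sketched with a Python encoder for the R-type layout (6-bit opcode = 0, then rs, rt, rd, shamt, funct):

```python
# Sketch: MIPS R-type encoding --
# opcode (6) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6).
def encode_r_type(rs, rt, rd, shamt, funct):
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct  # opcode = 0

# sub $s1, $s2, $s3: rs = 18 ($s2), rt = 19 ($s3), rd = 17 ($s1), funct = 0x22
word = encode_r_type(18, 19, 17, 0, 0x22)   # -> 0x02538822
```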

Page 57: Load and Store Instructions

  • Description of operation directives for loading and storing information in registers/memory.

Page 58: Addressing in Load/Store Instructions

  • Explanation of how addresses are computed within Load/Store commands.

Page 59: Examples of Load/Store Instructions

  • Syntax samples for various load/store operations, including specifics on addressing behaviors.

Page 60: Load-Store Formats Defined

  • Detailed encoding schemes explained for load/store instruction formats.

Page 61: Loading Constants into Registers

  • Discussion of pseudo-instructions that load constants into registers, hiding the underlying encoding details.

Page 62: Loading Large Constants

  • Processes and examples provided for handling large constants in ARM architecture efficiently.
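
The common approach on both MIPS (lui/ori) and ARM (movw/movt) is to build a 32-bit constant from two 16-bit halves; a minimal sketch of the splitting and reassembly:

```python
# Sketch: building a 32-bit constant from two 16-bit halves, as a
# lui/ori (MIPS) or movw/movt (ARM) instruction pair does.
def split_constant(value):
    upper = (value >> 16) & 0xFFFF   # upper half
    lower = value & 0xFFFF           # lower half
    return upper, lower

upper, lower = split_constant(0x1234ABCD)
reg = (upper << 16) | lower          # value reassembled in the register
```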

Page 63: Memory Addressing in Assembly

  • Methods outlined for assembling to place required addresses correctly in registers.

Page 64: Address Generation Methodologies

  • Discusses how the assembler generates addresses for data access and loading.

Page 65: Memory Mapped vs I/O Mapped I/O

  • Differentiation between memory-mapped I/O and I/O-mapped (isolated) I/O; key contrasts established.

Page 66: Program Controlled I/O

  • Methodology explanation for program-controlled I/O, polling ideas, CPU register interactions.

Page 67: Typical Program Controlled Instructions

  • Commonly used instruction mnemonics for program-controlled I/O detailed.

Page 68: Case Study: ARM 5 and ARM 7 Architecture

  • Overview of architecture case study context related to ARM 5 and ARM 7.

Page 69: Data Sizes and Instruction Sets

  • Defines ARM as a 32-bit architecture; explains terminology such as byte (8 bits), halfword (16 bits), and word (32 bits).

Page 70: ARM Processor Modes Overview

  • List of seven ARM operating modes, their roles, and privilege distinctions discussed.

Page 71: ARM Register Set Overview

  • Details on register access and visibility within varied ARM processor modes.

Page 72: Register Organization Summary

  • Comprehensive summary of active registers across user and privileged modes.

Page 73: ARM Register Counts and Description

  • Details of ARM's 37 registers, their functions, and register banking during mode changes.

Page 74: Program Status Registers

  • Description of condition codes and their relation to ALU operation results discussed with details.

Page 75: Program Counter Functionality

  • Specifies behaviors of Program Counter under different operational states of ARM (ARM, Thumb, Jazelle).

Page 76: ARM Architecture Development Overview

  • Framework of ARM architecture development across various iterations detailed with improvements.

Page 77: ARM Conditional Execution

  • Discusses conditional execution in instructions to enhance code flow efficiency.

Page 78: Condition Codes Explained

  • Breakdown of the conditions that determine whether an operation executes, based on the flags.
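
The standard ARM condition codes are predicates over the N, Z, C, and V flags; a table-driven sketch in Python:

```python
# Sketch: evaluating common ARM condition codes from the NZCV flags.
def condition_passes(cond, n, z, c, v):
    table = {
        "EQ": z == 1,             "NE": z == 0,     # equal / not equal
        "CS": c == 1,             "CC": c == 0,     # carry set / clear
        "MI": n == 1,             "PL": n == 0,     # negative / positive
        "GE": n == v,             "LT": n != v,     # signed >= / <
        "GT": z == 0 and n == v,  "LE": z == 1 or n != v,
        "AL": True,                                 # always
    }
    return table[cond]
```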

Page 79: Examples of Conditional Execution in Code

  • Example sequences depicting conditional checks and relevant operations.

Page 80: ARM Branching Instructions

  • Discusses branching mechanisms including relative addressing within ARM instructions.
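
The relative addressing can be sketched numerically: B/BL carry a signed 24-bit word offset measured from PC + 8 (the PC reads two instructions ahead in the classic three-stage pipeline).

```python
# Sketch: computing the 24-bit offset field of an ARM B/BL instruction.
# The offset is in words (>> 2), relative to PC + 8.
def branch_offset_field(pc, target):
    return ((target - (pc + 8)) >> 2) & 0xFFFFFF

off = branch_offset_field(0x8000, 0x8010)   # 8 bytes ahead of PC+8 -> 2 words
```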

Page 81: Data Processing Instructions

  • Enumeration of data processing instructions available, their structure, syntax, and operational limits.

Page 82: The Barrel Shifter

  • Describes operations available through the barrel shifter functioning with the ALU for varied data manipulations.
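
The four basic shifter operations can be sketched on 32-bit values (an illustration of the semantics, not the hardware):

```python
# Sketch: barrel-shifter operations applied to a 32-bit operand
# before it enters the ALU.
M = 0xFFFFFFFF

def lsl(x, n): return (x << n) & M                      # logical shift left
def lsr(x, n): return (x & M) >> n                      # logical shift right
def asr(x, n):                                          # arithmetic shift right
    s = x - (1 << 32) if x & 0x80000000 else x          # interpret as signed
    return (s >> n) & M
def ror(x, n): return ((x >> n) | (x << (32 - n))) & M  # rotate right
```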

Page 83: Immediate Constants Handling

  • Methods surrounding immediate constants processing within ARM assembly outlined with examples.
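
A data-processing immediate in the 32-bit ARM instruction set is an 8-bit value rotated right by an even amount; this sketch checks whether a given 32-bit constant can be encoded that way:

```python
# Sketch: an ARM data-processing immediate = 8-bit value rotated right
# by an even amount (0, 2, ..., 30). Check whether a constant fits.
def is_arm_immediate(value):
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # undo a rotate-right of `rot` by rotating left
        rotated = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if rotated < 256:
            return True
    return False
```

Constants that do not fit (such as 0x101, whose set bits span nine positions) must be loaded another way, e.g. via the LDR pseudo-instruction discussed above.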

Page 84: Pseudo-instructions for Constant Loading

  • Discusses how constants are effectively loaded into registers using assembler capabilities.

Page 85: Multiply Instruction Syntax

  • Syntax data presented for various multiplication instructions within ARM architecture.

Page 86: Single Register Data Transfer Operations

  • Identification of various load/store operations within single register contexts.

Page 87: Addressing Mechanisms in Load/Store

  • Explains how addresses accessed by load/store instructions are specified referencing base registers plus offsets.

Page 88: LDM/STM Operation Syntax

  • Syntax of LDM/STM block-transfer operations and their increment/decrement addressing modes illustrated.

Page 89: Pre and Post Indexed Addressing

  • Addresses how register interactions depend on usage of pre and post indexed addressing in ARM architecture.
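
The two modes can be sketched as functions over a toy memory (a dict keyed by address, purely illustrative): pre-indexing forms the address before the access and optionally writes it back; post-indexing accesses at the base first, then updates it.

```python
# Sketch: pre-indexed vs post-indexed LDR semantics.
# Each call returns (loaded value, updated base register).
mem = {0x100: 11, 0x104: 22}

def ldr_pre(base, offset, writeback=False):
    addr = base + offset              # address formed before the access
    return mem[addr], (addr if writeback else base)

def ldr_post(base, offset):
    return mem[base], base + offset   # access first, then update base
```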

Page 90: Software Interrupt (SWI) Mechanics

  • Discusses the software interrupt mechanism, its syntax, and operational handling in ARM design.

Page 91: PSR Transfer Instructions Overview

  • Syntax descriptions for transferring CPSR/SPSR contents to general-purpose registers detailed.

Page 92: ARM Branching and Subroutines

  • Describes implementation of branching and subroutine calling mechanisms within ARM assembly.

Page 93: Thumb Architecture Description

  • Explains the 16-bit Thumb architecture, benefits, and performance characteristics in code execution contexts.

Page 94: Example ARM-based System

  • Illustrates a typical ARM-based system including RAM, ROM, and core interrelations and components.

Page 95: AMBA Bridge Overview

  • Description of AMBA (Advanced Microcontroller Bus Architecture) and peripheral bus management framework.