Unit V

SRM INSTITUTE OF SCIENCE & TECHNOLOGY

Page 1

Deemed to be University under the UGC Act, 1956
21CSS201T
Computer Organization and Architecture Unit-5.

Page 2: Contents

Parallelism: Need, types, applications, and challenges
Architecture of Parallel Systems - Flynn’s classification
ARM Processor: The thumb instruction set
Processor and CPU cores, Instruction Encoding format
Memory load and Store instruction
Basics of I/O operations
Case study: ARM 5 and ARM 7 Architecture

Page 3: Parallelism

Definition: Executing two or more operations simultaneously.
Purpose: Improve computer performance through parallel processing.
Parallel Computer: Set of processors working together on a problem.
ALUs in CPU can work concurrently for increased throughput.
Multiple processors can operate simultaneously.

Page 4: Goals of Parallelism

Increase Computational Speed: Reduces time to solve problems.
Increase Throughput: More processing in a given timeframe.
Improve Performance: Optimize computer performance at a given clock speed.
Handle Larger Problems: Solve complex computations beyond a single CPU's memory.

Page 5: Applications of Parallelism

Numeric Weather Prediction
Socio-economics
Finite Element Analysis
Artificial Intelligence and Automation
Genetic Engineering
Weapon Research and Defence
Medical Applications
Remote Sensing Applications

Page 6: Applications in Various Fields

Fields such as:
- Aerospace
- Automotive
- Benchmarking
- Biology
- CFD
- Database
- Defense
- Energy
- Environment
- Finance
- Geophysics
- Hardware
- Information Services
- Life Sciences
- Medical
- Media Research
- Software
- Telecommunications
- Weather and Climate Research
- Retail
- Logistics Services

Page 7: Types of Parallelism

Hardware Parallelism
- Increases processing speed via architecture:
  - Processor Parallelism: Multiple CPUs, nodes, cores, threads.
  - Memory Parallelism: Shared/distributed memory, hybrid models (PRAM).
Software Parallelism
- Depends on control and data dependencies of programs, algorithm efficiency, programming style, and compiler optimization.

Page 8: Hardware Parallelism

Characterized by number of instruction issues per machine cycle:
- k-issue Processor: Issues k instructions per cycle.
- Conventional processors: One instruction per cycle.
- A multiprocessor system built with n k-issue processors can handle up to nk instruction threads simultaneously.

Page 9: Software Parallelism

Determined by program flow graph; reveals patterns of executable operations.
Varies during execution and limits sustained processor performance.

Page 10: Example - Detection of Parallelism

Consider simple high-level language statements:
- P1: C = D × E
- P2: M = G + C
- P3: A = B + C
- Parallelism detection is applied to statements P1-P5.
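A minimal sketch of how such detection could be automated. It uses Bernstein's conditions (two statements may run in parallel if neither writes a variable the other reads or writes); the notes do not name the method, so treat that choice as an assumption. The statements P4 and P5 are not listed above; the forms used here (P4: C = L + M, P5: F = G / E) and the reading of P2 as M = G + C are assumptions taken from the common textbook version of this example.

```python
from itertools import combinations

# Each statement: (output variable, set of input variables).
# P4 and P5 are hypothetical: they are not listed in the notes.
STATEMENTS = {
    "P1": ("C", {"D", "E"}),  # C = D * E
    "P2": ("M", {"G", "C"}),  # M = G + C (assumed reading)
    "P3": ("A", {"B", "C"}),  # A = B + C
    "P4": ("C", {"L", "M"}),  # C = L + M (assumed)
    "P5": ("F", {"G", "E"}),  # F = G / E (assumed)
}

def can_run_in_parallel(a, b):
    """Bernstein's conditions for two single-output statements."""
    out_a, in_a = STATEMENTS[a]
    out_b, in_b = STATEMENTS[b]
    no_flow_dependence = out_a not in in_b   # b does not read a's result
    no_anti_dependence = out_b not in in_a   # a does not read b's result
    no_output_dependence = out_a != out_b    # they write different variables
    return no_flow_dependence and no_anti_dependence and no_output_dependence

parallel_pairs = [
    pair for pair in combinations(STATEMENTS, 2) if can_run_in_parallel(*pair)
]
print(parallel_pairs)
```

Under these assumptions, five of the ten possible pairs pass the test; P1 and P2, for instance, fail because P2 reads the C that P1 writes.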

Page 11: Sequential Execution Example

Executes in five steps.
Each high-level statement represents a single instruction.

Page 12: Parallel Execution Example

Reduces execution to three steps if multiple adders are available.
Identifies possible execution pairs without resource conflicts.

Page 13: Mismatch in Software & Hardware Parallelism

Example demonstrating cycles in software parallelism.
Analysis shows varying degrees across cycles.

Page 14: Parallel Execution Using Two-Issue Processor

Executes one load/store and one arithmetic operation simultaneously.
Analysis shows a total of seven cycles needed for execution.

Page 15: Example of Dual-Processor System

Uses single-issue processors to execute instructions.
Six processor cycles required for 12 instructions, considering communication necessities.

Page 16: Software Parallelism Types

Instruction Level Parallelism
Task-Level Parallelism
Data Parallelism
Transaction Level Parallelism

Page 17: Instruction Level Parallelism (ILP)

A measure of how many operations in a program can be performed simultaneously.
Allows overlapping and reordering of instruction execution.

Page 18: ILP Example

Operations x = a + b, y = c - d, z = x * y.
x and y can be computed simultaneously, while z depends on both.

Page 19: Performance under ILP

Assuming each operation completes in one unit of time, the three operations complete in two units.
ILP factor: 3 operations / 2 time units = 1.5.
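The calculation can be reproduced with a small dependency-level sketch. It assumes unlimited functional units and one time unit per operation; the names `OPS` and `schedule_levels` are illustrative only.

```python
# Each operation maps to the operations whose results it needs.
OPS = {
    "x": [],          # x = a + b
    "y": [],          # y = c - d
    "z": ["x", "y"],  # z = x * y
}

def schedule_levels(ops):
    """Return the earliest time unit in which each operation can execute."""
    level = {}

    def depth(name):
        if name not in level:
            level[name] = 1 + max((depth(dep) for dep in ops[name]), default=0)
        return level[name]

    for name in ops:
        depth(name)
    return level

levels = schedule_levels(OPS)
time_units = max(levels.values())
ilp = len(OPS) / time_units
print(levels, time_units, ilp)
```

Here x and y land in time unit 1 and z in time unit 2, giving 3 / 2 = 1.5.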

Page 20: Data-Level Parallelism (DLP)

Parallelization across multiple processors: the data is distributed across nodes, which operate on their portions concurrently.

Page 21: DLP Example

Summing array elements through parallel processing on 4 processors; reduces overall time required for operations.
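A minimal sketch of the decomposition, assuming the array is split into four contiguous chunks whose partial sums are then combined. Python threads are used only to show the structure; in CPython they do not give a real speedup for this kind of arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(data, workers=4):
    """Split data into up to `workers` chunks, sum each, then combine."""
    chunk = max(1, (len(data) + workers - 1) // workers)  # ceiling division
    parts = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, parts))
    return sum(partial_sums)

total = parallel_sum(list(range(1, 17)))  # 16 elements, 4 per worker
print(total)  # 136
```

With 16 elements, each worker performs three additions on its own chunk, and three more additions combine the four partial sums.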

Page 22: DLP in Adding Elements of Array

Timed execution differences depicted for parallel vs sequential methods.

Page 23: DLP in Matrix Multiplication

Speeds up computational efforts in matrix operations.
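One possible decomposition, sketched under the assumption that each worker computes one row of the result independently. The helper names are hypothetical, and threads again illustrate the structure rather than real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def row_times_matrix(row, b):
    """Compute one row of the product: row (1 x n) times b (n x m)."""
    return [
        sum(row[k] * b[k][j] for k in range(len(b)))
        for j in range(len(b[0]))
    ]

def parallel_matmul(a, b, workers=4):
    """Each row of `a` is an independent unit of work."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: row_times_matrix(row, b), a))

result = parallel_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(result)  # [[19, 22], [43, 50]]
```

Rows of the result share no outputs, so no synchronization is needed between workers.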

Page 24: Locality of Data References

Essential for evaluating DLP performance; linked to memory access behaviors and cache size.

Page 25: Flynn’s Classification

Proposed by Michael J. Flynn in 1966; taxonomy based on instruction and data processing capabilities.

Page 26: Classification Dimensions

Instruction stream: sequence of executed instructions.
Data stream: sequence of data used in instructions.
Classification divides computer architecture into categories based on instruction/data handling.

Page 27: Flynn's Classification Overview

Four categories:
- SISD: Single Instruction Single Data
- SIMD: Single Instruction Multiple Data
- MISD: Multiple Instruction Single Data
- MIMD: Multiple Instruction Multiple Data
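The two classification dimensions can be captured in a few lines. This is only an illustrative sketch; `flynn_class` is a hypothetical helper, not part of any library.

```python
def flynn_class(instruction_streams, data_streams):
    """Map stream counts to Flynn's four categories."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"

print(flynn_class(1, 1))  # SISD
print(flynn_class(1, 8))  # SIMD
print(flynn_class(4, 1))  # MISD
print(flynn_class(4, 4))  # MIMD
```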

Page 28: SISD Characteristics

Single instruction at one time; deterministic execution; only one operand set.

Page 29: SIMD Characteristics

Single instruction executed across multiple data elements simultaneously.

Page 30: MISD Characteristics

Single data stream processed by multiple independent instruction units.

Page 31: MIMD Characteristics

Independent instruction streams operating on potentially diverse data streams; execution modes vary as per task specifics.

Page 32: ARM Architecture Overview

ARM (Advanced RISC Machine) features include:
- 16-bit instruction set: Thumb
- On-chip debugging support, enhanced multiplier, EmbeddedICE hardware.

Page 33: ARM Features

RISC 32-bit architecture.
High performance, low power, regular register file architecture.
Supports pipelining and simple addressing modes.

Page 34: Detailed ARM Features

Conditional instruction execution, control over ALU & shifter operations, multiple load/store instructions.

Page 35: Thumb Instruction Set

16-bit compressed instruction set; provides better code density at some cost in performance.
Differences outlined between ARM and Thumb instruction use.

Page 36: ARM vs Thumb Performance Analysis

General performance differences outlined between ARM code and Thumb code across 32-bit and 16-bit memory environments.

Page 37: ARM Core Dataflow Model

Shows the ARM core as a set of functional units connected by data buses.
ALU functionality and register communication dynamics explained.

Page 38: Instruction Transfer Mechanism

Load/store architecture; interaction between memory and registers.
Specific characteristics regarding ARM's instruction types and register inputs.

Page 39: ALU Functionality Overview

Description of data processing execution in an ARM system, including ALU pre-processing capabilities.

Page 40: ARM Registers Overview

Details on general-purpose registers and task-specific registers in user mode.

Page 41: Processor Modes Confirmation

Overview of ARM processor modes and their access rights.

Page 42: Understanding Processor Modes

Assignments and purposes of varying processor modes, highlighting privileges.

Page 43: Detailed Overview of Processor Modes

Explains how various modes operate and their triggers, from abort modes to user modes.

Page 44: Single-Core Computer Architecture

Basic components of a single-core CPU described with relevant architecture.

Page 45: Reiterated Single-Core CPU Design

Consistent components detailed, emphasizing individual units.

Page 46: Multi-Core Architecture

Description of multiple processor cores organization on a single die.

Page 47: Multi-Core CPU Overview

Details about multiple cores fitting on a single processor socket, known as a chip multiprocessor (CMP).

Page 48: Multi-Core Operation Dynamics

Parallel running of threads on multi-core structures outlined.

Page 49: Time-Slicing within Cores

Thread management explained for time-sliced environments.

Page 50: Instruction Encoding Overview

ISA defines instruction formats; encoding created via Opcode and operand fields.

Page 51: MIPS Instruction Encoding

Regularizes instruction encoding under RISC characteristics.

Page 52: I-Type (Immediate) Format

Encoding of immediate instructions in SPIM; examples provided for load operations.

Page 53: Specific I-Type Format Demonstrated

Example demonstrating encoding through Opcode structure in instruction execution.
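A hedged sketch of I-type encoding, assuming the standard MIPS32 field layout (6-bit opcode, 5-bit rs, 5-bit rt, 16-bit immediate). The instruction chosen, `lw $t0, 4($sp)`, is an illustration and may differ from the slide's own example.

```python
def encode_i_type(opcode, rs, rt, immediate):
    """Pack I-type fields: opcode[31:26] rs[25:21] rt[20:16] imm[15:0]."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)

LW_OPCODE = 0x23  # load word
SP, T0 = 29, 8    # register numbers of $sp and $t0

# lw $t0, 4($sp): base register goes in rs, destination in rt.
word = encode_i_type(LW_OPCODE, rs=SP, rt=T0, immediate=4)
print(f"0x{word:08X}")  # 0x8FA80004
```

Masking the immediate to 16 bits lets negative offsets be passed as ordinary Python integers.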

Page 54: Sign Extension Explanation

Overview of sign extension relevance to ALU operations and compatibility.
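Sign extension can be sketched as follows, assuming a 16-bit immediate widened to the ALU's 32-bit width. `sign_extend` is an illustrative helper.

```python
def sign_extend(value, bits=16):
    """Interpret the low `bits` of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit

print(sign_extend(0x0004))                     # 4
print(sign_extend(0xFFFC))                     # -4
print(hex(sign_extend(0xFFFC) & 0xFFFFFFFF))   # 0xfffffffc
```

The last line shows the 32-bit pattern: the sign bit of the 16-bit field is copied into the upper 16 bits.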

Page 55: R-Type (Register) Format

Detailed breakdown of encoding for arithmetic, logical, and compare instructions.

Page 56: R-Type Example

Showcases R-Type instruction encoding for basic subtraction operation.
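A sketch of R-type encoding for a subtraction, assuming the standard MIPS32 layout (opcode 0, then rs, rt, rd, shamt, funct). The registers in `sub $t0, $t1, $t2` are chosen for illustration and may not match the slide.

```python
def encode_r_type(rs, rt, rd, shamt, funct, opcode=0):
    """Pack R-type fields: opcode rs rt rd shamt funct (6/5/5/5/5/6 bits)."""
    return (
        (opcode << 26) | (rs << 21) | (rt << 16)
        | (rd << 11) | (shamt << 6) | funct
    )

SUB_FUNCT = 0x22
T0, T1, T2 = 8, 9, 10  # register numbers of $t0, $t1, $t2

# sub $t0, $t1, $t2  ->  $t0 = $t1 - $t2
word = encode_r_type(rs=T1, rt=T2, rd=T0, shamt=0, funct=SUB_FUNCT)
print(f"0x{word:08X}")  # 0x012A4022
```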

Page 57: Load and Store Instructions

Description of operation directives for loading and storing information in registers/memory.

Page 58: Addressing in Load/Store Instructions

Explanation of how addresses are computed within Load/Store commands.

Page 59: Examples of Load/Store Instructions

Syntax samples for various load/store operations, including specifics on addressing behaviors.

Page 60: Load-Store Formats Defined

Detailed encoding schemes explained for load/store instruction formats.

Page 61: Loading Constants into Registers

Discussion on pseudo-instructions for managing constants in registers through encoding intricacies.

Page 62: Loading Large Constants

Processes and examples provided for handling large constants in ARM architecture efficiently.

Page 63: Memory Addressing in Assembly

Methods by which the assembler places required addresses correctly in registers.

Page 64: Address Generation Methodologies

Discusses assembler mechanics for generating addresses for easy data access/loading.

Page 65: Memory Mapped vs I/O Mapped I/O

Differentiation between memory-mapped I/O and I/O mapped I/O; key contrasts established.

Page 66: Program Controlled I/O

Methodology explanation for program-controlled I/O, polling ideas, CPU register interactions.
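The polling idea can be sketched with a simulated device. `FakeDevice` and its two registers are hypothetical stand-ins for a real status/data register pair.

```python
class FakeDevice:
    """Hypothetical device that becomes ready after a few status reads."""

    def __init__(self, data, ready_after):
        self._data = data
        self._reads_left = ready_after

    def read_status(self):
        """Return 1 when data is ready, otherwise 0."""
        if self._reads_left > 0:
            self._reads_left -= 1
            return 0
        return 1

    def read_data(self):
        return self._data

def polled_read(device):
    """Busy-wait on the status register, then read the data register."""
    polls = 0
    while device.read_status() == 0:
        polls += 1  # the CPU does no useful work while waiting
    return device.read_data(), polls

value, polls = polled_read(FakeDevice(data=0x41, ready_after=3))
print(value, polls)  # 65 3
```

The loop makes the cost of program-controlled I/O visible: every unsuccessful status check is processor time spent waiting.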

Page 67: Typical Program Controlled Instructions

Commonly used instruction mnemonics detailed for program control.

Page 68: Case Study: ARM 5 and ARM 7 Architecture

Overview of architecture case study context related to ARM 5 and ARM 7.

Page 69: Data Sizes and Instruction Sets

Defines ARM as a 32-bit architecture; explains the terms byte, halfword, and word.

Page 70: ARM Processor Modes Overview

List of seven ARM operating modes, their roles, and privilege distinctions discussed.

Page 71: ARM Register Set Overview

Details on register access and visibility within varied ARM processor modes.

Page 72: Register Organization Summary

Comprehensive summary of active registers across user and privileged modes.

Page 73: ARM Register Counts and Description

Details of ARM's 37 registers and their specific functions or governance during mode changes.

Page 74: Program Status Registers

Describes the condition code flags and how they reflect the results of ALU operations.
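As an illustration, the four ARM condition flags (N, Z, C, V) can be derived from a 32-bit addition as sketched below. `add_with_flags` is a hypothetical helper modelling a flag-setting add, not ARM syntax.

```python
MASK32 = 0xFFFFFFFF

def add_with_flags(a, b):
    """Return the 32-bit sum and the N, Z, C, V flags it would set."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    flags = {
        "N": result >> 31,                         # result is negative
        "Z": int(result == 0),                     # result is zero
        "C": int(full > MASK32),                   # carry out of bit 31
        "V": ((a ^ result) & (b ^ result)) >> 31,  # signed overflow
    }
    return result, flags

print(add_with_flags(0x7FFFFFFF, 1))  # overflow into the sign bit
print(add_with_flags(0xFFFFFFFF, 1))  # wraps to zero with a carry
```

The V test checks whether both operands have a sign different from the result's, which is the only way a signed addition can overflow.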

Page 75: Program Counter Functionality

Specifies behaviors of Program Counter under different operational states of ARM (ARM, Thumb, Jazelle).

Page 76: ARM Architecture Development Overview

Framework of ARM architecture development across various iterations detailed with improvements.

Page 77: ARM Conditional Execution

Discusses conditional execution in instructions to enhance code flow efficiency.

Page 78: Condition Codes Explained

Breakdown of the condition codes and how each is evaluated from the flags.
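A sketch of how the standard ARM condition mnemonics map onto the N, Z, C, and V flags. The dictionary layout and `condition_passes` are illustrative; the flag values in the usage lines are assumed to be those left by comparing 3 with 5.

```python
CONDITIONS = {
    "EQ": lambda f: f["Z"] == 1,                      # equal
    "NE": lambda f: f["Z"] == 0,                      # not equal
    "CS": lambda f: f["C"] == 1,                      # carry set
    "CC": lambda f: f["C"] == 0,                      # carry clear
    "MI": lambda f: f["N"] == 1,                      # negative
    "PL": lambda f: f["N"] == 0,                      # positive or zero
    "VS": lambda f: f["V"] == 1,                      # overflow
    "VC": lambda f: f["V"] == 0,                      # no overflow
    "HI": lambda f: f["C"] == 1 and f["Z"] == 0,      # unsigned higher
    "LS": lambda f: f["C"] == 0 or f["Z"] == 1,       # unsigned lower or same
    "GE": lambda f: f["N"] == f["V"],                 # signed >=
    "LT": lambda f: f["N"] != f["V"],                 # signed <
    "GT": lambda f: f["Z"] == 0 and f["N"] == f["V"], # signed >
    "LE": lambda f: f["Z"] == 1 or f["N"] != f["V"],  # signed <=
    "AL": lambda f: True,                             # always
}

def condition_passes(cond, flags):
    """Return True if an instruction with this condition would execute."""
    return CONDITIONS[cond](flags)

# Flags assumed after comparing 3 with 5 (3 - 5 is negative, with a borrow).
flags = {"N": 1, "Z": 0, "C": 0, "V": 0}
print(condition_passes("LT", flags), condition_passes("GE", flags))
```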

Page 79: Examples of Conditional Execution in Code

Example sequences depicting conditional checks and relevant operations.

Page 80: ARM Branching Instructions

Discusses branching mechanisms including relative addressing within ARM instructions.
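A sketch of the relative-address arithmetic for the ARM-state `B` instruction, assuming the classic encoding: a signed 24-bit word offset, shifted left by two, added to the branch's address plus 8 (the pipeline-adjusted PC). Function names are illustrative.

```python
def branch_target(instr_address, imm24):
    """Compute the destination of a B instruction from its 24-bit field."""
    offset = imm24 & 0xFFFFFF
    if offset & 0x800000:          # sign-extend the 24-bit field
        offset -= 1 << 24
    return (instr_address + 8 + (offset << 2)) & 0xFFFFFFFF

def branch_offset_field(instr_address, target):
    """Inverse: the 24-bit field needed to reach `target`."""
    return ((target - (instr_address + 8)) >> 2) & 0xFFFFFF

# A branch to itself at address 0x8000 needs an offset of -2 words.
field = branch_offset_field(0x8000, 0x8000)
print(hex(field))                         # 0xfffffe
print(hex(branch_target(0x8000, field)))  # 0x8000
```

Because the field counts words rather than bytes, 24 bits cover a range of roughly plus or minus 32 MB.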

Page 81: Data Processing Instructions

Enumeration of data processing instructions available, their structure, syntax, and operational limits.

Page 82: The Barrel Shifter

Describes operations available through the barrel shifter functioning with the ALU for varied data manipulations.
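The four basic shift types can be modelled on 32-bit values as below. This is a sketch that assumes shift amounts from 0 to 31 and ignores the carry-out and RRX cases.

```python
MASK32 = 0xFFFFFFFF

def lsl(value, n):
    """Logical shift left: zeros enter from the right."""
    return (value << n) & MASK32

def lsr(value, n):
    """Logical shift right: zeros enter from the left."""
    return (value & MASK32) >> n

def asr(value, n):
    """Arithmetic shift right: the sign bit is replicated."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32
    return (value >> n) & MASK32

def ror(value, n):
    """Rotate right: bits leaving the right re-enter on the left."""
    n %= 32
    value &= MASK32
    return ((value >> n) | (value << (32 - n))) & MASK32

# Shifted operand feeding the ALU: r1 + (r1 LSL #2) multiplies r1 by 5.
r1 = 3
print(r1 + lsl(r1, 2))            # 15
print(hex(asr(0x80000000, 4)))    # 0xf8000000
print(hex(ror(0x000000FF, 8)))    # 0xff000000
```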

Page 83: Immediate Constants Handling

Methods surrounding immediate constants processing within ARM assembly outlined with examples.
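A sketch of the check an assembler might perform, assuming the classic ARM data-processing immediate format: an 8-bit value rotated right by an even amount (twice a 4-bit rotate field). `arm_immediate_encoding` is a hypothetical helper.

```python
def arm_immediate_encoding(value):
    """Return (rotate_field, imm8) if value is encodable, else None."""
    value &= 0xFFFFFFFF
    for rot in range(16):
        shift = 2 * rot
        # Rotating left by `shift` undoes a rotate right by `shift`.
        imm8 = ((value << shift) | (value >> (32 - shift))) & 0xFFFFFFFF
        if imm8 <= 0xFF:
            return rot, imm8
    return None

print(arm_immediate_encoding(0xFF))        # (0, 255)
print(arm_immediate_encoding(0xFF000000))  # (4, 255)
print(arm_immediate_encoding(0x101))       # None
```

A value such as 0x101 spans nine bits, so no rotation of an 8-bit field can produce it; constants like this have to be built in several steps or loaded from memory.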

Page 84: Pseudo-instructions for Constant Loading

Discusses how constants are effectively loaded into registers using assembler capabilities.

Page 85: Multiply Instruction Syntax

Syntax data presented for various multiplication instructions within ARM architecture.

Page 86: Single Register Data Transfer Operations

Identification of various load/store operations within single register contexts.

Page 87: Addressing Mechanisms in Load/Store

Explains how the address accessed by a load/store instruction is specified as a base register plus an offset.

Page 88: LDM/STM Operation Syntax

Illustrates the increment/decrement addressing modes of the LDM/STM (load/store multiple) instructions.

Page 89: Pre and Post Indexed Addressing

Explains how the base register is used and updated under pre-indexed and post-indexed addressing in the ARM architecture.
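The difference between the modes can be sketched with a toy register file and memory. `load_word` and the mode names are illustrative; the comments give the ARM syntax each mode is assumed to model.

```python
def load_word(memory, regs, base, offset, mode):
    """Simulate LDR with one of three addressing modes.

    mode "offset": LDR Rd, [Rn, #off]   (base unchanged)
    mode "pre":    LDR Rd, [Rn, #off]!  (base updated before use)
    mode "post":   LDR Rd, [Rn], #off   (base updated after use)
    """
    if mode == "post":
        address = regs[base]
    else:
        address = regs[base] + offset
    value = memory[address]
    if mode in ("pre", "post"):
        regs[base] += offset  # write-back to the base register
    return value

memory = {0x100: 11, 0x104: 22}

for mode in ("offset", "pre", "post"):
    regs = {"r1": 0x100}
    value = load_word(memory, regs, "r1", 4, mode)
    print(mode, value, hex(regs["r1"]))
```

Plain offset and pre-indexed modes both read from base plus offset; only the write-back differs. Post-indexed reads from the original base and then advances it, which suits stepping through an array.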

Page 90: Software Interrupt (SWI) Mechanics

Discusses the software interrupt mechanism, its syntax, and operational handling in ARM design.

Page 91: PSR Transfer Instructions Overview

Syntax descriptions for transferring CPSR/SPSR contents to general-purpose registers detailed.

Page 92: ARM Branching and Subroutines

Describes implementation of branching and subroutine calling mechanisms within ARM assembly.

Page 93: Thumb Architecture Description

Explains the 16-bit Thumb architecture, benefits, and performance characteristics in code execution contexts.

Page 94: Example ARM-based System

Illustrates a typical ARM-based system including RAM, ROM, and core interrelations and components.

Page 95: AMBA Bridge Overview

Description of AMBA (Advanced Microcontroller Bus Architecture) and peripheral bus management framework.