In-Depth Notes on Parallel Processing

Overview of Parallel Processing

  • Parallel processing refers to a class of techniques enabling simultaneous data-processing tasks, enhancing the computational speed of a computer system.
  • Systems using parallel processing can carry out multiple data-processing tasks at the same time, leading to faster execution times.
  • This is achieved by providing multiple functional units that perform identical or different operations concurrently, with the data distributed among them.

Why Parallel Architecture?

  • Enhancement in Performance: a parallel architecture completes more work per unit time by spreading it across multiple processors.
  • Scalability: performance can be raised by adding processors, rather than being bounded by how fast a single processor can run.

Functional Units in Parallel Processing

  • The execution logic of a processor can be divided into as many as eight distinct functional units that operate in parallel, including:
    • Adder-Subtractor
    • Integer Multiply
    • Logic Unit
    • Shift Unit
    • Incrementer
    • Floating-point Add-Subtract
    • Floating-point Multiply
    • Floating-point Divide
  • Each unit operates independently of the others, so separate operations can proceed truly in parallel.

Challenges in Parallel Processing

  • Communication and Synchronization: the greatest obstacle to good performance in a parallel program is coordinating its subtasks, since communication and synchronization overhead can eat into the speedup gained from running in parallel.

Advantages of Parallel Processing

  1. Time and Cost Efficiency: speeds up computation, reducing both running time and cost.
  2. Handling Larger Problems: solves problems too large to be managed by serial computing.
  3. Resource Utilization: can draw on non-local resources (for example, machines on a network) when local resources are scarce.
  4. Maximizes Hardware Usage: keeps available computing power from sitting idle, using hardware more effectively than serial computing does.

Types of Parallelism

  1. Bit-Level Parallelism
  2. Instruction-Level Parallelism (ILP)
  3. Task Parallelism
  4. Data-Level Parallelism
Bit-Level Parallelism
  • Increases the processor word size so that more bits are handled by each instruction.
  • Reduces the number of instructions needed to operate on data wider than the word size.
  • Example: An 8-bit processor needs two add instructions (low byte, then high byte plus carry) to sum two 16-bit integers, while a 16-bit processor does it in one instruction, as sketched below.
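
  A minimal C sketch of the example above (illustrative only; the helper name is made up): the same 16-bit addition done as two 8-bit adds with an explicit carry, versus one native 16-bit add.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical sketch: how an 8-bit ALU adds two 16-bit integers,
     * using two 8-bit additions plus carry propagation. */
    uint16_t add16_on_8bit_alu(uint16_t a, uint16_t b) {
        uint16_t lo = (a & 0xFF) + (b & 0xFF);          /* first add  */
        uint16_t hi = (a >> 8) + (b >> 8) + (lo >> 8);  /* second add */
        return (uint16_t)((hi << 8) | (lo & 0xFF));
    }

    int main(void) {
        uint16_t a = 0x12FF, b = 0x0001;
        printf("two 8-bit adds : 0x%04X\n", add16_on_8bit_alu(a, b));
        printf("one 16-bit add : 0x%04X\n", (uint16_t)(a + b));
        return 0;
    }
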
Instruction-Level Parallelism (ILP)
  • Allows multiple instructions from a single program to execute simultaneously, provided they use independent resources (registers, memory locations, functional units).
  • Enhances performance by executing operations like memory load/store and arithmetic simultaneously.
  • Reordering of Instructions: Instructions can be reordered to execute concurrently without changing the program's outcome.
  • Example: a sequence of instructions that takes 12 cycles on a strictly sequential processor may complete in just 4 cycles on a processor that exploits ILP; the sketch below shows why independence matters.
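
  The contrast below is a C sketch; the cycle counts are assumptions for a hypothetical processor that can issue up to three independent one-cycle operations per clock, not measurements of any real machine.

    /* Dependent chain: each operation needs the previous result, so even
     * a wide-issue processor must execute them one per cycle. */
    int dependent_chain(int a, int b, int c, int d) {
        int t1 = a + b;      /* cycle 1 */
        int t2 = t1 * c;     /* cycle 2: waits on t1 */
        int t3 = t2 - d;     /* cycle 3: waits on t2 */
        return t3;
    }

    /* Independent operations: no result feeds another, so all three can
     * issue in the same cycle on an ILP-capable processor. */
    int independent_ops(int a, int b, int c, int d, int e, int f) {
        int u1 = a + b;      /* cycle 1 */
        int u2 = c * d;      /* cycle 1 */
        int u3 = e - f;      /* cycle 1 */
        return u1 + u2 + u3;
    }
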
Task Parallelism
  • Breaks a task down into multiple subtasks, each assigned to a processor so that they execute concurrently, as in the sketch below.
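
  A minimal task-parallel sketch using POSIX threads (compile with -pthread); the subtask bodies are placeholders.

    #include <pthread.h>
    #include <stdio.h>

    static void *subtask_a(void *arg) {   /* e.g., read input */
        (void)arg;
        puts("subtask A running");
        return NULL;
    }

    static void *subtask_b(void *arg) {   /* e.g., update an index */
        (void)arg;
        puts("subtask B running");
        return NULL;
    }

    int main(void) {
        pthread_t ta, tb;
        pthread_create(&ta, NULL, subtask_a, NULL);  /* fork subtasks */
        pthread_create(&tb, NULL, subtask_b, NULL);
        pthread_join(ta, NULL);                      /* wait for both */
        pthread_join(tb, NULL);
        return 0;
    }
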
Data-Level Parallelism (DLP)
  • Aims to enhance data throughput by processing multiple data elements simultaneously.
  • Example: computing the average and standard deviation of a data set by having each processor apply the same accumulation to a different slice of the data, as in the sketch below.
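
  A data-parallel sketch using OpenMP (an assumption about tooling; compile with -fopenmp and link with -lm): the loop iterations are split across cores, each applying the same accumulation to different elements, and the mean and standard deviation fall out of the two partial sums.

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        enum { N = 1000000 };
        static double data[N];
        for (int i = 0; i < N; i++) data[i] = (double)(i % 100);

        double sum = 0.0, sumsq = 0.0;
        #pragma omp parallel for reduction(+:sum, sumsq)
        for (int i = 0; i < N; i++) {   /* same ops, different elements */
            sum   += data[i];
            sumsq += data[i] * data[i];
        }

        double mean  = sum / N;
        double stdev = sqrt(sumsq / N - mean * mean);
        printf("mean = %f, stdev = %f\n", mean, stdev);
        return 0;
    }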

Difference Between Parallelism and Pipelining

  • General Parallelism: Refers to executing multiple operations simultaneously.
  • Pipelining: A specific form of parallelism that decomposes an operation into a sequence of sub-functions (stages), so that successive data items occupy different stages at the same time.
  • Parallelism exploits physical space (replicated hardware units), whereas pipelining exploits time, overlapping successive operations as they flow through the stages; the timing sketch below makes this concrete.
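
  A toy timing model (assumed one-unit stages, not measurements): with k stages and n items, a non-pipelined unit needs n * k time units, while a pipeline needs only k + (n - 1) once it fills.

    #include <stdio.h>

    int main(void) {
        int k = 3, n = 4;                /* 3 stages, 4 data items */
        printf("sequential: %d units\n", n * k);        /* 12 units */
        printf("pipelined : %d units\n", k + (n - 1));  /*  6 units */
        return 0;
    }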

Flynn's Classification

  • Flynn's taxonomy provides a framework for classifying parallel computer architectures based on the number of concurrent instruction and data streams.
  • Categories:
    1. SISD (Single Instruction, Single Data)
    2. SIMD (Single Instruction, Multiple Data)
    3. MISD (Multiple Instruction, Single Data)
    4. MIMD (Multiple Instruction, Multiple Data)
SISD
  • Uniprocessor system executing a single instruction on a single data stream sequentially.
  • Performance is limited by the rate at which instructions and data can be transferred internally, since a single processor must do everything sequentially.
SIMD
  • Multiprocessor system executing the same instruction across multiple CPUs on different data streams.
  • Ideal for scientific computing involving vector and matrix operations.
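
  A SIMD sketch using x86 SSE intrinsics (assumes an x86 CPU with SSE and a recent GCC or Clang): a single instruction adds four pairs of floats at once.

    #include <stdio.h>
    #include <immintrin.h>

    int main(void) {
        float a[4] = { 1.0f,  2.0f,  3.0f,  4.0f};
        float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
        float c[4];

        __m128 va = _mm_loadu_ps(a);     /* load 4 floats              */
        __m128 vb = _mm_loadu_ps(b);
        __m128 vc = _mm_add_ps(va, vb);  /* 4 additions, 1 instruction */
        _mm_storeu_ps(c, vc);            /* store 4 results            */

        for (int i = 0; i < 4; i++) printf("%.1f ", c[i]);
        printf("\n");
        return 0;
    }
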
MISD
  • Multiprocessor system executing different instructions on different processors while operating on the same dataset.
  • Not commonly used commercially or for practical applications.
MIMD
  • Capable of executing multiple instructions across multiple datasets.
  • Each processor has distinct instruction/data streams and operates asynchronously.
Coupling in MIMD
  1. Shared Memory MIMD:
    • Processors are connected to a single global memory (tightly coupled) and communicate by reading and writing shared data.
  2. Distributed Memory MIMD:
    • Each processor has its own local memory (loosely coupled) and communicates by passing messages over an interconnection network; the sketch below illustrates the shared-memory case.
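
  A shared-memory MIMD sketch with POSIX threads (compile with -pthread): several independent instruction streams update one global total, synchronizing through a mutex. A distributed-memory version would instead exchange messages (for example, with a library such as MPI), since no memory is shared.

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4

    static long total = 0;                 /* shared (global) memory */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        (void)arg;
        long local = 0;
        for (int i = 0; i < 100000; i++) local++;  /* private work */
        pthread_mutex_lock(&lock);         /* synchronized access */
        total += local;
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void) {
        pthread_t t[NTHREADS];
        for (int i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);
        printf("total = %ld\n", total);    /* expect 400000 */
        return 0;
    }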