In-Depth Notes on Parallel Processing
Overview of Parallel Processing
- Parallel processing refers to a class of techniques enabling simultaneous data-processing tasks, enhancing the computational speed of a computer system.
- Systems using parallel processing can carry out multiple data-processing tasks at the same time, leading to faster execution times.
- Achieved by using multiple functional units to perform identical or different operations concurrently, distributing data among these functional units.
Why Parallel Architecture?
- Enhancement in Performance: Parallel computer architecture improves performance by using more processors.
- Scalability: Performance can be scaled up by adding processors, exceeding what any single processor can deliver at one time.
Functional Units in Parallel Processing
- Execution units can be separated into up to eight distinct functional units operating in parallel, which include:
- Adder-Subtractor
- Integer Multiply
- Logic Unit
- Shift Unit
- Incrementer
- Floating-point Add-Subtract
- Floating-point Multiply
- Floating-point Divide
- Each unit is independent, indicating true parallel operation.
Challenges in Parallel Processing
- Communication and Synchronization: The greatest challenges in achieving optimal performance in parallel programs are the communication and synchronization among different subtasks.
Advantages of Parallel Processing
- Time and Cost Efficiency: Expedites processes leading to reduced operational time and costs.
- Handling Larger Problems: Effectively solves larger problems that can't be managed through serial computing.
- Resource Utilization: Can draw on non-local resources when local resources are scarce.
- Maximizes Hardware Usage: Prevents wasting the available computing power, using hardware more effectively than serial computing.
Types of Parallelism
- Bit-Level Parallelism
- Instruction-Level Parallelism (ILP)
- Task Parallelism
- Data-Level Parallelism
Bit-Level Parallelism
- Involves increasing the processor word size to improve performance.
- Reduces the number of instructions needed to operate on data wider than the word size.
- Example: An 8-bit processor needs two instructions to sum two 16-bit integers, while a 16-bit processor achieves it in one instruction.
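The 8-bit versus 16-bit example above can be sketched in code. This is a toy model, not real hardware: the function below emulates adding two 16-bit integers on a hypothetical 8-bit ALU, which takes two add steps (low bytes, then high bytes plus the carry), whereas a 16-bit ALU would do it in a single add.

```python
def add_16bit_on_8bit_alu(a, b):
    """Add two 16-bit values using only 8-bit additions (two steps)."""
    lo = (a & 0xFF) + (b & 0xFF)                          # step 1: low bytes
    carry = lo >> 8                                       # carry into high byte
    hi = ((a >> 8) & 0xFF) + ((b >> 8) & 0xFF) + carry    # step 2: high bytes + carry
    return ((hi & 0xFF) << 8) | (lo & 0xFF)

print(hex(add_16bit_on_8bit_alu(0x12FF, 0x0001)))  # 0x1300, carry propagated
```

A wider word size eliminates the second step entirely, which is exactly the saving bit-level parallelism provides.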
Instruction Level Parallelism (ILP)
- Permits multiple instructions from a single process to execute simultaneously when they use independent resources (functional units, registers) and have no data dependencies.
- Enhances performance by executing operations like memory load/store and arithmetic simultaneously.
- Reordering of Instructions: Instructions can be reordered to execute concurrently without changing the program's outcome.
- Example: A set of instructions that takes 12 cycles on a strictly sequential processor can finish in as few as 4 cycles on a processor that exploits ILP.
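The 12-versus-4-cycle figure can be illustrated with a toy scheduler. In this simplified model (one cycle per instruction, unlimited issue width), a sequential processor needs one cycle per instruction, while an ILP processor issues every instruction whose dependencies are complete in the same cycle, so the total equals the depth of the dependency graph. The instruction names and dependency graph here are made up for illustration.

```python
def schedule_depth(deps):
    """Cycles needed when independent instructions issue in parallel:
    the length of the longest dependency chain (critical path)."""
    depth = {}
    def level(i):
        if i not in depth:
            depth[i] = 1 + max((level(d) for d in deps[i]), default=0)
        return depth[i]
    return max(level(i) for i in deps)

# 12 instructions: three independent chains, each four instructions deep.
deps = {f"i{c}{s}": ([f"i{c}{s-1}"] if s else [])
        for c in range(3) for s in range(4)}
print(len(deps), schedule_depth(deps))  # 12 sequential cycles vs 4 with ILP
```

Because the three chains share no data, an ILP processor can advance all of them in lock step, collapsing 12 cycles of work into 4.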
Task Parallelism
- Breaks down a task into multiple subtasks, each assigned to a processor for concurrent execution.
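A minimal sketch of task parallelism using Python's standard library: two *different* subtasks of one larger job (the word-count and character-count functions below are illustrative) run on separate workers and their results are combined.

```python
from concurrent.futures import ThreadPoolExecutor

def count_words(text):
    return len(text.split())

def count_chars(text):
    return len(text)

text = "parallel processing splits work across units"
with ThreadPoolExecutor(max_workers=2) as pool:
    words = pool.submit(count_words, text)   # subtask 1 on worker A
    chars = pool.submit(count_chars, text)   # subtask 2 on worker B
print(words.result(), chars.result())
```

The defining trait is that each worker executes a different operation; contrast this with the data-parallel sketch below, where every worker runs the same operation.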
Data-Level Parallelism (DLP)
- Aims to enhance data throughput by processing multiple data elements simultaneously.
- Example: Calculating the average and standard deviation of the same data set concurrently.
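A minimal data-parallel sketch, assuming a simple even split of the data: the *same* operation (squaring) is applied to disjoint chunks of one data set by separate workers, and the partial results are stitched back together.

```python
from concurrent.futures import ThreadPoolExecutor

def square_chunk(chunk):
    return [x * x for x in chunk]            # same operation on every element

data = list(range(8))
chunks = [data[:4], data[4:]]                # distribute the data among workers
with ThreadPoolExecutor(max_workers=2) as pool:
    parts = pool.map(square_chunk, chunks)   # one instruction stream per chunk
result = [x for part in parts for x in part]
print(result)
```

Here throughput grows with the number of workers because each one processes its own slice of the data independently.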
Difference Between Parallelism and Pipelining
- General Parallelism: Refers to executing multiple operations simultaneously.
- Pipelining: A specific form of parallelism that decomposes a task into a sequence of sub-operations (stages), with different data items occupying different stages at the same time.
- Parallelism exploits physical space (replicated hardware such as multiple processors), whereas pipelining exploits time by overlapping successive operations as they flow through the stages.
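The timing contrast can be made concrete with a standard back-of-the-envelope model (assuming each stage takes one cycle and the pipeline never stalls): without overlap, n items through k stages cost n * k cycles; with a full pipeline, the first item takes k cycles and every later item completes one cycle after the previous, giving k + n - 1 cycles.

```python
def sequential_cycles(n_items, n_stages):
    return n_items * n_stages        # each item passes all stages alone

def pipelined_cycles(n_items, n_stages):
    return n_stages + n_items - 1    # stages overlap across successive items

print(sequential_cycles(6, 4), pipelined_cycles(6, 4))  # 24 vs 9
```

The speedup approaches the number of stages as the item count grows, which is why pipelining pays off on long streams of data.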
Flynn's Classification
- Flynn's taxonomy provides a framework for classifying parallel computer architectures based on the number of concurrent instruction and data streams.
- Categories:
- (SISD) Single Instruction, Single Data
- (SIMD) Single Instruction, Multiple Data
- (MISD) Multiple Instruction, Single Data
- (MIMD) Multiple Instruction, Multiple Data
SISD
- Uniprocessor system executing a single instruction on a single data stream sequentially.
- Performance is limited by the rate at which information can be transferred internally between the processor and memory.
SIMD
- Multiprocessor system in which all processing elements execute the same instruction simultaneously, each on a different data stream.
- Ideal for scientific computing involving vector and matrix operations.
MISD
- Multiprocessor system executing different instructions on different processors while operating on the same dataset.
- Not commonly used commercially or for practical applications.
MIMD
- Capable of executing multiple instructions across multiple datasets.
- Each processor has distinct instruction/data streams and operates asynchronously.
Coupling in MIMD
- Shared Memory MIMD:
- Processors are linked to a global memory, accessing and modifying shared data collectively.
- Distributed Memory MIMD:
- Each processor has local memory and communicates through an interconnection network, structured as required.
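The two coupling styles can be sketched with threads standing in for processors (a deliberate simplification; real MIMD machines use separate CPUs or nodes). The shared-memory workers write directly into one global structure, while the message-passing worker communicates only by sending a value over a queue, which plays the role of the interconnection network.

```python
import queue
import threading

# Shared memory MIMD: workers read/write one global memory directly.
shared = [0, 0]
def shared_worker(i, value):
    shared[i] = value                # direct access to shared global memory

# Distributed memory MIMD: workers exchange results only via messages.
channel = queue.Queue()
def message_worker(value):
    channel.put(value * 2)           # compute locally, then send a message

threads = [threading.Thread(target=shared_worker, args=(0, 10)),
           threading.Thread(target=shared_worker, args=(1, 20)),
           threading.Thread(target=message_worker, args=(5,))]
for t in threads:
    t.start()
for t in threads:
    t.join()
msg = channel.get()
print(shared, msg)
```

Shared memory makes data exchange cheap but requires synchronization to avoid conflicting updates; message passing avoids shared state at the cost of explicit communication.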