In-Depth Notes on Parallel Processing

Overview of Parallel Processing

  • Parallel processing refers to a class of techniques enabling simultaneous data-processing tasks, enhancing the computational speed of a computer system.
  • Systems using parallel processing can carry out multiple data-processing tasks at the same time, leading to faster execution times.
  • This is achieved by providing multiple functional units that perform identical or different operations concurrently, with the data distributed among them.

Why Parallel Architecture?

  • Enhancement in Performance: a parallel architecture completes more work per unit time by spreading it across multiple processors.
  • Scalability: performance can be raised by adding processors, rather than being bounded by how fast a single processor can run.

Functional Units in Parallel Processing

  • The execution logic of a processor can be divided into as many as eight distinct functional units that operate in parallel, including:
    • Adder-Subtractor
    • Integer Multiply
    • Logic Unit
    • Shift Unit
    • Incrementer
    • Floating-point Add-Subtract
    • Floating-point Multiply
    • Floating-point Divide
  • Each unit operates independently of the others, so separate operations can proceed truly in parallel.

Challenges in Parallel Processing

  • Communication and Synchronization: the greatest obstacle to good performance in a parallel program is coordinating its subtasks, since communication and synchronization overhead can eat into the speedup gained from running in parallel.

Advantages of Parallel Processing

  1. Time and Cost Efficiency: speeds up computation, reducing both running time and cost.
  2. Handling Larger Problems: solves problems too large to be managed by serial computing.
  3. Resource Utilization: can draw on non-local resources (for example, machines on a network) when local resources are scarce.
  4. Maximizes Hardware Usage: keeps available computing power from sitting idle, using hardware more effectively than serial computing does.

Types of Parallelism

  1. Bit-Level Parallelism
  2. Instruction-Level Parallelism (ILP)
  3. Task Parallelism
  4. Data-Level Parallelism
Bit-Level Parallelism
  • Increases the processor word size so that more bits are handled by each instruction.
  • Reduces the number of instructions needed to operate on data wider than the word size.
  • Example: An 8-bit processor needs two add instructions (low byte, then high byte plus carry) to sum two 16-bit integers, while a 16-bit processor does it in one instruction, as sketched below.
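
  A minimal C sketch of the example above (illustrative only; the helper name is made up): the same 16-bit addition done as two 8-bit adds with an explicit carry, versus one native 16-bit add.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical sketch: how an 8-bit ALU adds two 16-bit integers,
     * using two 8-bit additions plus carry propagation. */
    uint16_t add16_on_8bit_alu(uint16_t a, uint16_t b) {
        uint16_t lo = (a & 0xFF) + (b & 0xFF);          /* first add  */
        uint16_t hi = (a >> 8) + (b >> 8) + (lo >> 8);  /* second add */
        return (uint16_t)((hi << 8) | (lo & 0xFF));
    }

    int main(void) {
        uint16_t a = 0x12FF, b = 0x0001;
        printf("two 8-bit adds : 0x%04X\n", add16_on_8bit_alu(a, b));
        printf("one 16-bit add : 0x%04X\n", (uint16_t)(a + b));
        return 0;
    }
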
Instruction-Level Parallelism (ILP)
  • Allows multiple instructions from a single program to execute simultaneously, provided they use independent resources (registers, memory locations, functional units).
  • Enhances performance by executing operations like memory load/store and arithmetic simultaneously.
  • Reordering of Instructions: Instructions can be reordered to execute concurrently without changing the program's outcome.
  • Example: a sequence of instructions that takes 12 cycles on a strictly sequential processor may complete in just 4 cycles on a processor that exploits ILP; the sketch below shows why independence matters.
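
  The contrast below is a C sketch; the cycle counts are assumptions for a hypothetical processor that can issue up to three independent one-cycle operations per clock, not measurements of any real machine.

    /* Dependent chain: each operation needs the previous result, so even
     * a wide-issue processor must execute them one per cycle. */
    int dependent_chain(int a, int b, int c, int d) {
        int t1 = a + b;      /* cycle 1 */
        int t2 = t1 * c;     /* cycle 2: waits on t1 */
        int t3 = t2 - d;     /* cycle 3: waits on t2 */
        return t3;
    }

    /* Independent operations: no result feeds another, so all three can
     * issue in the same cycle on an ILP-capable processor. */
    int independent_ops(int a, int b, int c, int d, int e, int f) {
        int u1 = a + b;      /* cycle 1 */
        int u2 = c * d;      /* cycle 1 */
        int u3 = e - f;      /* cycle 1 */
        return u1 + u2 + u3;
    }
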
Task Parallelism
  • Breaks a task down into multiple subtasks, each assigned to a processor so that they execute concurrently, as in the sketch below.
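
  A minimal task-parallel sketch using POSIX threads (compile with -pthread); the subtask bodies are placeholders.

    #include <pthread.h>
    #include <stdio.h>

    static void *subtask_a(void *arg) {   /* e.g., read input */
        (void)arg;
        puts("subtask A running");
        return NULL;
    }

    static void *subtask_b(void *arg) {   /* e.g., update an index */
        (void)arg;
        puts("subtask B running");
        return NULL;
    }

    int main(void) {
        pthread_t ta, tb;
        pthread_create(&ta, NULL, subtask_a, NULL);  /* fork subtasks */
        pthread_create(&tb, NULL, subtask_b, NULL);
        pthread_join(ta, NULL);                      /* wait for both */
        pthread_join(tb, NULL);
        return 0;
    }
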
Data-Level Parallelism (DLP)
  • Aims to enhance data throughput by processing multiple data elements simultaneously.
  • Example: computing the average and standard deviation of a data set by having each processor apply the same accumulation to a different slice of the data, as in the sketch below.
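
  A data-parallel sketch using OpenMP (an assumption about tooling; compile with -fopenmp and link with -lm): the loop iterations are split across cores, each applying the same accumulation to different elements, and the mean and standard deviation fall out of the two partial sums.

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        enum { N = 1000000 };
        static double data[N];
        for (int i = 0; i < N; i++) data[i] = (double)(i % 100);

        double sum = 0.0, sumsq = 0.0;
        #pragma omp parallel for reduction(+:sum, sumsq)
        for (int i = 0; i < N; i++) {   /* same ops, different elements */
            sum   += data[i];
            sumsq += data[i] * data[i];
        }

        double mean  = sum / N;
        double stdev = sqrt(sumsq / N - mean * mean);
        printf("mean = %f, stdev = %f\n", mean, stdev);
        return 0;
    }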

Difference Between Parallelism and Pipelining

  • General Parallelism: Refers to executing multiple operations simultaneously.
  • Pipelining: A specific form of parallelism that decomposes an operation into a sequence of sub-functions (stages), so that successive data items occupy different stages at the same time.
  • Parallelism exploits physical space (replicated hardware units), whereas pipelining exploits time, overlapping successive operations as they flow through the stages; the timing sketch below makes this concrete.
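
  A toy timing model (assumed one-unit stages, not measurements): with k stages and n items, a non-pipelined unit needs n * k time units, while a pipeline needs only k + (n - 1) once it fills.

    #include <stdio.h>

    int main(void) {
        int k = 3, n = 4;                /* 3 stages, 4 data items */
        printf("sequential: %d units\n", n * k);        /* 12 units */
        printf("pipelined : %d units\n", k + (n - 1));  /*  6 units */
        return 0;
    }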

Flynn's Classification

  • Flynn's taxonomy provides a framework for classifying parallel computer architectures based on the number of concurrent instruction and data streams.
  • Categories:
    1. SISD (Single Instruction, Single Data)
    2. SIMD (Single Instruction, Multiple Data)
    3. MISD (Multiple Instruction, Single Data)
    4. MIMD (Multiple Instruction, Multiple Data)
SISD
  • Uniprocessor system executing a single instruction on a single data stream sequentially.
  • Performance is limited by the rate at which instructions and data can be transferred internally, since a single processor must do everything sequentially.
SIMD
  • Multiprocessor system executing the same instruction across multiple CPUs on different data streams.
  • Ideal for scientific computing involving vector and matrix operations.
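
  A SIMD sketch using x86 SSE intrinsics (assumes an x86 CPU with SSE and a recent GCC or Clang): a single instruction adds four pairs of floats at once.

    #include <stdio.h>
    #include <immintrin.h>

    int main(void) {
        float a[4] = { 1.0f,  2.0f,  3.0f,  4.0f};
        float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
        float c[4];

        __m128 va = _mm_loadu_ps(a);     /* load 4 floats              */
        __m128 vb = _mm_loadu_ps(b);
        __m128 vc = _mm_add_ps(va, vb);  /* 4 additions, 1 instruction */
        _mm_storeu_ps(c, vc);            /* store 4 results            */

        for (int i = 0; i < 4; i++) printf("%.1f ", c[i]);
        printf("\n");
        return 0;
    }
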
MISD
  • Multiprocessor system executing different instructions on different processors while operating on the same dataset.
  • Not commonly used commercially or for practical applications.
MIMD
  • Capable of executing multiple instructions across multiple datasets.
  • Each processor has distinct instruction/data streams and operates asynchronously.
Coupling in MIMD
  1. Shared Memory MIMD:
    • Processors are connected to a single global memory (tightly coupled) and communicate by reading and writing shared data.
  2. Distributed Memory MIMD:
    • Each processor has its own local memory (loosely coupled) and communicates by passing messages over an interconnection network; the sketch below illustrates the shared-memory case.
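
  A shared-memory MIMD sketch with POSIX threads (compile with -pthread): several independent instruction streams update one global total, synchronizing through a mutex. A distributed-memory version would instead exchange messages (for example, with a library such as MPI), since no memory is shared.

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4

    static long total = 0;                 /* shared (global) memory */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        (void)arg;
        long local = 0;
        for (int i = 0; i < 100000; i++) local++;  /* private work */
        pthread_mutex_lock(&lock);         /* synchronized access */
        total += local;
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void) {
        pthread_t t[NTHREADS];
        for (int i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);
        printf("total = %ld\n", total);    /* expect 400000 */
        return 0;
    }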