Introduction to Parallel and Distributed Computing

Parallel Computing: Multiple processors execute tasks simultaneously. Memory can be shared or distributed. It focuses on concurrency, performance, and efficiency.
Distributed Computing: A collection of autonomous computers that appear to the user as a single system. It lacks shared memory and relies on message passing for communication. It is primarily used for scalability, fault tolerance, and resource sharing across different geographical locations.

Location: Parallel computing usually occurs on a single computer with multiple processors; Distributed computing involves components at different physical locations.
Communication: Parallel processors communicate via a bus; Distributed computers communicate via message passing.
Memory: Parallel systems may use shared or distributed memory; Distributed systems use only distributed memory.
Goal: Parallel computing aims for high-speed processing and efficiency; Distributed computing prioritizes scalability and resource sharing.

Serial Computing: A problem is broken into a discrete series of instructions executed sequentially on a single processor. Only one instruction executes at any given moment.
Parallel Computing: A problem is broken into discrete parts that are solved concurrently. Each part is divided into instructions that execute simultaneously on different processors under a coordination mechanism.

SISD (Single Instruction, Single Data): A serial (non-parallel) computer where one instruction acts on one data stream per clock cycle.
SIMD (Single Instruction, Multiple Data): All processing units execute the same instruction on different data elements. This is common in GPUs and image processing.
MISD (Multiple Instruction, Single Data): Each processing unit operates on a single data stream independently via separate instructions. This is a rare architecture with few real-world examples.
MIMD (Multiple Instruction, Multiple Data): Every processor executes different instruction streams on different data streams. This is the most common modern architecture for supercomputers and clusters.

Amdahl's Law: This law defines the potential speedup of a program based on the fraction of code ( $P$ ) that can be parallelized: $\text{Speedup} = \frac{1}{(1-P) + \frac{P}{N}}$ where $N$ is the number of processors.
Limits to Scalability: As $N$ increases, the serial fraction of the code ( $1-P$ ) becomes the dominant bottleneck, limiting the maximum possible speedup.
Complexity: Parallel applications are more complex to design, code, debug, and maintain compared to serial applications.

Strong Scaling (Amdahl): The total problem size remains fixed as more processors are added. The goal is to solve the same problem faster.
Weak Scaling (Gustafson): The problem size per processor remains fixed as more processors are added. The goal is to solve a larger problem in the same amount of time.