Notes on Parallel Computing and GPU Architecture

Parallel Computing Concepts

Definition: Parallel computing carries out multiple calculations or processes simultaneously by dividing a large problem into smaller parts that can be worked on at the same time.

Classification of Parallel Computers

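Flynn's Classification (1972):
- Based on the number of concurrent instruction streams and data streams.
- Categories: SISD, SIMD, MISD, and MIMD (single or multiple instruction streams combined with single or multiple data streams).

Coupling Types:
- Loosely Coupled: Independent computers connected via a network (e.g., Ethernet) that communicate through message passing.
- Tightly Coupled: PEs (Processing Elements) share a common main memory, which makes communication between them faster.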
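
The two coupling styles can be contrasted with a small sketch. It uses Python threads on a single machine purely for illustration: a queue stands in for the network in the loosely coupled case, and a lock-protected variable stands in for common main memory in the tightly coupled case. The function names are hypothetical, not from any particular system.

```python
import queue
import threading


def sum_message_passing(chunks):
    """Loosely coupled style: workers share nothing and send results as messages."""
    mailbox = queue.Queue()  # stands in for the network link between machines

    def worker(chunk):
        mailbox.put(sum(chunk))  # send the partial result as a message

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The coordinator receives exactly one message per worker.
    return sum(mailbox.get() for _ in chunks)


def sum_shared_memory(chunks):
    """Tightly coupled style: workers update one location in common memory."""
    total = [0]              # shared location visible to every worker
    lock = threading.Lock()  # prevents two workers from updating at once

    def worker(chunk):
        partial = sum(chunk)
        with lock:
            total[0] += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

Both functions compute the same sum; the difference is only in how the partial results reach the place where they are combined.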

Memory Access Modes

Shared Memory (SM):
- All processors access a common global address space.
- Memory access time is the same for every processor (Uniform Memory Access, UMA).

Distributed Shared Memory (DSM):
- Each processor reaches its own local memory faster than memory attached to another processor.
- Characterized by Non-Uniform Memory Access (NUMA): a remote access can be significantly slower than a local one (10 to 1000 times).
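
To see why the local/remote gap matters, the average access time can be estimated as a weighted mix of local and remote accesses. This is a simplified model, not a measurement: the function name and the numbers in the usage note are assumptions chosen for illustration.

```python
def average_access_time(local_ns, remote_factor, remote_fraction):
    """Estimate the mean memory access time under a simple NUMA model.

    local_ns        -- time for a local access (assumed value, in nanoseconds)
    remote_factor   -- how many times slower a remote access is (e.g. 10 to 1000)
    remote_fraction -- share of accesses that go to remote memory, in [0, 1]
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    remote_ns = local_ns * remote_factor
    # Weighted average of the two kinds of access.
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns
```

For example, with an assumed 100 ns local access, a remote factor of 10, and 10% of accesses going remote, the average is about 190 ns, almost double the local time. With a factor of 1000 the same 10% pushes the average past 10,000 ns, which is why keeping data local matters on NUMA machines.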

Grain Size Classification

Definition: Grain size is the amount of work a PE performs before it needs to communicate with another PE.
Types:
- Very Fine Grain: Single instruction.
- Fine Grain: Corresponds to threads, typically around 100 machine instructions.
- Medium Grain: Corresponds to procedures (subroutines), typically around 1000 instructions.
- Coarse Grain: Corresponds to complete programs.
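
A rough way to see why grain size matters: if every communication costs a fixed amount of time, the share of time spent on useful work grows with the grain. The sketch below assumes a hypothetical communication cost expressed in instruction-equivalents; both the model and the cost value are illustrative assumptions, not figures from these notes.

```python
def useful_fraction(grain_instructions, comm_cost_instructions):
    """Share of a PE's time spent computing rather than communicating.

    Assumes the PE alternates between `grain_instructions` of work and one
    communication that costs `comm_cost_instructions` instruction-equivalents.
    """
    if grain_instructions <= 0 or comm_cost_instructions < 0:
        raise ValueError("grain must be positive and cost non-negative")
    return grain_instructions / (grain_instructions + comm_cost_instructions)


# Grain sizes taken from the list above; the coarse value is an assumed stand-in
# for "a complete program".
GRAINS = {"very fine": 1, "fine": 100, "medium": 1000, "coarse": 1_000_000}

if __name__ == "__main__":
    assumed_comm_cost = 1000  # hypothetical cost of one communication
    for name, grain in GRAINS.items():
        print(f"{name:>9}: {useful_fraction(grain, assumed_comm_cost):.4f}")
```

Under this assumed cost, a medium grain of 1000 instructions spends half its time communicating, while finer grains are dominated by communication. Finer grains therefore only pay off on hardware where communication is very cheap, such as tightly coupled systems.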

GPUs in Parallel Computing

Definition: A GPU (Graphics Processing Unit) is an electronic circuit optimized for rendering graphics and performing parallel computations.
Key Features:
- Handles large datasets and massively parallel workloads; used in AI, gaming, and simulations.

Trajectories in Microprocessor Design:
- Multicore: A few large cores, designed to keep sequential programs running fast.
- Manycore: Many smaller cores, designed to maximize the throughput of many concurrent threads.

Heterogeneous Parallel Computing

Design Philosophy:
- CPUs optimize for sequential task performance, whereas GPUs are designed for parallel execution.
- CPUs devote chip area to large caches and control logic to keep latency low, while GPUs devote it to arithmetic units to maximize throughput across many parallel threads.
Performance Comparison:
- Many applications benefit from both CPU (for sequential tasks) and GPU (for parallel tasks) processing capabilities.
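
One common way to quantify this split is Amdahl's law, which is not stated in these notes but fits the point: only the parallel part of a program is accelerated, so the sequential part left on the CPU limits the overall gain. A minimal sketch, with hypothetical parameter names:

```python
def amdahl_speedup(parallel_fraction, parallel_speedup):
    """Overall speedup when only part of a program is accelerated.

    parallel_fraction -- share of the original run time that can be
                         offloaded to the GPU, in [0, 1]
    parallel_speedup  -- how many times faster that share runs on the GPU
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if parallel_speedup <= 0:
        raise ValueError("parallel_speedup must be positive")
    sequential_fraction = 1.0 - parallel_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / parallel_speedup)
```

For instance, if 90% of the run time is parallel and the GPU runs that part 50 times faster, the whole program speeds up by roughly 8.5 times, not 50. The remaining 10% of sequential work is where a fast CPU core still matters.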

GPU Architecture

Components:
- Organized into multiple streaming multiprocessors (SMs, not to be confused with the Shared Memory abbreviation used earlier), each containing several streaming processors (SPs).
- Includes high-bandwidth graphics memory, such as GDDR, to keep the processors supplied with data.
CUDA Architecture:
- Introduces a data-parallel programming model that exposes the GPU for general-purpose computation without going through a graphics API.
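
The data-parallel model can be illustrated without a GPU. The sketch below is a sequential CPU emulation in Python, not CUDA code: every "thread" runs the same kernel function and picks its own element using the global index formula CUDA kernels conventionally use (block index times block size plus thread index). The `launch` helper and the function names are hypothetical; on a real GPU the blocks would be distributed across SMs and the threads would run in parallel on the SPs.

```python
def vector_add_kernel(block_idx, thread_idx, block_dim, a, b, out):
    """Body executed by every emulated thread: add one pair of elements."""
    i = block_idx * block_dim + thread_idx  # global index of this thread
    if i < len(out):                        # guard: last block may overshoot
        out[i] = a[i] + b[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Emulate a kernel launch by looping over every block and thread."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, *args)


def vector_add(a, b, block_dim=4):
    """Add two equal-length lists element by element using the emulated grid."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    n = len(a)
    out = [0] * n
    grid_dim = (n + block_dim - 1) // block_dim  # enough blocks to cover n
    launch(vector_add_kernel, grid_dim, block_dim, a, b, out)
    return out
```

With five elements and an assumed block size of 4, two blocks are launched and the last three threads of the second block do nothing because of the bounds check. The same pattern, one thread per data element plus a guard, is the basic shape of a data-parallel kernel.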

GPU Computing Advantages

Parallelism: Thousands of cores enable simultaneous task execution.
High Performance: Suitable for large datasets and computational tasks.
Efficiency: Higher performance per watt than CPUs on parallel workloads.
Cost-Effectiveness: Generally more affordable than large CPU clusters.

Applications of GPU Computing

Fields Include:
- AI & Machine Learning: Efficient training of deep learning models.
- Scientific Computing: Simulations in climate modeling and physics.
- Cryptocurrency Mining: Solving cryptographic puzzles efficiently.
- 3D Rendering & Gaming: Real-time graphics processing.
- Finance & Data Analytics: High-frequency trading and risk modeling.
- Medical Imaging: Processing images from MRI and CT scans.