Notes on Parallel Computing and GPU Architecture
Parallel Computing Concepts
Definition: Parallel computing carries out many calculations or processes simultaneously by dividing a large problem into smaller parts that can be solved at the same time.
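Here is a minimal sketch of the divide-and-combine idea. It uses a sum of squares as a stand-in problem: the input is split into chunks, each chunk is solved independently, and the partial results are combined. The function names are hypothetical. For brevity it uses threads from Python's standard `concurrent.futures` module. Because of CPython's global interpreter lock, CPU-bound threads do not execute Python code truly in parallel, so `ProcessPoolExecutor` would be the drop-in choice for real speedup. The worker is therefore a module-level function, which lets it be pickled for a process pool.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, num_chunks):
    """Divide `data` into `num_chunks` contiguous, nearly equal slices."""
    size, remainder = divmod(len(data), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        # The first `remainder` chunks receive one extra element each.
        end = start + size + (1 if i < remainder else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """Work done independently on one chunk (no communication needed)."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, num_workers=4):
    """Solve every chunk concurrently, then combine the partial results."""
    chunks = split_into_chunks(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_sums = list(pool.map(sum_of_squares, chunks))
    return sum(partial_sums)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(1_000))))  # 332833500
```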
Classification of Parallel Computers
Flynn's Classification (1972):
Based on the number of instruction streams and data streams processed concurrently: SISD, SIMD, MISD, and MIMD.
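The classification rule can be sketched in a few lines. Flynn's four categories (SISD, SIMD, MISD, MIMD) follow from whether there is a single (S) or multiple (M) instruction stream (I) and data stream (D). The example machines and their stream counts are illustrative assumptions, not measurements of real hardware.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Return the Flynn category for the given stream counts."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    instr = "S" if instruction_streams == 1 else "M"
    data = "S" if data_streams == 1 else "M"
    return f"{instr}I{data}D"


# Hypothetical stream counts chosen only to illustrate each case.
examples = {
    "single-core scalar CPU": (1, 1),
    "SIMD vector unit with 32 lanes": (1, 32),
    "redundant units checking one data stream": (4, 1),
    "multicore CPU running independent programs": (8, 8),
}

if __name__ == "__main__":
    for name, (i, d) in examples.items():
        print(f"{name}: {flynn_class(i, d)}")
```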
Coupling Types:
Loosely Coupled: Independent servers, each with its own memory, connected by a network (e.g., Ethernet) and communicating through message passing.
Tightly Coupled: Processing Elements (PEs) share a common main memory and communicate through it, which is faster than passing messages over a network.
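The sketch below contrasts the two communication styles using two worker threads in one Python process. In the loosely coupled version, each worker computes privately and sends its result as a message through a `queue.Queue`, which stands in for a network link. In the tightly coupled version, the workers update one shared variable directly, guarded by a lock. A real loosely coupled system would exchange messages between separate machines (for example with MPI over Ethernet). The function names here are hypothetical.

```python
import queue
import threading


def _run_workers(worker, values):
    """Split `values` between two threads running `worker` and wait for both."""
    chunks = [values[0::2], values[1::2]]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(chunks)


def message_passing_sum(values):
    """Loosely coupled style: no shared state, only explicit messages."""
    inbox = queue.Queue()  # stands in for the network link

    def worker(chunk):
        inbox.put(sum(chunk))  # compute locally, then *send* the result

    num_messages = _run_workers(worker, values)
    # The receiver assembles the final answer from the messages it received.
    return sum(inbox.get() for _ in range(num_messages))


def shared_memory_sum(values):
    """Tightly coupled style: workers update one variable in common memory."""
    total = [0]  # shared state visible to every thread
    lock = threading.Lock()

    def worker(chunk):
        for v in chunk:
            with lock:  # synchronize access to the shared variable
                total[0] += v

    _run_workers(worker, values)
    return total[0]


if __name__ == "__main__":
    data = list(range(100))
    print(message_passing_sum(data), shared_memory_sum(data))
```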
Memory Access Modes
Shared Memory (SM):
All processors access a common global address space.
Memory access time is the same for every processor (Uniform Memory Access, UMA).
Distributed Shared Memory (DSM):
A processor's local memory is faster to access than remote memory attached to other processors.
Characterized by Non-Uniform Memory Access (NUMA): accessing remote memory can be significantly slower (10 to 1000 times).
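A weighted average gives a rough estimate of how NUMA affects a program. Suppose a fraction f of accesses hit local memory with latency t_local, and the rest go to remote memory with latency t_remote. The average access time is then t_avg = f * t_local + (1 - f) * t_remote. The sketch evaluates this across the 10 to 1000 times range quoted above. The 100 ns local latency and the 90% local fraction are hypothetical values.

```python
def average_access_time(local_fraction, local_ns, remote_ns):
    """Weighted average latency when a fraction of accesses hit local memory."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


if __name__ == "__main__":
    local_ns = 100.0  # hypothetical local-memory latency
    for slowdown in (10, 100, 1000):  # remote/local latency ratios
        remote_ns = local_ns * slowdown
        t = average_access_time(0.9, local_ns, remote_ns)
        print(f"remote {slowdown:>4}x slower -> average {t:,.0f} ns")
```

With 90% of accesses local, a remote penalty of only 10x already nearly doubles the average latency (about 190 ns versus 100 ns). This is why data placement matters on NUMA machines.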
Grain Size Classification
Definition: The amount of work a PE performs before it needs to communicate with another PE.
Types:
Very Fine Grain: Single instruction.
Fine Grain: Corresponds to threads, typically around 100 machine instructions.
Medium Grain: Corresponds to procedures (subroutines), typically around 1000 instructions.
Coarse Grain: Corresponds to complete programs.
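One common way to read the grain-size list is as a computation-to-communication ratio: the finer the grain, the larger the share of time spent communicating instead of computing. The sketch below assigns labels using cutoffs placed around the typical sizes listed above, and those cutoffs are hypothetical. It also assumes that one message costs as much as 100 instructions, which is likewise a hypothetical figure.

```python
COMM_COST = 100  # hypothetical: one message costs as much as 100 instructions


def classify_grain(instructions_per_communication):
    """Map the work done between communications to a grain label.

    The cutoffs are hypothetical, chosen around the typical sizes in the notes.
    """
    n = instructions_per_communication
    if n <= 1:
        return "very fine"  # single instruction
    if n <= 500:
        return "fine"  # thread-sized, ~100 instructions
    if n <= 5_000:
        return "medium"  # procedure-sized, ~1000 instructions
    return "coarse"  # whole-program-sized


def communication_overhead(instructions, comm_cost=COMM_COST):
    """Fraction of total time spent communicating rather than computing."""
    return comm_cost / (instructions + comm_cost)


if __name__ == "__main__":
    for n in (1, 100, 1_000, 1_000_000):
        share = communication_overhead(n)
        print(f"{n:>9} instr: {classify_grain(n):>9} grain, {share:.1%} communicating")
```

Under these assumed costs, a single-instruction grain spends about 99% of its time communicating. A million-instruction grain spends well under 0.1%.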
GPUs in Parallel Computing
Definition: A GPU is an electronic circuit optimized for rendering graphics and performing parallel computations.
Key Features:
Processes large datasets in parallel; widely used in AI, gaming, and simulations.
Trajectories in Microprocessor Design:
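Multicore: Seeks to maintain the execution speed of sequential programs while moving to multiple cores; uses a few large, complex cores (e.g., multicore CPUs).
Manycore (many-thread): Focuses on the execution throughput of parallel applications, using many smaller, simpler cores that run a large number of threads (e.g., GPUs).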
Heterogeneous Parallel Computing
Design Philosophy:
CPUs optimize for sequential task performance, whereas GPUs are designed for parallel execution.
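CPUs are latency-oriented: large caches and sophisticated control logic (e.g., branch prediction) reduce the latency of individual operations. GPUs are throughput-oriented: they devote more chip area to arithmetic units and hide memory latency by switching among many parallel threads.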
Performance Comparison:
Many applications benefit from both CPU (for sequential tasks) and GPU (for parallel tasks) processing capabilities.
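Amdahl's law is a standard result that quantifies why both processors matter; it is named here as an illustration. Suppose a fraction p of the run time is parallel work offloaded to the GPU and sped up by a factor s, while the rest stays sequential on the CPU. The overall speedup is then 1 / ((1 - p) + p / s). The function name and sample values are hypothetical.

```python
def heterogeneous_speedup(parallel_fraction, gpu_speedup):
    """Amdahl's law: overall speedup when only the parallel part is accelerated."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if gpu_speedup <= 0:
        raise ValueError("gpu_speedup must be positive")
    serial = 1.0 - parallel_fraction  # stays on the CPU at its original speed
    return 1.0 / (serial + parallel_fraction / gpu_speedup)


if __name__ == "__main__":
    for p in (0.5, 0.9, 0.99):
        print(f"p={p:.2f}: 100x faster parallel part -> "
              f"{heterogeneous_speedup(p, 100):.1f}x overall")
```

Even with a 100x faster parallel section, an application that is 90% parallel speeds up only about 9x overall. The sequential remainder therefore still benefits from a latency-optimized CPU.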
GPU Architecture
Components:
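Organized as an array of streaming multiprocessors (SMs), each containing multiple streaming processors (SPs) that share control logic and an instruction cache.
Uses high-bandwidth DRAM, such as GDDR (or HBM on some data-center GPUs), to keep the many cores supplied with data.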
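Two back-of-the-envelope formulas tie these components to numbers. The total number of SPs is the SM count times the SPs per SM. Theoretical peak memory bandwidth is the bus width in bytes times the per-pin data rate. The configuration below is hypothetical and does not describe a specific product; real sustained bandwidth is lower than this peak.

```python
def total_streaming_processors(num_sms, sps_per_sm):
    """Every SM contributes the same number of SPs."""
    return num_sms * sps_per_sm


def peak_memory_bandwidth_gbs(bus_width_bits, data_rate_gbps_per_pin):
    """Theoretical peak DRAM bandwidth in GB/s: bytes per transfer x transfer rate."""
    return bus_width_bits / 8 * data_rate_gbps_per_pin


if __name__ == "__main__":
    # Hypothetical configuration, not a specific product.
    print(total_streaming_processors(num_sms=80, sps_per_sm=64))  # 5120
    print(peak_memory_bandwidth_gbs(bus_width_bits=384,
                                    data_rate_gbps_per_pin=14))  # 672.0
```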
CUDA Architecture:
Introduced by NVIDIA, CUDA provides a data-parallel programming model that lets developers use the GPU for general-purpose computation without going through a graphics API such as OpenGL or Direct3D.
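The following is a sequential emulation of CUDA's data-parallel model, written in Python to match the other sketches. Each thread computes one element of a vector sum using its global index, which CUDA C obtains as `blockIdx.x * blockDim.x + threadIdx.x`. The `launch` helper is hypothetical and stands in for a kernel launch such as a hypothetical `vecAdd<<<blocks, threadsPerBlock>>>(...)`. On a GPU these threads run in parallel across the SMs rather than in a loop.

```python
def vector_add_kernel(block_idx, block_dim, thread_idx, a, b, c, n):
    """Body of one CUDA-style thread: compute one output element."""
    i = block_idx * block_dim + thread_idx  # global thread index, as in CUDA C
    if i < n:  # guard: the last block may have more threads than elements
        c[i] = a[i] + b[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Sequentially emulate a 1-D launch: every (block, thread) pair runs once."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


n = 1000
a = [float(i) for i in range(n)]
b = [2.0 * i for i in range(n)]
c = [0.0] * n
threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block  # ceiling division -> 4
launch(vector_add_kernel, blocks, threads_per_block, a, b, c, n)

if __name__ == "__main__":
    print(c[:5])  # [0.0, 3.0, 6.0, 9.0, 12.0]
```

Rounding the block count up and guarding with `if i < n` is the usual pattern when the data size is not a multiple of the block size.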
GPU Computing Advantages
Parallelism: Thousands of cores enable simultaneous task execution.
High Performance: Suitable for large datasets and computational tasks.
Efficiency: Higher performance per watt than CPUs on parallel workloads.
Cost-Effectiveness: Often cheaper than a CPU cluster of comparable throughput for parallel workloads.
Applications of GPU Computing
Fields Include:
AI & Machine Learning: Efficient training of deep learning models.
Scientific Computing: Simulations in climate modeling and physics.
Cryptocurrency Mining: Solving cryptographic puzzles efficiently.
3D Rendering & Gaming: Real-time graphics processing.
Finance & Data Analytics: High-frequency trading and risk modeling.
Medical Imaging: Processing images from MRI and CT scans.