Notes on Parallel Computing and GPU Architecture

Parallel Computing Concepts

  • Definition: Parallel computing allows multiple calculations or processes to be carried out simultaneously, dividing large problems into manageable chunks.
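
The "divide into chunks" idea can be sketched as follows. This is a minimal illustration, not a tuned implementation: the helper names split_into_chunks and parallel_sum are hypothetical, and a thread pool is assumed only to show the decomposition pattern (in CPython, threads do not speed up CPU-bound work because of the global interpreter lock).

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, num_chunks):
    """Split data into num_chunks contiguous pieces of near-equal size."""
    size, extra = divmod(len(data), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_sum(data, num_workers=4):
    """Sum each chunk in its own worker, then combine the partial sums."""
    chunks = split_into_chunks(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums)
```

Calling parallel_sum(list(range(1000))) gives the same result as the built-in sum; only the way the work is partitioned differs.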

Classification of Parallel Computers

  • Flynn's Classification (1972):

    • Based on the number of concurrent instruction streams and data streams, giving four classes: SISD, SIMD, MISD, and MIMD.

  • Coupling Types:

    • Loosely Coupled: Independent computers, each with its own memory, connected by a network (e.g., Ethernet) and communicating through message passing.

    • Tightly Coupled: PEs (Processing Elements) share a common main memory, which makes communication faster than over a network.
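
The two coupling styles differ mainly in how workers exchange results. The sketch below contrasts them inside a single Python process, which is an assumption made for brevity: threads stand in for PEs, and a queue.Queue stands in for the network link. The function names are hypothetical.

```python
import queue
import threading


def _run_workers(worker, parts):
    """Start one thread per part and wait for all of them to finish."""
    threads = [threading.Thread(target=worker, args=(part,)) for part in parts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def shared_memory_total(values, num_workers=2):
    """Tightly coupled style: workers update one shared variable."""
    total = [0]                 # shared state visible to every worker
    lock = threading.Lock()

    def worker(part):
        partial = sum(part)
        with lock:              # serialise updates to the shared total
            total[0] += partial

    _run_workers(worker, [values[i::num_workers] for i in range(num_workers)])
    return total[0]


def message_passing_total(values, num_workers=2):
    """Loosely coupled style: workers only interact by sending messages."""
    mailbox = queue.Queue()     # stands in for the network between nodes

    def worker(part):
        mailbox.put(sum(part))  # send the partial result as a message

    _run_workers(worker, [values[i::num_workers] for i in range(num_workers)])
    # The receiver collects exactly one message per worker.
    return sum(mailbox.get() for _ in range(num_workers))
```

Both functions return the same total; the shared-memory version needs a lock to protect the common variable, while the message-passing version needs no lock but pays for every message it sends.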

Memory Access Modes

  • Shared Memory (SM):

    • All processors access a common global address space.

    • Memory access time is the same for every processor (Uniform Memory Access, UMA).

  • Distributed Shared Memory (DSM):

    • Each processor accesses its own local memory faster than memory attached to another processor.

    • Characterized by Non-Uniform Memory Access (NUMA): a remote access can be 10 to 1000 times slower than a local one.
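
To see why the local/remote gap matters, a simple weighted-average model helps. This is an illustrative sketch only: the function name and the sample latencies are assumptions, not measurements from any real machine.

```python
def average_access_time(local_ns, remote_ns, local_fraction):
    """Weighted average access time for a NUMA-style memory system.

    local_fraction is the share of accesses served from local memory;
    the remaining accesses go to remote memory.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Assumed numbers: 100 ns local, remote 10 times slower, 90% of accesses local.
example = average_access_time(local_ns=100.0, remote_ns=1000.0, local_fraction=0.9)
```

With these assumed numbers the average comes out at about 190 ns, nearly double the local latency, even though only one access in ten is remote.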

Grain Size Classification

  • Definition: Grain size is the amount of work a PE performs before it needs to communicate with another PE.

  • Types:

    • Very Fine Grain: Single instruction.

    • Fine Grain: Corresponds to threads, typically around 100 machine instructions.

    • Medium Grain: Corresponds to procedures (subroutines), typically around 1000 instructions.

    • Coarse Grain: Corresponds to complete programs.
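
One way to see why grain size matters is to compare the work done between communications with the cost of each communication. The model below is a rough sketch under a stated assumption: every communication costs a fixed number of instruction-equivalents (500 in the example, a made-up figure), and the function name is hypothetical.

```python
def useful_work_fraction(grain_instructions, comm_cost_instructions):
    """Fraction of a PE's time spent computing rather than communicating.

    Assumes one communication of fixed cost after every grain of work.
    """
    if grain_instructions <= 0 or comm_cost_instructions < 0:
        raise ValueError("grain must be positive and cost non-negative")
    return grain_instructions / (grain_instructions + comm_cost_instructions)


# Assumed communication cost of 500 instruction-equivalents per exchange.
for grain in (1, 100, 1000):
    print(grain, round(useful_work_fraction(grain, 500), 3))
```

Under this assumption, larger grains spend a larger share of time on useful work, which is why very fine grains only pay off when communication is extremely cheap.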

GPUs in Parallel Computing

  • Definition: A GPU is an electronic circuit optimized for rendering graphics and performing parallel computations.

  • Key Features:

    • Handles large datasets and massively parallel workloads; widely used in AI, gaming, and simulations.

  • Trajectories in Microprocessor Design:

    • Multicore: Keeps sequential programs fast while moving to a few large cores.

    • Manycore: Maximizes throughput by running many threads on a large number of smaller cores.

Heterogeneous Parallel Computing

  • Design Philosophy:

    • CPUs optimize for sequential task performance, whereas GPUs are designed for parallel execution.

    • CPUs utilize large caches for low-latency operations, while GPUs prioritize throughput for many parallel threads.

  • Performance Comparison:

    • Many applications benefit from both CPU (for sequential tasks) and GPU (for parallel tasks) processing capabilities.
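
The benefit of combining both processors can be estimated with Amdahl's law, a standard model that is brought in here for illustration and is not part of the notes above. The sketch assumes the sequential part stays on the CPU at its original speed and the parallel part runs gpu_speedup times faster on the GPU; data transfer costs are ignored, and the function name is hypothetical.

```python
def heterogeneous_speedup(parallel_fraction, gpu_speedup):
    """Overall speedup when only the parallel part is accelerated (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if gpu_speedup <= 0:
        raise ValueError("gpu_speedup must be positive")
    sequential_fraction = 1.0 - parallel_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / gpu_speedup)


# Assumed workload: 90% parallelisable, GPU runs that part 100 times faster.
example = heterogeneous_speedup(parallel_fraction=0.9, gpu_speedup=100.0)
```

For this assumed workload the overall speedup stays below 10, because the remaining 10% of sequential work dominates. That is the reason a fast CPU for the sequential part still matters in a GPU-accelerated system.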

GPU Architecture

  • Components:

    • Organized into multiple streaming multiprocessors (SMs), each containing several streaming processors (SPs).

    • Includes high-bandwidth graphics memory, such as GDDR, to keep the many cores supplied with data.

  • CUDA Architecture:

    • Introduces a data-parallel programming model that exposes GPU compute capabilities directly, without going through a graphics API.
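
In the CUDA model, the same function runs once per thread, and each thread works out which data element it owns from its block and thread indices. The sketch below emulates that indexing sequentially in plain Python so it runs without a GPU. The function name launch_vector_add is hypothetical; on real hardware the two loops would not exist, since every (block, thread) pair executes concurrently.

```python
def launch_vector_add(a, b, threads_per_block=4):
    """Emulate a CUDA-style vector add: one logical thread per element."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    n = len(a)
    out = [0] * n
    # Ceiling division: enough blocks to cover all n elements.
    num_blocks = (n + threads_per_block - 1) // threads_per_block
    for block_idx in range(num_blocks):              # plays the role of blockIdx.x
        for thread_idx in range(threads_per_block):  # plays the role of threadIdx.x
            # Global index, as in blockIdx.x * blockDim.x + threadIdx.x
            i = block_idx * threads_per_block + thread_idx
            if i < n:  # guard: the last block may have spare threads
                out[i] = a[i] + b[i]
    return out
```

The bounds check matters whenever the element count is not a multiple of the block size: with 5 elements and 4 threads per block, the second block has three threads with nothing to do.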


GPU Computing Advantages

  • Parallelism: Thousands of cores enable simultaneous task execution.

  • High Performance: High throughput on large datasets and compute-intensive tasks.

  • Efficiency: Higher performance per watt than CPUs on parallel workloads.

  • Cost-Effectiveness: Often cheaper than a CPU cluster with comparable parallel throughput.

Applications of GPU Computing

  • Fields Include:

    • AI & Machine Learning: Efficient training of deep learning models.

    • Scientific Computing: Simulations in climate modeling and physics.

    • Cryptocurrency Mining: Solving cryptographic puzzles efficiently.

    • 3D Rendering & Gaming: Real-time graphics processing.

    • Finance & Data Analytics: High-frequency trading and risk modeling.

    • Medical Imaging: Processing images from MRI and CT scans.