In-Depth Notes on Parallel Computing and Architecture

Architecture and Communication

Classification of Parallel Architectures

  • Flynn's Classification: Categorization of parallel architectures based on instruction and data streams.
    • Category Types:
      • SISD (Single Instruction Single Data): 1 instruction stream, 1 data stream (conventional single-processor sequential machines).
      • SIMD (Single Instruction Multiple Data): 1 instruction stream, multiple data streams (vector computers, associative processors).
      • MISD (Multiple Instruction Single Data): multiple instruction streams, 1 data stream (pipelined processors).
      • MIMD (Multiple Instruction Multiple Data): multiple instruction and data streams (loosely coupled MIMD, shared-bus MIMD).

Non-Von Neumann Architectures

Dataflow Architecture
  • Definition: Neither a Von Neumann nor a Harvard architecture; well suited to repetitive tasks that apply the same algorithm to many data items.
  • Characteristics: Has no program counter; execution is data-driven, i.e. an instruction fires as soon as all of its operands are available.
  • Usage: Common in simulations where data flows cyclically through a process.
  • Implementation:
    1. Program written in a functional language.
    2. Translated into a dataflow graph where nodes are instructions and edges represent data dependencies.
    3. Executes the graph on hardware dedicated to dataflow processing.
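The three implementation steps above can be sketched with a toy interpreter (the graph and names here are illustrative, not a real dataflow machine): there is no program counter, and a node fires as soon as every operand token has arrived.

```python
# Toy dataflow interpreter: a node fires once all of its input tokens
# are available; execution order is driven purely by data dependencies.
import operator

# Dataflow graph for (x * y) + (x - y): node -> (operation, inputs).
nodes = {
    "mul": (operator.mul, ["x", "y"]),
    "sub": (operator.sub, ["x", "y"]),
    "add": (operator.add, ["mul", "sub"]),
}

def run(graph, inputs):
    tokens = dict(inputs)              # data tokens produced so far
    pending = dict(graph)              # nodes that have not yet fired
    while pending:
        for name, (op, deps) in list(pending.items()):
            if all(d in tokens for d in deps):   # all operands arrived?
                tokens[name] = op(*(tokens[d] for d in deps))
                del pending[name]                # a node fires exactly once
    return tokens

result = run(nodes, {"x": 3, "y": 2})
print(result["add"])  # (3*2) + (3-2) = 7
```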
Reduction Computer
  • Definition: Discovers potential parallelism in a program autonomously.
  • Functionality: Works with expressions, finds computable parts, and performs reductions using functional representations.
  • Example:
    • Expression: (x*y)+(x-y)
    • Transformation and Reduction Steps:
    1. Transform the expression into prefix notation: (+ (* x y) (- x y)).
    2. Substitute x=3 and y=2, reduce the innermost subexpressions ((* 3 2) → 6, (- 3 2) → 1), and finally reduce (+ 6 1) → 7.
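The reduction steps above can be sketched as follows (a minimal, illustrative evaluator over prefix-form expression trees, not a real reduction machine):

```python
# Sketch of reduction-style evaluation: repeatedly replace a
# subexpression whose arguments are already values by its result.
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def reduce_expr(expr, env):
    if isinstance(expr, str):     # a variable: substitute its value
        return env[expr]
    if isinstance(expr, int):     # already fully reduced
        return expr
    op, a, b = expr               # (operator, left arg, right arg)
    return OPS[op](reduce_expr(a, env), reduce_expr(b, env))

# (x*y)+(x-y) in prefix form, with x=3 and y=2:
expr = ("+", ("*", "x", "y"), ("-", "x", "y"))
print(reduce_expr(expr, {"x": 3, "y": 2}))  # 6 + 1 = 7
```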
Tree Architecture
  • Structure: Processing nodes perform the calculations and are linked in a binary tree.
  • Calculation Phases:
    1. Distribute nodes into reducible segments.
    2. Execute reduction simultaneously across processors.
    3. Shift memory and manage results.
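The calculation phases above can be illustrated with a sequential simulation of a tree reduction (assuming each level of pair-combining would run in parallel on the tree's processors):

```python
# Simulation of tree reduction: leaves hold the operands; each level
# combines disjoint pairs of partial results (all pairs in one level
# could run in parallel) until the root holds the final value.
def tree_reduce(values, combine):
    level = list(values)
    while len(level) > 1:
        nxt = [combine(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:           # odd element passes up unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]

print(tree_reduce([1, 2, 3, 4, 5, 6, 7, 8], lambda a, b: a + b))  # 36
```

With n leaves the reduction finishes in ceil(log2 n) parallel levels instead of n-1 sequential steps.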

Von Neumann Architectures

VLIW (Very Long Instruction Word)
  • Nature: SISD-style architecture with a single program counter; each fetched instruction is a very long word that bundles several operations.
  • Characteristics:
    • Each instruction is sizeable and controls multiple functional units in parallel.
    • The compiler schedules operations into the long words, which allows greater performance, but conditional jumps are hard to schedule.
MISD (Multiple Instruction Single Data)
  • Functionality: Passes a single data stream through multiple instruction streams (e.g., the stages of a processor pipeline).
SIMD (Single Instruction Multiple Data)
  • Characteristics: Suited to vector processing; a single control unit drives multiple processing elements.
  • Memory Access Variants:
    • NUMA (Non-Uniform Memory Access): Each processor has fast local memory; accesses to remote (global) memory are slower.
    • UMA (Uniform Memory Access): No distinction between local and global memory accesses, creating uniform latencies.

Communication in Parallel Computing

PVM (Parallel Virtual Machine)
  • Definition: A software layer that combines a network of (possibly heterogeneous) workstations into one virtual parallel machine and manages process groups dynamically.
  • Features:
    • Portability and heterogeneity: processes can run on different processor architectures.
    • Allows dynamic process management and error recovery.
MPI (Message Passing Interface)
  • Definition: A standard for developing parallel applications across distributed systems.
  • Communication Types:
    • Point-to-point interactions (send/receive).
    • Collective communication operations (broadcast/reduce).
  • Initialization and Finalization:
    • MPI_Init(&argc,&argv);
    • MPI_Finalize();
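The two communication styles listed above can be mimicked in plain Python (a conceptual sketch built on threads and queues that imitates the semantics of send/receive, broadcast, and reduce; it does not use the real MPI API):

```python
# Conceptual sketch of MPI-style communication: each "rank" is a thread
# with its own inbox; broadcast and reduce are built from point-to-point
# sends and receives.
import threading, queue

N = 4
mailboxes = [queue.Queue() for _ in range(N)]   # one inbox per rank

def send(dest, data):            # point-to-point send
    mailboxes[dest].put(data)

def recv(rank):                  # blocking point-to-point receive
    return mailboxes[rank].get()

results = [None] * N

def worker(rank):
    if rank == 0:
        for dest in range(1, N):     # "broadcast": rank 0 sends to all
            send(dest, 10)
        partial = 10
        for _ in range(1, N):        # "reduce": rank 0 sums the replies
            partial += recv(0)
        results[0] = partial
    else:
        value = recv(rank)           # receive the broadcast value
        send(0, value * rank)        # send a partial result back
        results[rank] = value

threads = [threading.Thread(target=worker, args=(r,)) for r in range(N)]
for t in threads: t.start()
for t in threads: t.join()
print(results[0])  # 10 + 10*1 + 10*2 + 10*3 = 70
```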

Interaction and Synchronization

Synchronization
  • Purpose: Ensures correct order and timing of event executions between processes.
  • Mechanisms:
    • Message passing and shared memory access.
  • Notable Algorithms:
    • Semaphore solutions for resource management, preventing race conditions and deadlocks.
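A minimal sketch of the semaphore mechanism mentioned above: a binary semaphore serializes access to a shared counter, preventing the race condition that unsynchronized increments would cause.

```python
# Binary semaphore guarding a critical section around a shared counter.
import threading

sem = threading.Semaphore(1)   # binary semaphore acts as a mutex
counter = 0

def add(n):
    global counter
    for _ in range(n):
        sem.acquire()          # enter critical section
        counter += 1
        sem.release()          # leave critical section

threads = [threading.Thread(target=add, args=(10000,)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)  # 40000: every increment was applied exactly once
```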
Mutual Exclusion
  • Problem: Ensure that at most one process is inside its critical section at any time.
  • Solutions:
    • Dekker's Algorithm: The first known correct software solution; guarantees mutual exclusion without starvation.
    • Peterson's Algorithm: A simpler two-process algorithm ensuring mutual exclusion, deadlock freedom, and bounded waiting.

Process Interaction in Parallel Systems

  • Processes communicate through message-passing and shared-memory techniques.
  • Distributed systems often favor message passing for inter-process communications.

Parallel Algorithm Analysis

  • Parameters:
    • Number of processors and required processes are specified.
    • Time complexity t(n), cost c(n) (number of processors × parallel time), and efficiency must be evaluated to judge a parallel algorithm.
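A worked example of these metrics (assuming the common definitions cost c(n) = p(n) · t(n) and efficiency E = T_seq / c(n)), for summing n values by tree reduction:

```python
# Metrics for parallel summation of n values on p = n/2 processors:
# the tree reduction takes t(n) = ceil(log2 n) parallel steps.
import math

def metrics(p, t_par, t_seq):
    cost = p * t_par                 # c(n) = p(n) * t(n)
    efficiency = t_seq / cost        # E = T_seq / c(n)
    return cost, efficiency

n = 1024
t_seq = n - 1                        # sequential additions needed
t_par = math.ceil(math.log2(n))      # 10 parallel steps
cost, eff = metrics(n // 2, t_par, t_seq)
print(cost, round(eff, 3))  # cost 5120 vs. 1023 sequential operations
```

Since the cost (5120) far exceeds the sequential operation count (1023), this scheme is fast but not cost-optimal.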
Sorting Algorithms
  • Discusses several sorting algorithms, such as:
    • Enumeration Sort: Parallelizes extremely well, since all rank computations are independent, but is not cost-optimal: it performs O(n²) comparisons in total.
    • Odd-Even Transposition Sort: Suited to linear processor arrays; sorts n elements in O(n) parallel phases with n processors.
    • Merge Sort Variants: Explored as optimal with respect to their total operation counts.
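Sequential simulations of the first two algorithms above (the inner loops marked as parallelizable are exactly the parts a parallel machine would run simultaneously):

```python
def enumeration_sort(a):
    # Each element's final position is the number of elements smaller
    # than it (ties broken by original index); all n rank computations
    # are independent and could run in parallel.
    n = len(a)
    out = [None] * n
    for i in range(n):
        rank = sum(1 for j in range(n)
                   if a[j] < a[i] or (a[j] == a[i] and j < i))
        out[rank] = a[i]
    return out

def odd_even_transposition_sort(a):
    # n phases; each phase compare-and-swaps disjoint neighbor pairs
    # (even pairs, then odd pairs), so all swaps within one phase are
    # independent and could run in parallel.
    a = list(a)
    n = len(a)
    for phase in range(n):
        for i in range(phase % 2, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

data = [5, 2, 9, 1, 5, 6]
print(enumeration_sort(data))            # [1, 2, 5, 5, 6, 9]
print(odd_even_transposition_sort(data)) # [1, 2, 5, 5, 6, 9]
```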
Proofs and Complexity Results
  • For each algorithm, both the time complexity and the cost must be proved.
  • Examination of algorithmic efficacy across different systems is essential for thorough understanding.

Tree Structures and Traversals

  • Explores the properties of tree structures for algorithmic implementations, including traversals such as the Euler tour and preorder.
  • Focuses on effective handling of node weights and node management in complex data structures for efficient tree processing.
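The two traversals mentioned above can be sketched on a small binary tree (the Euler tour variant here lists a node once per visit as the tour passes it, giving a sequence of length 2n-1):

```python
# Preorder traversal and Euler tour of a binary tree.
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def preorder(node):
    # Root first, then the left subtree, then the right subtree.
    if node is None:
        return []
    return [node.value] + preorder(node.left) + preorder(node.right)

def euler_tour(node):
    # The tour revisits a node after returning from each child, so each
    # node appears (number of children + 1) times.
    if node is None:
        return []
    tour = [node.value]
    for child in (node.left, node.right):
        if child is not None:
            tour += euler_tour(child) + [node.value]
    return tour

tree = Node(1, Node(2, Node(4), Node(5)), Node(3))
print(preorder(tree))    # [1, 2, 4, 5, 3]
print(euler_tour(tree))  # [1, 2, 4, 2, 5, 2, 1, 3, 1]
```

The Euler tour is the basis of the Euler tour technique, which turns many tree problems into list problems that parallelize well.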