In-Depth Notes on Parallel Computing and Architecture
Architecture and Communication
Classification of Parallel Architectures
- Flynn's Classification: Categorization of parallel architectures based on instruction and data streams.
- Category Types:
- SISD (Single Instruction Single Data): 1 instruction stream, 1 data stream (conventional single-processor sequential machines).
- SIMD (Single Instruction Multiple Data): 1 instruction stream, multiple data streams (vector computers, associative processors).
- MISD (Multiple Instruction Single Data): multiple instruction streams, 1 data stream (often identified with pipelined processors; the class has few practical examples).
- MIMD (Multiple Instruction Multiple Data): multiple instruction and data streams (loosely coupled/decoupled MIMD, shared-bus MIMD).
Non-Von Neumann Architectures
Dataflow Architecture
- Definition: Neither a Von Neumann nor a Harvard architecture; well suited to repetitive tasks that apply the same algorithm to a stream of data.
- Characteristics: Has no program counter; an instruction executes as soon as its operands are available (data-driven execution).
- Usage: Common in simulations where data flows cyclically through a process.
- Implementation:
- Program written in a functional language.
- Translated into a dataflow graph where nodes are instructions and edges represent data dependencies.
- Executes the graph on hardware dedicated to dataflow processing.
Reduction Computer
- Definition: A machine that discovers exploitable parallelism autonomously.
- Functionality: Works on expressions, finds subexpressions that are ready to compute, and repeatedly reduces them using a functional representation until only the result remains.
- Example:
- Expression: (x*y)+(x-y)
- Transformation and Reduction Steps:
- Transform the expression to prefix notation: (+ (* x y) (- x y)).
- Substitute x=3 and y=2; the subexpressions (* 3 2) and (- 3 2) are both reducible and can be reduced in parallel, giving (+ 6 1), which reduces to 7.
Tree Architecture
- Structure: Processing nodes perform the calculations and are linked together as a binary tree.
- Calculation Phases:
- Partition the expression's nodes into reducible segments and distribute them to the processors.
- Execute the reductions simultaneously across the processors.
- Shift partial results up the tree and manage the final result.
Von Neumann Architectures
VLIW (Very Long Instruction Word)
- Nature: SISD-style architecture with a single program counter; the very long instruction words are fetched and issued sequentially.
- Characteristics:
- Each instruction word is sizeable and bundles operations for multiple functional units, which execute in parallel.
- The compiler schedules the operations into the words statically, which enables high performance but handles conditional jumps poorly (branches limit what can be bundled).
MISD (Multiple Instruction Single Data)
- Functionality: A single data set is processed by multiple instruction streams (e.g., the stages of a pipelined processor).
SIMD (Single Instruction Multiple Data)
- Characteristics: Suited to vector processing; a single control unit drives multiple processing elements.
- Memory Access Variants:
- NUMA (Non-Uniform Memory Access): Each processor has a local memory and slower access to global memory.
- UMA (Uniform Memory Access): No distinction between local and global memory accesses, creating uniform latencies.
Communication in Parallel Computing
PVM (Parallel Virtual Machine)
- Definition: A software layer that makes a dynamically changing set of workstations appear as a single parallel virtual machine, managing process groups across them.
- Features:
- Portability and heterogeneity: the participating machines may have different processor architectures.
- Dynamic process management and fault recovery.
MPI (Message Passing Interface)
- Definition: A standard for developing parallel applications across distributed systems.
- Communication Types:
- Point-to-point interactions (send/receive).
- Collective communication operations (broadcast/reduce).
- Initialization and Finalization:
  MPI_Init(&argc, &argv);   /* must precede all other MPI calls */
  MPI_Finalize();           /* no MPI calls are allowed afterwards */
Interaction and Synchronization
Synchronization
- Purpose: Ensures correct order and timing of event executions between processes.
- Mechanisms:
- Message passing and shared memory access.
- Notable Algorithms:
- Semaphore-based solutions for resource management, preventing race conditions (and, when used correctly, deadlocks).
Mutual Exclusion
- Problem: Allowing only one process to enter the critical section at a time.
- Solutions:
- Dekker's Algorithm: The first known correct software solution for two processes; guarantees mutual exclusion without starvation.
- Peterson's Algorithm: A simpler two-process solution; guarantees mutual exclusion, freedom from deadlock, and bounded waiting.
Process Interaction in Parallel Systems
- Processes communicate via two main techniques: message passing and shared memory.
- Distributed systems often favor message passing for inter-process communications.
Parallel Algorithm Analysis
- Parameters:
- The number of processors p(n) and the required processes are specified.
- Time complexity t(n), cost c(n) = p(n) * t(n), and efficiency (the ratio of the best sequential time to the cost) must be evaluated; an algorithm is cost-optimal when c(n) matches the best sequential running time.
Sorting Algorithms
- Discusses numerous sorting algorithms, such as:
- Enumeration Sort: Very fast in parallel (each element's rank can be computed independently), but its total operation count makes it non-optimal compared with sequential sorts.
- Odd-Even Transposition Sort: Efficient for parallel processing; sorts n elements in O(n) parallel phases with n processors.
- Merge Sort Variants: Explored as optimal with respect to their operation volumes (cost-optimal).
Proofs and Complexity Results
- Each algorithm must have its operational complexity validated (both time and cost).
- Examination of algorithmic efficacy across different systems is essential for thorough understanding.
Tree Structures and Traversals
- Explores the properties of tree structures for algorithmic implementations, including traversals such as the Euler tour and preorder.
- Focuses on efficient handling of node weights and node management in complex data structures for effective tree processing.