Parallel Design Patterns

Design Patterns

  • A design pattern represents a common programming problem and its tested, efficient solution.

Parallel Design Pattern Design Spaces

  • Partitioning of parallel application design into four related design spaces:
    • Finding concurrency
    • Algorithm structure
    • Supporting structures
    • Implementation mechanisms

Finding Concurrency

  • Focuses on structuring the problem to expose concurrency.
  • Programmer identifies areas of potential concurrency in a problem or serial program.

Three Dimensions

  • Decomposition, Dependency Analysis, and Design Evaluation
    • Related to how the problem is broken into pieces and how those pieces depend on one another.
Decomposition
  • Dividing the problem into concurrently executable pieces.
    • Task Decomposition
      • Views a complex algorithm as a set of instructions grouped into concurrently executable tasks.
    • Data Decomposition
      • Divides program data into chunks for use by each task.
Dependency Analysis
  • Grouping tasks and analyzing dependencies between them.
    • Group Tasks
      • Models convenient task groupings to simplify dependency management.
    • Order Tasks
      • Determines task/group order to satisfy application constraints.
    • Data Sharing
      • Models accesses to shared data structures.
Design Evaluation
  • Guides analysis of progress before moving to algorithm structure patterns.
Overall Output
  1. Task decomposition identifying concurrent tasks.
  2. Data decomposition identifying data local to each task.
  3. Task grouping and ordering to satisfy temporal and data dependencies.
  4. Analysis of dependencies among tasks.

Algorithm Structure

  • Refines concurrent task design to create a parallel program structure for a parallel target architecture.

Task Decomposition

  • Task Parallelism (linear decomposition)
    • Efficient execution of task collections.
    • Decomposes the problem into concurrently executable tasks with potential dependencies.
    • Solution Implementation:
      • How tasks are defined.
      • Dependencies among tasks.
      • Scheduling tasks for concurrent execution, including assignment to processors or threads.
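As a rough sketch of this pattern, the fragment below schedules a collection of independent tasks onto a fixed pool of threads and collects results as tasks finish. The name `run_task_collection` and the choice of Python threads are illustrative assumptions, not part of the pattern itself.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_task_collection(tasks, workers=4):
    """Task Parallelism sketch: independent tasks (callables keyed by name)
    are scheduled onto a fixed pool of worker threads."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit every task; the executor handles scheduling onto threads.
        futures = {pool.submit(fn): name for name, fn in tasks.items()}
        # Gather results in completion order, not submission order.
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results
```

If tasks had dependencies, they would have to be expressed by ordering submissions or by having later tasks wait on earlier tasks' futures.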

Divide & Conquer

  • Divides a problem into smaller sub-problems of the same form as the original.
  • Solves sub-problems and combines solutions into a final solution.

Data Decomposition

  • Geometric Decomposition (linear)
    • Distributes data into discrete datasets.
    • Solves the problem by operating on each dataset independently, allowing concurrent execution.
  • Recursive Data
    • Data is acted upon sequentially, often using links (e.g., linked lists, binary trees, graphs).
    • Restructuring computations over the linked data structure exposes more concurrency.
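A minimal Python sketch of geometric decomposition, assuming a simple element-wise update (the name `geometric_scale` is invented for illustration): the array is partitioned into contiguous chunks, and each chunk is updated concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

def geometric_scale(data, factor, n_chunks=4):
    """Geometric decomposition sketch: partition a list into contiguous
    chunks, operate on each chunk independently, then reassemble."""
    step = max(1, len(data) // n_chunks)
    chunks = [data[i:i + step] for i in range(0, len(data), step)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        # Each chunk is processed with no reference to the other chunks.
        scaled = pool.map(lambda c: [x * factor for x in c], chunks)
    # Reassemble in the original order.
    return [x for chunk in scaled for x in chunk]
```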

Organize by Flow of Data

  • Uses the flow of data to impose an ordering on tasks.
Pipeline
  • One-way, consistent data flow through a linear chain of stages.
  • Each stage represents a function computed on input data from the previous stage, with the result delivered to the next stage.
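One way to sketch such a pipeline in Python is a chain of stages connected by queues, with each stage running in its own thread; the `pipeline` helper below is an illustrative sketch under those assumptions, not a canonical implementation.

```python
import queue
import threading

def pipeline(data, stages):
    """Pipeline sketch: each stage is a function run in its own thread,
    reading from an input queue and writing to an output queue."""
    SENTINEL = object()  # marks end of the data stream
    qs = [queue.Queue() for _ in range(len(stages) + 1)]

    def run_stage(fn, qin, qout):
        while True:
            item = qin.get()
            if item is SENTINEL:
                qout.put(SENTINEL)  # propagate shutdown downstream
                return
            qout.put(fn(item))

    threads = [threading.Thread(target=run_stage, args=(fn, qs[i], qs[i + 1]))
               for i, fn in enumerate(stages)]
    for t in threads:
        t.start()
    for x in data:          # feed the first stage
        qs[0].put(x)
    qs[0].put(SENTINEL)
    out = []
    while True:             # drain the last stage
        item = qs[-1].get()
        if item is SENTINEL:
            break
        out.append(item)
    for t in threads:
        t.join()
    return out
```

Once the pipeline is full, all stages work concurrently on different items, which is where the speedup comes from.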
Event-Based Coordination
  • Semi-independent concurrent activities interact irregularly.
  • Interactions determined by data flow between concurrent activities.
  • Data flow is not necessarily one-way or linear.
Note
  • The three ways of organizing the parallel algorithm are alternatives.
  • Unlike the Finding Concurrency patterns, these organizations are alternatives: the programmer chooses one of the three and exploits one of the parallel design patterns in that group.

Supporting Structures

  • Investigates structures/patterns to support the implementation of algorithms planned in the algorithm structure design space.

Program Structures

  • Single Program, Multiple Data (SPMD)
    • All processing elements (PEs) run the same program in parallel, each with its own data subset.
  • Master/Worker
    • A Master task sets up concurrent Worker threads/processes and a bag of tasks.
  • Loop Parallelism
    • Executes compute-intensive loop iterations in parallel.
  • Fork/Join
    • A thread/process forks sub-processes that execute in parallel.
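The SPMD structure above can be sketched as follows, with Python threads standing in for processing elements and a rank selecting each PE's local data subset (as in MPI); `spmd_sum` is an invented example name.

```python
from concurrent.futures import ThreadPoolExecutor

def spmd_sum(data, n_pes=4):
    """SPMD sketch: every PE runs the *same* program; its rank
    determines which slice of the data it works on."""
    def pe_program(rank):
        local = data[rank::n_pes]  # this PE's data subset
        return sum(local)          # identical code on every PE
    with ThreadPoolExecutor(max_workers=n_pes) as pool:
        partials = pool.map(pe_program, range(n_pes))
    return sum(partials)           # combine the per-PE results
```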

Data Structures

  • Shared Data
    • Manages data shared among concurrent activities.
  • Shared Queue
    • Creates queue data types accessible concurrently.
    • Supports interaction of concurrent activities in different contexts.
  • Distributed Array
    • Manages arrays partitioned and distributed among concurrent activities.

Implementation Mechanisms

  • Meta-patterns representing base mechanisms to support parallel computing abstractions.
    • Concurrent activities
    • Synchronization
    • Communication

UE Management

  • Manages units of execution, including creation, destruction, and management of concurrent activities.

Synchronization

  • Handles ordering of events/computations in the UEs used to execute the parallel application.

Communication

  • Manages communication between UEs, including data exchange among concurrent activities.

List of Parallel Patterns

Embarrassingly Parallel

  • A class of problems where work division into independent tasks is obvious and simple.
  • Examples: Trapezoid rule program, loop accumulation with multiplication or addition.
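Since the trapezoid rule is cited as an example, here is a hedged sketch of it in Python: the integration interval is split into independent sub-intervals, one per worker, and the partial integrals are summed at the end. Function names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def trapezoid(f, a, b, n):
    """Serial composite trapezoid rule on [a, b] with n subintervals."""
    h = (b - a) / n
    total = (f(a) + f(b)) / 2.0
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

def parallel_trapezoid(f, a, b, n, workers=4):
    """Embarrassingly parallel: each worker integrates its own
    sub-interval with no communication, then the parts are summed."""
    h = (b - a) / workers
    chunks = [(a + i * h, a + (i + 1) * h) for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda c: trapezoid(f, c[0], c[1], n // workers),
                         chunks)
    return sum(parts)
```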

Master/Worker

  • A Master task sets up concurrent Worker threads/processes and a bag of tasks.
  • Workers take tasks from the bag and execute them in parallel until the bag is empty or another ending condition occurs.
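A minimal sketch of this pattern in Python, using a thread-safe queue as the bag of tasks; `master_worker` and the use of threads rather than processes are illustrative assumptions.

```python
import queue
import threading

def master_worker(tasks, work_fn, n_workers=4):
    """Master/Worker sketch: the master fills a shared bag (a queue);
    workers repeatedly pull tasks until the bag is empty."""
    bag = queue.Queue()
    for t in tasks:           # master sets up the bag of tasks
        bag.put(t)
    results, lock = [], threading.Lock()

    def worker():
        while True:
            try:
                task = bag.get_nowait()
            except queue.Empty:   # bag is empty: worker terminates
                return
            r = work_fn(task)
            with lock:
                results.append(r)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results            # order depends on worker scheduling
```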

Map and Reduce

  • Among the simplest of the parallel patterns.
Map
  • Applies a function to every data element in parallel.
  • The mapped function must have no side effects, so that its applications to different elements are identical in form and independent of one another.
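A sketch of the map step in Python (`parallel_map` is an invented name); because the function is side-effect-free, every element can be processed concurrently and the output order is still preserved.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_map(fn, data, workers=4):
    """Map sketch: apply a side-effect-free fn to every element in
    parallel; results come back in the original element order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, data))
```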

MapReduce

  • Solves problems involving input, processing, and generation of large datasets, scalable across many processors.
Mapping
  • Divides a large dataset into N discrete subsets, each processed on a single processor; the output is typically a set of key/value pairs (a map data structure).
Shuffle
  • Groups pairs that share the same key and assigns each group to a processor where the reduce operation will happen.
Reduce
  • Combines elements in an input dataset into a single result.
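The three phases can be sketched with the classic word-count example; this is a single-machine toy under the assumption that threads stand in for the many processors of a real MapReduce system, and all function names are illustrative.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_phase(chunk):
    """Mapping: emit a (word, 1) pair for every word in this chunk."""
    return [(w, 1) for w in chunk.split()]

def shuffle(mapped):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def mapreduce(chunks, workers=4):
    """Run map over the chunks in parallel, shuffle, then reduce
    each key's group in parallel (here the reduce is a sum)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
        groups = shuffle(mapped)
        counts = pool.map(sum, groups.values())
    return dict(zip(groups, counts))
```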

Divide & Conquer

  • The problem is broken into smaller sub-problems of the same form, each solved independently.
  • The partial solutions are combined into a solution for the larger problem, so the larger solution depends on the smaller ones.
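Merge sort is a standard illustration of this pattern; in the sketch below the two halves of the list are sorted concurrently and then merged (splitting only at the top level keeps the example short and avoids nested-pool pitfalls).

```python
from concurrent.futures import ThreadPoolExecutor

def merge(left, right):
    """Combine step: merge two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def merge_sort(xs):
    """Serial divide & conquer: split, sort halves, merge."""
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    return merge(merge_sort(xs[:mid]), merge_sort(xs[mid:]))

def parallel_merge_sort(xs):
    """Divide: split the list; conquer: sort the two halves
    concurrently; combine: merge the partial solutions."""
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(merge_sort, xs[:mid])
        right = pool.submit(merge_sort, xs[mid:])
        return merge(left.result(), right.result())
```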

Fork/Join

  • The number of parallel threads varies as the program executes.
  • Forks off a different number of threads at different times and waits for them to finish before proceeding.
  • The originating process waits until the child processes all join before resuming its own execution.
  • Has the possibility of having nested parallel execution.
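A minimal fork/join sketch, assuming a simple summation as the work and futures as the forked children; the parent submits one future per chunk (fork) and blocks on all results before combining them (join).

```python
from concurrent.futures import ThreadPoolExecutor

def fork_join_sum(data, workers=4):
    """Fork/Join sketch: fork one child per chunk, join on all of
    them, then combine the partial sums."""
    chunk = max(1, len(data) // workers)
    pieces = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(sum, p) for p in pieces]   # fork
        return sum(f.result() for f in futures)           # join, combine
```

Each forked child could itself fork further children, which is the nested parallel execution mentioned above.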

A Last Word on Parallel Design Patterns

  • Parallel design patterns can be very similar because they typically follow the process of creating a parallel algorithm/program from a corresponding serial algorithm/program.
  • The process is to:
    • Identify concurrency.
    • Split the program into concurrent pieces.
    • Split the data if the updated algorithm calls for it.
    • Execute the concurrent pieces in parallel.
    • Put all the answers back together to make a final answer.