Parallel Design Patterns

Design Patterns

  • A design pattern represents a common programming problem and its tested, efficient solution.

Parallel Design Pattern Design Spaces

  • Partitioning of parallel application design into four related design spaces:
    • Finding concurrency
    • Algorithm structure
    • Supporting structures
    • Implementation mechanisms

Finding Concurrency

  • Focuses on structuring the problem to expose concurrency.
  • Programmer identifies areas of potential concurrency in a problem or serial program.

Three Dimensions

  • Decomposition, Dependency Analysis, and Design Evaluation
    • Related to how the problem is broken into pieces and how those pieces depend on one another.
Decomposition
  • Dividing the problem into concurrently executable pieces.
    • Task Decomposition
      • Views a complex algorithm as a set of instructions grouped into concurrently executable tasks.
    • Data Decomposition
      • Divides program data into chunks for use by each task.
Dependency Analysis
  • Grouping tasks and analyzing dependencies between them.
    • Group Tasks
      • Models convenient task groupings to simplify dependency management.
    • Order Tasks
      • Determines task/group order to satisfy application constraints.
    • Data Sharing
      • Models accesses to shared data structures.
Design Evaluation
  • Guides analysis of progress before moving to algorithm structure patterns.
Overall Output
  1. Task decomposition identifying concurrent tasks.
  2. Data decomposition identifying data local to each task.
  3. Task grouping and ordering to satisfy temporal and data dependencies.
  4. Analysis of dependencies among tasks.

Algorithm Structure

  • Refines concurrent task design to create a parallel program structure for a parallel target architecture.

Task Decomposition

  • Task Parallelism (linear decomposition)
    • Efficient execution of task collections.
    • Decomposes the problem into concurrently executable tasks with potential dependencies.
    • Solution Implementation:
      • How tasks are defined.
      • Dependencies among tasks.
      • Scheduling tasks for concurrent execution, including assignment to processors or threads.
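As a rough sketch of this pattern, the fragment below schedules a collection of independent tasks onto a fixed pool of threads and collects results as tasks finish. The name `run_task_collection` and the choice of Python threads are illustrative assumptions, not part of the pattern itself.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_task_collection(tasks, workers=4):
    """Task Parallelism sketch: independent tasks (callables keyed by name)
    are scheduled onto a fixed pool of worker threads."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit every task; the executor handles scheduling onto threads.
        futures = {pool.submit(fn): name for name, fn in tasks.items()}
        # Gather results in completion order, not submission order.
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results
```

If tasks had dependencies, they would have to be expressed by ordering submissions or by having later tasks wait on earlier tasks' futures.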

Divide & Conquer

  • Divides a problem into smaller sub-problems of the same form as the original.
  • Solves sub-problems and combines solutions into a final solution.

Data Decomposition

  • Geometric Decomposition (linear)
    • Distributes data into discrete datasets.
    • Solves the problem by operating on each dataset independently, allowing concurrent execution.
  • Recursive Data
    • Data is acted upon sequentially, often using links (e.g., linked lists, binary trees, graphs).
    • Restructuring computations over the linked data structure exposes more concurrency.
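A minimal Python sketch of geometric decomposition, assuming a simple element-wise update (the name `geometric_scale` is invented for illustration): the array is partitioned into contiguous chunks, and each chunk is updated concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

def geometric_scale(data, factor, n_chunks=4):
    """Geometric decomposition sketch: partition a list into contiguous
    chunks, operate on each chunk independently, then reassemble."""
    step = max(1, len(data) // n_chunks)
    chunks = [data[i:i + step] for i in range(0, len(data), step)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        # Each chunk is processed with no reference to the other chunks.
        scaled = pool.map(lambda c: [x * factor for x in c], chunks)
    # Reassemble in the original order.
    return [x for chunk in scaled for x in chunk]
```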

Organize by Flow of Data

  • Uses the flow of data to impose an ordering on tasks.
Pipeline
  • One-way, consistent data flow through a linear chain of stages.
  • Each stage represents a function computed on input data from the previous stage, with the result delivered to the next stage.
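One way to sketch such a pipeline in Python is a chain of stages connected by queues, with each stage running in its own thread; the `pipeline` helper below is an illustrative sketch under those assumptions, not a canonical implementation.

```python
import queue
import threading

def pipeline(data, stages):
    """Pipeline sketch: each stage is a function run in its own thread,
    reading from an input queue and writing to an output queue."""
    SENTINEL = object()  # marks end of the data stream
    qs = [queue.Queue() for _ in range(len(stages) + 1)]

    def run_stage(fn, qin, qout):
        while True:
            item = qin.get()
            if item is SENTINEL:
                qout.put(SENTINEL)  # propagate shutdown downstream
                return
            qout.put(fn(item))

    threads = [threading.Thread(target=run_stage, args=(fn, qs[i], qs[i + 1]))
               for i, fn in enumerate(stages)]
    for t in threads:
        t.start()
    for x in data:          # feed the first stage
        qs[0].put(x)
    qs[0].put(SENTINEL)
    out = []
    while True:             # drain the last stage
        item = qs[-1].get()
        if item is SENTINEL:
            break
        out.append(item)
    for t in threads:
        t.join()
    return out
```

Once the pipeline is full, all stages work concurrently on different items, which is where the speedup comes from.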
Event-Based Coordination
  • Semi-independent concurrent activities interact irregularly.
  • Interactions determined by data flow between concurrent activities.
  • Data flow is not necessarily one-way or linear.
Note
  • The three ways of organizing the parallel algorithm are alternatives.
  • Unlike the Finding Concurrency patterns, these organizations are alternatives: the programmer chooses one of the three and exploits one of the parallel design patterns in that group.

Supporting Structures

  • Investigates structures/patterns to support the implementation of algorithms planned in the algorithm structure design space.

Program Structures

  • Single Program, Multiple Data (SPMD)
    • All processing elements (PEs) run the same program in parallel, each with its own data subset.
  • Master/Worker
    • A Master task sets up concurrent Worker threads/processes and a bag of tasks.
  • Loop Parallelism
    • Executes compute-intensive loop iterations in parallel.
  • Fork/Join
    • A thread/process forks sub-processes that execute in parallel.
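The SPMD structure above can be sketched as follows, with Python threads standing in for processing elements and a rank selecting each PE's local data subset (as in MPI); `spmd_sum` is an invented example name.

```python
from concurrent.futures import ThreadPoolExecutor

def spmd_sum(data, n_pes=4):
    """SPMD sketch: every PE runs the *same* program; its rank
    determines which slice of the data it works on."""
    def pe_program(rank):
        local = data[rank::n_pes]  # this PE's data subset
        return sum(local)          # identical code on every PE
    with ThreadPoolExecutor(max_workers=n_pes) as pool:
        partials = pool.map(pe_program, range(n_pes))
    return sum(partials)           # combine the per-PE results
```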

Data Structures

  • Shared Data
    • Manages data shared among concurrent activities.
  • Shared Queue
    • Creates queue data types accessible concurrently.
    • Supports interaction of concurrent activities in different contexts.
  • Distributed Array
    • Manages arrays partitioned and distributed among concurrent activities.

Implementation Mechanisms

  • Meta-patterns representing base mechanisms to support parallel computing abstractions.
    • Concurrent activities
    • Synchronization
    • Communication

UE Management

  • Manages units of execution, including creation, destruction, and management of concurrent activities.

Synchronization

  • Handles ordering of events/computations in the UEs used to execute the parallel application.

Communication

  • Manages communication between UEs, including data exchange among concurrent activities.

List of Parallel Patterns

Embarrassingly Parallel

  • A class of problems where work division into independent tasks is obvious and simple.
  • Examples: Trapezoid rule program, loop accumulation with multiplication or addition.
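Since the trapezoid rule is cited as an example, here is a hedged sketch of it in Python: the integration interval is split into independent sub-intervals, one per worker, and the partial integrals are summed at the end. Function names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def trapezoid(f, a, b, n):
    """Serial composite trapezoid rule on [a, b] with n subintervals."""
    h = (b - a) / n
    total = (f(a) + f(b)) / 2.0
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

def parallel_trapezoid(f, a, b, n, workers=4):
    """Embarrassingly parallel: each worker integrates its own
    sub-interval with no communication, then the parts are summed."""
    h = (b - a) / workers
    chunks = [(a + i * h, a + (i + 1) * h) for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda c: trapezoid(f, c[0], c[1], n // workers),
                         chunks)
    return sum(parts)
```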

Master/Worker

  • A Master task sets up concurrent Worker threads/processes and a bag of tasks.
  • Workers take tasks from the bag and execute them in parallel until the bag is empty or another ending condition occurs.
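A minimal sketch of this pattern in Python, using a thread-safe queue as the bag of tasks; `master_worker` and the use of threads rather than processes are illustrative assumptions.

```python
import queue
import threading

def master_worker(tasks, work_fn, n_workers=4):
    """Master/Worker sketch: the master fills a shared bag (a queue);
    workers repeatedly pull tasks until the bag is empty."""
    bag = queue.Queue()
    for t in tasks:           # master sets up the bag of tasks
        bag.put(t)
    results, lock = [], threading.Lock()

    def worker():
        while True:
            try:
                task = bag.get_nowait()
            except queue.Empty:   # bag is empty: worker terminates
                return
            r = work_fn(task)
            with lock:
                results.append(r)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results            # order depends on worker scheduling
```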

Map and Reduce

  • Among the simplest of the parallel patterns.
Map
  • Applies a function to every data element in parallel.
  • The mapped function must have no side effects, so that its applications to different elements are identical in form and independent of one another.
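A sketch of the map step in Python (`parallel_map` is an invented name); because the function is side-effect-free, every element can be processed concurrently and the output order is still preserved.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_map(fn, data, workers=4):
    """Map sketch: apply a side-effect-free fn to every element in
    parallel; results come back in the original element order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, data))
```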

MapReduce

  • Solves problems involving input, processing, and generation of large datasets, scalable across many processors.
Mapping
  • Divides a large dataset into N discrete subsets, each processed on a single processor; the output is typically a set of key/value pairs (a map data structure).
Shuffle
  • Groups pairs that share the same key and assigns each group to a processor where the reduce operation will happen.
Reduce
  • Combines elements in an input dataset into a single result.
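The three phases can be sketched with the classic word-count example; this is a single-machine toy under the assumption that threads stand in for the many processors of a real MapReduce system, and all function names are illustrative.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_phase(chunk):
    """Mapping: emit a (word, 1) pair for every word in this chunk."""
    return [(w, 1) for w in chunk.split()]

def shuffle(mapped):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def mapreduce(chunks, workers=4):
    """Run map over the chunks in parallel, shuffle, then reduce
    each key's group in parallel (here the reduce is a sum)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
        groups = shuffle(mapped)
        counts = pool.map(sum, groups.values())
    return dict(zip(groups, counts))
```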

Divide & Conquer

  • The problem is broken into smaller sub-problems of the same form, each solved independently.
  • The partial solutions are combined into a solution for the larger problem, so the larger solution depends on the smaller ones.
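Merge sort is a standard illustration of this pattern; in the sketch below the two halves of the list are sorted concurrently and then merged (splitting only at the top level keeps the example short and avoids nested-pool pitfalls).

```python
from concurrent.futures import ThreadPoolExecutor

def merge(left, right):
    """Combine step: merge two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def merge_sort(xs):
    """Serial divide & conquer: split, sort halves, merge."""
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    return merge(merge_sort(xs[:mid]), merge_sort(xs[mid:]))

def parallel_merge_sort(xs):
    """Divide: split the list; conquer: sort the two halves
    concurrently; combine: merge the partial solutions."""
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(merge_sort, xs[:mid])
        right = pool.submit(merge_sort, xs[mid:])
        return merge(left.result(), right.result())
```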

Fork/Join

  • The number of parallel threads varies as the program executes.
  • Forks off a different number of threads at different times and waits for them to finish before proceeding.
  • The originating process waits until the child processes all join before resuming its own execution.
  • Has the possibility of having nested parallel execution.
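A minimal fork/join sketch, assuming a simple summation as the work and futures as the forked children; the parent submits one future per chunk (fork) and blocks on all results before combining them (join).

```python
from concurrent.futures import ThreadPoolExecutor

def fork_join_sum(data, workers=4):
    """Fork/Join sketch: fork one child per chunk, join on all of
    them, then combine the partial sums."""
    chunk = max(1, len(data) // workers)
    pieces = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(sum, p) for p in pieces]   # fork
        return sum(f.result() for f in futures)           # join, combine
```

Each forked child could itself fork further children, which is the nested parallel execution mentioned above.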

A Last Word on Parallel Design Patterns

  • Parallel design patterns can be very similar because they typically follow the process of creating a parallel algorithm/program from a corresponding serial algorithm/program.
  • The process is to:
    • Identify concurrency.
    • Split the program into concurrent pieces.
    • Split the data if the updated algorithm calls for it.
    • Execute the concurrent pieces in parallel.
    • Put all the answers back together to make a final answer.