3.2.4 OpenMP: History and Parallel Loops

History of OpenMP

  • Development: Automatic parallelization research began at the University of Illinois under David Kuck, producing insights but facing implementation challenges.

  • Early 1980s: Introduction of shared memory multiprocessors.

  • Parallel Computing Forum: Formed by Kuck to develop Fortran extensions for parallel programming constructs.

  • 1997: Establishment of the OpenMP standard, consolidating some 15 years of shared-address-space programming practice.

OpenMP Philosophy

  • Utilization of Pragmas: OpenMP uses pragmas as compiler hints.

    • Example syntax: #pragma omp <directive>.

    • Ignoring the pragmas still yields a correct sequential program: they guide the compiler without changing the program's meaning.

    • The model trusts the programmer to assert that a loop is safe to parallelize, while the compiler automates the threading.
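
  The hint-not-command behavior can be sketched as follows; square_all is a hypothetical helper name, not from the notes. A compiler without OpenMP support simply ignores the pragma and runs the loop sequentially, with identical results:

```c
#include <stddef.h>

/* Fill out[i] = i*i for each index. The pragma asks OpenMP to split the
 * iterations across threads; a compiler that does not understand it
 * ignores it, and the loop runs sequentially with the same answer. */
void square_all(double *out, size_t n) {
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++)
        out[i] = (double)i * (double)i;
}
```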

OpenMP Directives and Examples

  • Requirements: Include #include <omp.h> when calling the runtime API; compile with an OpenMP flag (-fopenmp for GCC and Clang, -qopenmp for recent Intel compilers).

  • Proper vs Improper Parallel Loops:

    • Valid (each iteration writes only its own element A[i]):

      #pragma omp parallel for
      for (i = 0; i < N; i++)
          A[i] = i * i * 0.23;

    • Invalid (loop-carried data dependence: iteration i reads A[i-1], which iteration i-1 writes):

      #pragma omp parallel for
      for (i = 1; i < N; i++)
          A[i] = A[i-1] + A[i+1];
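
  The dependence in the invalid loop can be made concrete by running its iterations in two different orders, as a parallel schedule might, and observing that the results diverge; order_matters is an illustrative helper name, not from the notes:

```c
#include <stdbool.h>
#include <string.h>

/* A[i] = A[i-1] + A[i+1] carries a flow dependence: iteration i reads the
 * value iteration i-1 just wrote. Running the iterations forward versus
 * backward therefore produces different arrays, so no parallel schedule
 * can be guaranteed to match the sequential result. */
bool order_matters(void) {
    double fwd[6] = {1, 2, 3, 4, 5, 6};
    double bwd[6] = {1, 2, 3, 4, 5, 6};
    for (int i = 1; i < 5; i++) fwd[i] = fwd[i-1] + fwd[i+1];
    for (int i = 4; i >= 1; i--) bwd[i] = bwd[i-1] + bwd[i+1];
    return memcmp(fwd, bwd, sizeof fwd) != 0;   /* true: orders disagree */
}
```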

Criteria for Loop Parallelization

  • A loop can be parallelized if its parallel execution yields the same result as sequential execution (small floating point discrepancies are acceptable).

Floating Point Representation and Issues

  • Floating point represents a wide dynamic range in a fixed number of bits; for IEEE 754 single precision, Value = (-1)^sign x 1.mantissa x 2^(exponent - 127).

  • Issues: Precision loss when adding numbers in different orders; non-associativity complicates equality comparisons.

  • Use thresholds for proximity comparisons, e.g., if (fabs(x - y) < THRESHOLD) rather than x == y.
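
  A small sketch of both points, summing an array in two orders and comparing with a tolerance; the function names (nearly_equal, sum_forward, sum_backward) are illustrative, not from the notes:

```c
#include <math.h>
#include <stdbool.h>

/* Compare floats by proximity rather than exact equality. */
bool nearly_equal(double x, double y, double threshold) {
    return fabs(x - y) < threshold;
}

/* Summing the same values in different orders can round differently,
 * because floating-point addition is not associative. */
double sum_forward(const double *a, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++) s += a[i];
    return s;
}

double sum_backward(const double *a, int n) {
    double s = 0.0;
    for (int i = n - 1; i >= 0; i--) s += a[i];
    return s;
}
```

  The classic illustration of non-associativity in doubles is that (0.1 + 0.2) + 0.3 differs from 0.1 + (0.2 + 0.3) in the last bit, which is exactly why a threshold comparison is needed.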

Simple Rules for Parallelization

  • A loop can be parallelized if:

    • It reads from certain arrays while writing to others without overlap.

    • Each iteration modifies a distinct array segment.

  • Examples:

    • Reads come only from A and writes go only to B, with no overlap:

      #pragma omp parallel for
      for (i = 1; i < N; i++)
          B[i] = A[i-1] + A[i+1];

    • Likewise safe; each iteration writes a distinct element B[i]:

      #pragma omp parallel for
      for (i = 1; i < N; i++)
          B[i] = x * A[i-1];
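
  The first safe loop above can be packaged as a runnable sketch; stencil is an illustrative name, and the bound is written n - 1 here so the A[i+1] read stays in range for an n-element array:

```c
/* Reads come only from A and writes go only to B, so every iteration is
 * independent and the pragma is safe; without OpenMP the loop simply
 * runs sequentially with the same result. */
void stencil(double *B, const double *A, int n) {
    #pragma omp parallel for
    for (int i = 1; i < n - 1; i++)
        B[i] = A[i-1] + A[i+1];
}
```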

Identifying Non-parallelizable Loops

  • A loop is non-parallelizable if one iteration writes to a location that another iteration reads, whether the read comes before or after the write in sequential order.

  • Example (iteration i reads B[i], which iteration i+1 overwrites, so this pragma is incorrect):

      #pragma omp parallel for
      for (i = 1; i < N; i++)
          B[i-1] = x * A[i] * B[i];

Analysis of Loop Parallelization

  • Some loops can be parallelized while others cannot due to dependencies; careful restructuring is needed to optimize performance and enable parallel execution.
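
  One common restructuring for the non-parallelizable example above is to read from a snapshot of B taken before the loop, which removes the cross-iteration conflict; scaled_shift and Bold are illustrative names, a sketch rather than the notes' own solution:

```c
#include <stdlib.h>
#include <string.h>

/* B[i-1] = x * A[i] * B[i] has an anti-dependence: iteration i+1
 * overwrites the B[i] that iteration i still needs to read. Reading from
 * a pre-loop copy of B makes every iteration independent, so the
 * rewritten loop is safe to parallelize. */
void scaled_shift(double *B, const double *A, double x, int n) {
    double *Bold = malloc(n * sizeof *Bold);   /* snapshot of B */
    if (!Bold) return;
    memcpy(Bold, B, n * sizeof *Bold);
    #pragma omp parallel for
    for (int i = 1; i < n; i++)
        B[i-1] = x * A[i] * Bold[i];
    free(Bold);
}
```

  The extra copy costs memory and bandwidth, which is the kind of trade-off loop restructuring for parallelism typically involves.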