Operating Systems Lecture on Parallel and Distributed Systems
CSCI U511 - Operating Systems Lecture Notes
Review and Learning Outcomes
- Continued discussion on parallel and distributed systems
- Key topics:
- Parallel and Distributed Systems
- Importance of Parallel and Distributed Systems
- Multiprogramming and Parallel Computing
- Parallel Algorithm Design
- Performance Metrics
- Important dates
- Homework 6: Due April 1 on Blackboard
- Quiz 6: April 3
- Exam 2: April 8 (Sample to be posted soon)
- SE Reflection Report: Due March 28 at midnight
- Project updates will be discussed today.
Performance Metrics
- Measuring and predicting performance in advance is important before committing to a parallel design.
- Common metrics for parallel systems:
- Execution time
- Speedup
- Efficiency
- Cost
- Scalability
Metrics: Execution Time
- Defined as:
- Serial execution time: TS
- Parallel execution time: TP
- Formula:
- TP = Tcomp + Tmem + Tcomm + Tidle + Textra
- The terms are the time spent on computation, memory access, communication, idling, and extra parallel overhead, respectively.
- Elapsed time is measured from the start of the computation to the finish of the last processing element.
- Total parallel overhead, where p is the number of processing elements:
- Total Parallel Overhead = pTP - TS
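A minimal sketch, in Python with made-up timing numbers, of how these quantities combine; all values here are hypothetical:

```python
# Hypothetical timings (seconds), for illustration only.
t_serial = 100.0   # TS: best sequential execution time
t_parallel = 30.0  # TP: elapsed parallel time (start to last PE finishing)
p = 4              # number of processing elements

# Total parallel overhead: work done beyond an ideal p-way split of TS.
overhead = p * t_parallel - t_serial
print(f"Total parallel overhead: {overhead:.1f} s")  # 4 * 30 - 100 = 20.0 s
```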
Communication Cost
- Calculated as:
- Tcomm = α + β · size
- α: per-message (latency) cost
- β: per-byte (bandwidth) cost
- Important to optimize communication:
- Strategies include sending fewer, larger messages; combining small messages amortizes the per-message cost α.
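To make the "combine messages" advice concrete, here is a small Python sketch of the α–β model; the constants are illustrative rather than measured:

```python
# Illustrative alpha-beta communication model (constants are made up).
alpha = 1e-5  # per-message (latency) cost, in seconds
beta = 1e-9   # per-byte (bandwidth) cost, in seconds per byte

def t_comm(size_bytes: int) -> float:
    """Estimated time to send one message of the given size."""
    return alpha + beta * size_bytes

# 1000 separate 1 KB messages pay the per-message cost 1000 times;
# one combined 1 MB message pays it once.
many_small = 1000 * t_comm(1_000)
one_large = t_comm(1_000_000)
print(f"1000 small messages: {many_small * 1e6:.0f} us")  # ~11000 us
print(f"one combined message: {one_large * 1e6:.0f} us")  # ~1010 us
```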
Metrics: Speedup
- Defined as:
- Speedup = TS / TP
- Note: TS is the execution time of the best sequential algorithm.
- Super-linear speedup (speedup > p) can occur when partitioning lets each processor's data fit in faster storage, e.g., cache rather than main memory.
Metrics: Efficiency
- Represents the fraction of time for which a processing element is effectively utilized.
- Calculated as:
- E = Speedup / p
- E indicates the ratio between actual and ideal performance:
- E = Actual Performance / Ideal Performance
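Speedup and efficiency are one line of arithmetic each; a minimal Python sketch with hypothetical measurements:

```python
# Hypothetical measurements, for illustration only.
t_serial = 100.0   # TS: best sequential time (seconds)
t_parallel = 20.0  # TP: parallel time (seconds)
p = 8              # number of processing elements

speedup = t_serial / t_parallel  # Speedup = TS / TP
efficiency = speedup / p         # E = Speedup / p

print(f"speedup = {speedup:.2f}")        # 5.00
print(f"efficiency = {efficiency:.2%}")  # 62.50%
if speedup > p:
    print("super-linear speedup (often a cache/memory effect)")
```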
Scalability
- Importance of adapting parallel programs to growth in:
- Problem size
- Number of processing elements
- Scalability: The ability to continue delivering proportionate speedup (i.e., to maintain efficiency) as the number of processing elements increases.
More Definitions
- Problem Size:
- Refers to the amount of computation necessary to solve the problem and is related to serial execution time, not merely data size.
- Scalable Parallel Systems:
- Capable of maintaining efficiency when increasing both problem size and the number of processing elements.
- Summary of common metrics for parallel systems:
- Execution time
- Speedup
- Efficiency
- Cost
- Scalability
Parallel Matrix Multiplication
- The task is to compute C = A · B, where each matrix is SZ by SZ (i.e., n by n).
- The basic algorithm requires SZ^3 (n^3) operations; a minimal version appears after the list below.
- Key variables influencing performance:
- Data layout
- Topology of machines
- Scheduling communications
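A minimal Python version of the basic triple-loop algorithm; each entry of C takes n multiplications, giving n^3 in total:

```python
def matmul(A, B):
    """Basic O(n^3) matrix multiply: C = A * B, each matrix n x n."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):          # row of C
        for j in range(n):      # column of C
            for k in range(n):  # inner product: n multiplications
                C[i][j] += A[i][k] * B[k][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19.0, 22.0], [43.0, 50.0]]
```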
SUMMA Algorithm
- SUMMA: Scalable Universal Matrix Multiply Algorithm
- Designed to compute C = A · B with each matrix being n by n.
- Sequential calculations require n^3 multiplications.
- Algorithm on a q × q grid (q^2 processors), where each processor owns an m by m block with m = n/q, running q iterations:
- Each iteration performs a group of parallel broadcasts, sending a block of A along processor rows and a block of B along processor columns, followed by an m by m block multiply-accumulate (see the serial sketch below).
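A serial Python sketch of SUMMA's iteration structure (numpy assumed; the function name and test sizes are illustrative). Real broadcasts are replaced by plain loops over the grid, so this shows the data-movement pattern rather than a distributed implementation:

```python
import numpy as np

def summa_simulated(A, B, q):
    """Serially simulate SUMMA on a q x q grid; n must be divisible by q."""
    n = A.shape[0]
    m = n // q  # each processor owns one m x m block of A, B, and C
    C = np.zeros((n, n))
    block = lambda M, i, j: M[i*m:(i+1)*m, j*m:(j+1)*m]
    for k in range(q):          # q iterations, one broadcast round each
        for i in range(q):
            for j in range(q):
                # Processor (i, j) receives A's (i, k) block via a row
                # broadcast and B's (k, j) block via a column broadcast,
                # then does one m x m multiply-accumulate.
                C[i*m:(i+1)*m, j*m:(j+1)*m] += block(A, i, k) @ block(B, k, j)
    return C

rng = np.random.default_rng(0)
A, B = rng.random((4, 4)), rng.random((4, 4))
print(np.allclose(summa_simulated(A, B, q=2), A @ B))  # True
```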
- Sequential execution time (Ts):
- Comprises only computation time, ignoring other factors.
- Ts = n^3, if each multiplication takes one unit of time.
- Parallel execution time (Tp):
- The work is spread across the q^2 processors over q iterations and includes communication cost.
- Ignoring memory time, assuming a balanced workload, and taking a broadcast's cost as proportional to the m by m block size:
- TP = n^3 / q^2 + q · 2 · (n/q)^2 = n^3 / q^2 + 2n^2 / q
- Speedup and efficiency then show the algorithm's scalability with respect to n and q (derivation below).
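Under the same assumptions (unit-time multiplications, total communication cost 2n^2/q), speedup and efficiency reduce to closed forms:

```latex
S = \frac{T_S}{T_P}
  = \frac{n^3}{\,n^3/q^2 + 2n^2/q\,}
  = \frac{q^2}{1 + 2q/n},
\qquad
E = \frac{S}{q^2} = \frac{1}{1 + 2q/n}
```

Holding the block size n/q fixed as q grows keeps E constant, which is the sense in which SUMMA is a scalable parallel system.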
Summary
- Finished coverage of Parallel and Distributed Computing.
- Next topic: Memory Management.