Parallel and Sequential Computing

One way to speed up some types of algorithms is to use parallel computing to spread the algorithm across multiple processors running simultaneously.

Sequential computing

The standard programming model is sequential computing: the computer executes each operation of the program in order, one at a time.

Consider this program that processes images and reports how many are cats:

```
images ← ["pet1.jpg", "pet2.jpg", "pet3.jpg", "pet4.jpg"]
numCats ← 0
FOR EACH image IN images
{
    foundCat ← detectCat(image)
    IF foundCat
    {
        numCats ← numCats + 1
    }
}
```

The program begins with a list of filenames and a variable to store the number of images of cats, initialized to 0. A loop iterates through the list, repeatedly calling a function that detects whether the image has a cat and updating the variable.
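The same sequential logic can be sketched in Python. Here `detect_cat` is a hypothetical stand-in for a real image classifier; we simply pretend the first two images contain cats.

```python
def detect_cat(image):
    # Hypothetical stand-in for a real classifier: pretend
    # only the first two images contain cats.
    return image in ("pet1.jpg", "pet2.jpg")

images = ["pet1.jpg", "pet2.jpg", "pet3.jpg", "pet4.jpg"]
num_cats = 0
for image in images:               # each iteration finishes completely
    found_cat = detect_cat(image)  # before the next one begins
    if found_cat:
        num_cats += 1

print(num_cats)  # 2 with this stand-in classifier
```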

Here's a visualization of the program executing on 4 images:

Now let's analyze the program further, by annotating each operation with how many seconds it takes and how many times it's called:

| Time (s) | Calls | Operation |
| --- | --- | --- |
| 3 | 1 | `images ← ["pet1.jpg", "pet2.jpg", "pet3.jpg", "pet4.jpg"]` |
| 1 | 1 | `numCats ← 0` |
| 1 | 4 | `FOR EACH image IN images` |
| 10 | 4 | `foundCat ← detectCat(image)` |
| 1 | 4 | `IF foundCat` |
| 2 | 4 | `numCats ← numCats + 1` |

Note: the values in the time column are made up, the actual time would vary based on the implementation of this pseudocode and the hardware running it.

We can calculate the total time taken by multiplying the number of seconds for each operation by the number of calls and summing the results. For this program, that'd be:

(3 × 1) + (1 × 1) + (1 × 4) + (10 × 4) + (1 × 4) + (2 × 4) = 60 seconds
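Using the made-up timings from the table, the same total can be computed directly:

```python
# (seconds per call, number of calls) for each operation in the table
operations = [
    (3, 1),   # images ← [...]
    (1, 1),   # numCats ← 0
    (1, 4),   # FOR EACH image IN images
    (10, 4),  # foundCat ← detectCat(image)
    (1, 4),   # IF foundCat
    (2, 4),   # numCats ← numCats + 1
]
total = sum(time * calls for time, calls in operations)
print(total)  # 60 seconds
```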

This timeline visualizes the time taken by the computer:

A timeline of program execution that starts at 0 seconds and ends at 60 seconds. There are markers where each loop iteration begins- at 4 seconds, 18 seconds, 32 seconds, and 46 seconds.

Parallel computing

The sequential model assumes that only one operation can be executed at a time, and that is true of a single computer with a single processor. However, most modern computers have multi-core processors, where each core can independently execute an operation.

Multi-core processors can take advantage of parallel computing, a computational model that breaks programs into smaller sequential operations and performs those smaller operations in parallel.

Can we modify the cat detection program so that some of its operations can be executed in parallel? Only if there are operations that are not dependent on each other.

🤔 Take a look at the program again and see if you can identify operations that don't need to happen sequentially.

For this program, we could parallelize the operations inside the loop, since the `detectCat` function does not depend on the results from other calls to `detectCat`. That means telling the computer that these lines of code can be executed in parallel:

```
    foundCat ← detectCat(image)
    IF foundCat
    {
        numCats ← numCats + 1
    }
```

The exact mechanisms for parallelizing a program depend on the particular programming language and computing environment, which we won't dive into here.
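As one concrete illustration, here's a Python sketch using a thread pool. `detect_cat` is still a hypothetical stand-in; the independent calls are handed to the pool, and the shared counter is updated after the parallel portion so the workers never touch shared state.

```python
from concurrent.futures import ThreadPoolExecutor

def detect_cat(image):
    # Hypothetical stand-in for a real classifier.
    return image in ("pet1.jpg", "pet2.jpg")

images = ["pet1.jpg", "pet2.jpg", "pet3.jpg", "pet4.jpg"]

# Each detect_cat call is independent, so the pool may run them in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(detect_cat, images))

num_cats = sum(results)  # counting happens after the parallel portion
print(num_cats)
```

One caveat: in CPython, threads only overlap effectively for I/O-bound work; for CPU-heavy image analysis, a `ProcessPoolExecutor` would give true parallelism at the cost of inter-process overhead.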

Here's a visualization of the program executing those operations in parallel:

Now how long will the program take to execute? Remember the analysis from before:

| Time (s) | Calls | Operation |
| --- | --- | --- |
| 3 | 1 | `images ← ["pet1.jpg", "pet2.jpg", "pet3.jpg", "pet4.jpg"]` |
| 1 | 1 | `numCats ← 0` |
| 1 | 4 | `FOR EACH image IN images` |
| 10 | 4 | `foundCat ← detectCat(image)` |
| 1 | 4 | `IF foundCat` |
| 2 | 4 | `numCats ← numCats + 1` |

The non-parallelized operations take the same amount of time as before:

The time taken by the parallelized operations depends on the number of parallel processors executing operations at once. If I run this program on a computer with 4 cores, then each core can process one of the 4 images, so the parallel portion will only take as long as a single image takes:

This timeline visualizes the time taken by a computer that splits the program across 4 processes:

A timeline of program execution that starts at 0 seconds and ends at 30 seconds. A marker at 4 seconds indicates when the parallel processes start detecting cats.

Comparing sequential vs. parallel execution

One way to evaluate the benefits of parallelizing a program is to measure the speedup: the ratio of the time taken to run the program sequentially to the time taken to run the parallelized program.

Our example cat detection program takes 60 seconds to run sequentially. When run in parallel on four processors, with each image requiring 14 seconds, the program takes 4 + 14 = 18 seconds to run. We calculate the speedup by dividing 60 by 18:

60 / 18 ≈ 3.33

We do not achieve a speedup of exactly 4, despite using 4 processors. The initial sequential portion still takes the same amount of time even if we have infinitely many processors. So, this initial sequential portion limits the maximum speedup of parallel computing.
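With the numbers from the table, the limit is easy to see: even with unlimited processors, the 4-second sequential setup and the 14 seconds for one image remain.

```python
sequential_time = 60   # full sequential run (from the table)
setup_time = 4         # list + counter initialization; can't be parallelized
per_image_time = 14    # 1 + 10 + 1 + 2 seconds per loop body

parallel_time = setup_time + per_image_time  # 4 cores, 4 images
speedup = sequential_time / parallel_time
print(parallel_time, round(speedup, 2))  # 18 seconds, speedup ~3.33
```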

Parallelized programs typically run on input sizes in the thousands or millions. If we ran this program on thousands of images, the initial sequential part would take a tiny fraction of time compared to the parallelized part.

When run in parallel on two processors, with each image requiring 14 seconds, the program takes 32 seconds. What is the speedup?


Variable execution times

The operations inside the parallelized program segments may not always take the same amount of time, especially if they process program input.

For example, the `detectCat` function in our example may be using an algorithm whose time varies based on the image size or complexity. The operation may take only 10 seconds on one image but 22 seconds on another.

This timeline visualizes that situation:

A timeline of program execution that starts at 0 seconds and ends at 30 seconds. A marker at 4 seconds indicates when the parallel processes start detecting cats. Most processes end at 18 seconds, but the third process ends at 30 seconds.

The total time taken has now increased to 30 seconds, as the parallel portion takes as long as the longest of the parallelized operations.
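Concretely, the parallel portion finishes only when the slowest worker does. A small sketch, using hypothetical per-image loop-body times with a slow third image:

```python
setup_time = 4
# Hypothetical loop-body times for the 4 images; the third is slower.
body_times = [14, 14, 26, 14]

# With one image per core, the parallel portion lasts as long as
# its slowest task.
total_time = setup_time + max(body_times)
print(total_time)  # 30 seconds
```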

The original fully sequential program took 60 seconds. If the parallelized program takes 30 seconds, what is the speedup?


To compensate for the fact that there's often variation in execution times, the program can be clever about how it sends work to processors. Instead of pre-allocating tasks to processors, it can send the next task to whichever processor is ready and waiting.

For example, imagine we ran the cat detection program on seven images and the `detectCat()` operation took a long time on the third image. The program doesn't need to wait for that call to complete before it analyzes the seventh image; it can send that image to a free processor instead:

A timeline of program execution that starts at 0 seconds and ends at 30 seconds. A marker at 4 seconds indicates when the parallel processes start detecting cats. Three processes end at 18 seconds and start again on another image until 32 seconds. The third process ends at 30 seconds.
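This scheduling strategy can be simulated: each task goes to whichever worker frees up first. The per-image times below are hypothetical, with a slow third image, matching the timeline above.

```python
import heapq

def finish_time(setup, body_times, workers):
    """Greedy scheduling: send each task to the first free worker."""
    free_at = [setup] * workers         # times at which each worker is free
    heapq.heapify(free_at)              # min-heap: cheapest pop = earliest free
    for t in body_times:
        start = heapq.heappop(free_at)  # earliest-available worker
        heapq.heappush(free_at, start + t)
    return max(free_at)                 # done when the last worker finishes

# Seven images; the third takes much longer than the rest.
times = [14, 14, 26, 14, 14, 14, 14]
print(finish_time(setup=4, body_times=times, workers=4))  # 32 seconds
```

Three workers finish their first image at 18 seconds and immediately pick up the remaining images, so the whole run ends at 32 seconds instead of waiting on the slow task.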