Parallel and Sequential Computing
One way to speed up some types of algorithms is to use parallel computing to spread the algorithm across multiple processors running simultaneously.
The standard programming model is sequential computing: the computer executes each operation of the program in order, one at a time.
Consider this program that processes images and reports how many are cats:
images ← ["pet1.jpg", "pet2.jpg", "pet3.jpg", "pet4.jpg"]
numCats ← 0
FOR EACH image IN images
{
  foundCat ← detectCat(image)
  IF foundCat
  {
    numCats ← numCats + 1
  }
}
The program begins with a list of filenames and a variable to store the number of images of cats, initialized to 0. A loop iterates through the list, repeatedly calling a function that detects whether the image has a cat and updating the variable.
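If it helps to see this in a real language, here's a rough Python equivalent of the pseudocode above. The detect_cat function here is a hypothetical stand-in, since the pseudocode doesn't say how cat detection actually works:

def detect_cat(image_path):
    # Hypothetical stand-in for a real detector; an actual implementation
    # would analyze the image file, e.g. with an image-recognition model.
    return True

images = ["pet1.jpg", "pet2.jpg", "pet3.jpg", "pet4.jpg"]
num_cats = 0

# Sequential computing: the images are processed one at a time, in order.
for image in images:
    found_cat = detect_cat(image)
    if found_cat:
        num_cats += 1

print(num_cats)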
Here's a visualization of the program executing on 4 images:
Now let's analyze the program further, by annotating each operation with how many seconds it takes and how many times it's called:
Time (seconds) | Calls | Operation
---|---|---
3 | 1 | images ← ["pet1.jpg", "pet2.jpg", "pet3.jpg", "pet4.jpg"]
1 | 1 | numCats ← 0
1 | 4 | FOR EACH image IN images
10 | 4 | foundCat ← detectCat(image)
1 | 4 | IF foundCat
2 | 4 | numCats ← numCats + 1
Note: the values in the time column are made up; the actual times would vary based on the implementation of this pseudocode and the hardware running it.
We can calculate the total time taken by multiplying the number of seconds for each operation by the number of calls and summing the results. For this program, that'd be 3 + 1 + 4 × (1 + 10 + 1 + 2) = 60 seconds.
This timeline visualizes the time taken by the computer:
A timeline of program execution that starts at 0 seconds and ends at 60 seconds. There are markers where each loop iteration begins: at 4 seconds, 18 seconds, 32 seconds, and 46 seconds.
The sequential model assumes that only one operation can be executed at a time, and that is true of a single computer with a single processor. However, most modern computers have multi-core processors, where each core can independently execute an operation.
Multi-core processors can take advantage of parallel computing, a computational model that breaks programs into smaller sequential operations and performs those smaller operations in parallel.
Can we modify the cat detection program so that some of its operations can be executed in parallel? Only if there are operations that are not dependent on each other.
🤔 Take a look at the program again and see if you can identify operations that don't need to happen sequentially.
For this program, we could parallelize the operations inside the loop, since the detectCat function does not depend on the results from other calls to detectCat. That means telling the computer that these lines of code can be executed in parallel:
foundCat ← detectCat(image)
IF foundCat
{
  numCats ← numCats + 1
}
The exact mechanisms for parallelizing a program depend on the particular programming language and computing environment, which we won't dive into here.
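As one illustration (not the only approach), here's how the same loop might look in Python using the standard library's concurrent.futures module, again with a hypothetical detect_cat stand-in. ProcessPoolExecutor runs the calls in separate worker processes, so a multi-core machine can analyze several images at the same time:

from concurrent.futures import ProcessPoolExecutor

def detect_cat(image_path):
    # Hypothetical stand-in; a real detector would analyze the image file.
    return True

images = ["pet1.jpg", "pet2.jpg", "pet3.jpg", "pet4.jpg"]

if __name__ == "__main__":
    # The pool spreads the independent detect_cat calls across up to
    # 4 worker processes; results come back in the original order.
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = pool.map(detect_cat, images)
    num_cats = sum(1 for found_cat in results if found_cat)
    print(num_cats)

One thing to note: because several processes shouldn't update a shared counter at the same time without coordination, this sketch collects the detection results and counts them afterward, rather than incrementing a shared numCats inside the parallel section.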
Here's a visualization of the program executing those operations in parallel:
Now how long will the program take to execute? Remember the analysis from before:
Time (seconds) | Calls | Operation
---|---|---
3 | 1 | images ← ["pet1.jpg", "pet2.jpg", "pet3.jpg", "pet4.jpg"]
1 | 1 | numCats ← 0
1 | 4 | FOR EACH image IN images
10 | 4 | foundCat ← detectCat(image)
1 | 4 | IF foundCat
2 | 4 | numCats ← numCats + 1
The non-parallelized operations take the same amount of time as before: 3 + 1 = 4 seconds.
The time taken by the parallelized operations depends on the number of parallel processors executing operations at once. If we run this program on a computer with 4 cores, then each core can process one of the 4 images, so the parallel portion will only take as long as a single image takes: 1 + 10 + 1 + 2 = 14 seconds.
This timeline visualizes the time taken by a computer that splits the program across 4 processes:
A timeline of program execution with an axis running from 0 to 30 seconds. A marker at 4 seconds indicates when the parallel processes start detecting cats.
One way to evaluate the benefits of parallelizing a program is to measure the speedup: the ratio of the time taken to run the program sequentially to the time taken to run the parallelized program.
Our example cat detection program takes 60 seconds to run sequentially. When run in parallel on four processors, with each image requiring 14 seconds, the program takes 4 + 14 = 18 seconds to run. We calculate the speedup by dividing 60 by 18:
speedup = 60 / 18 ≈ 3.33
We do not achieve a speedup of exactly 4, despite using that many processors. The initial sequential portion still takes the same amount of time even if we have infinitely many processors. So, this initial sequential portion limits the maximum speedup of parallel computing.
Parallelized programs typically run on input sizes in the thousands or millions. If we ran this program on thousands of images, the initial sequential part would take a tiny fraction of time compared to the parallelized part.
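As a rough back-of-the-envelope sketch (using the made-up times from the table, 4 seconds of sequential setup and 14 seconds per image, and assuming the images split evenly across 4 processors), you can see the speedup creep toward the processor count as the number of images grows:

def sequential_time(num_images):
    # 4 seconds of setup, then 14 seconds for each image in turn.
    return 4 + 14 * num_images

def parallel_time(num_images, num_processors):
    # 4 seconds of setup, then each processor handles an equal share of the images.
    return 4 + 14 * (num_images / num_processors)

for n in [4, 40, 4000]:
    speedup = sequential_time(n) / parallel_time(n, 4)
    print(n, round(speedup, 2))  # prints roughly 3.33, 3.92, and 4.0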
CHECK YOUR UNDERSTANDING
When run in parallel on two processors, with each image requiring 14 seconds, the program takes 32 seconds. What is the speedup?
The operations inside the parallelized program segments may not always take the same amount of time, especially if they process program input.
The detectCat function in our example may be using an algorithm whose time varies based on the image's size or complexity. The operation might take only 10 seconds on one image but 22 seconds on another.
This timeline visualizes that situation:
A timeline of program execution that starts at 0 seconds and ends at 30 seconds. A marker at 4 seconds indicates when the parallel processes start detecting cats. Most processes end at 18 seconds, but the third process ends at 30 seconds.
The total time taken has now increased to 30 seconds, as the parallel portion takes as long as the longest of the parallelized operations.
CHECK YOUR UNDERSTANDING
The original fully sequential program took 60 seconds. If the parallelized program takes 30 seconds, what is the speedup?
To compensate for the fact that there's often variation in execution times, the program can be clever about how it sends work to processors. Instead of pre-allocating tasks to processes, it can just send the next task to whichever processor is ready and waiting.
For example, imagine we ran the cat detection program on seven images and the detectCat() operation took a long time on the third image. The program doesn't need to wait for that call to complete before it analyzes the seventh image; it can send that image to a free processor instead:
A timeline of program execution that starts at 0 seconds and ends at 30 seconds. A marker at 4 seconds indicates when the parallel processes start detecting cats. Three processes end at 18 seconds and start again on another image until 32 seconds. The third process ends at 30 seconds.
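In the Python sketch from earlier, ProcessPoolExecutor already schedules work roughly this way by default: the images go into a shared queue and each worker process pulls the next one as soon as it finishes, so a slow image doesn't hold up the images behind it. A minimal version with seven images (and the same hypothetical detect_cat stand-in) might look like this:

from concurrent.futures import ProcessPoolExecutor

def detect_cat(image_path):
    # Hypothetical stand-in; a real detector's running time would vary by image.
    return True

images = ["pet1.jpg", "pet2.jpg", "pet3.jpg", "pet4.jpg",
          "pet5.jpg", "pet6.jpg", "pet7.jpg"]

if __name__ == "__main__":
    # 4 workers, 7 images: whichever worker is free takes the next image,
    # rather than each worker being assigned a fixed set of images up front.
    with ProcessPoolExecutor(max_workers=4) as pool:
        num_cats = sum(pool.map(detect_cat, images))
    print(num_cats)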
CHECK YOUR UNDERSTANDING
When run on seven images, the sequential program takes 102 seconds. If the parallelized program takes 32 seconds, what is the speedup?
(You can enter the speedup as a fraction)
🙋🏽🙋🏻♀️🙋🏿♂️Do you have any questions about this topic? We'd love to answer—just ask in the questions area below!