Parallel Computing Week 5


62 Terms

1

List 4 examples of causes behind slow performance:

  1. Overhead

  2. Non-parallelizable computational tasks

  3. Idle processor

  4. Contention for Resources

2

What are 4 things you need to think of when considering overhead?

  1. Communication

  2. Synchronization

  3. Computation

  4. Memory

3

True or False, sequential programs communicate between threads

False, sequential programs have no inter-thread communication; any such data transfers are overhead unique to parallel code

4

True or false, caching between threads and shared counters are good examples of communication?

True

5

the impact/cost of communication overhead comes down to the BLANK

comes down to the details of the hardware

6

Synchronization overhead arises when?

Synchronization overhead arises when a thread or process is waiting for another thread to complete

7

most BLANK algorithms will perform more computations than a BLANK version of the program

most parallel algorithms will perform more computations than a sequential version of the program

8

True or False, most of the time, memory overhead DOES hurt performance

False, most of the time it DOES NOT hurt performance, but memory architecture and hardware CAN play a factor

9

what is contention, in terms of a parallel-implemented system?

the degradation of system performance due to competition for shared resources

10

True or False, I/O devices create contention for resources, and can degrade performance

True

11

List 3 reasons why a process or thread might not be able to proceed with a task (reasons behind idle time):

  1. Lack of work to do

  2. Waiting for an external resource

  3. Waiting for another thread to complete

12

Idle time is often a consequence of BLANK and BLANK overheads

it is often a consequence of synchronization and communication overheads

13

What are the two classifications of idle time (taxonomic names)?

  1. Load Imbalance

  2. Memory-Bound Computations

14

Load imbalance refers to?

Uneven distribution of work to the processors available

15

What is an extreme example of a load imbalance, and why?

Sequential processing, because multiple processors sit idle while one handles all of the necessary calculations

16

True or False, tree based parallel summation, where one branch/thread waits on many child branches to finish before summing the total, is NOT considered a Load Imbalance.

False, this is most definitely an imbalance because one thread must wait for others to complete, and sits idle while doing so.

17

What does the classification of memory bound computations define?

Threads/processes can stall while waiting for memory operations, especially when trying to read or write data guarded by a locked mutex

18

True or False, Memory hardware architecture does not matter at all when considering memory bound computations.

False, bus architecture of the chip can affect bandwidth, and read/write speeds can affect latency

19

Weak scaling defines increasing the BLANK of a problem to tackle in a BLANK

Weak scaling defines increasing the size of the problem to tackle in a specific amount of time

20

The weak scaling calculation is BLANK processors with a BLANK amount of data/tasks per processor

The weak scaling calculation is N processors with a fixed amount of data/tasks per processor

21

Weak scaling accomplishes BLANK work in BLANK amount of time

Weak scaling accomplishes more work in the same amount of time

22

what is an example of weak scaling (arbitrary question more to refresh memory)?

Imagine it takes a single processor 20 minutes to make and decorate 6 cookies; if you add a second processor, you can make double the cookies (12 cookies) in the same 20 minutes

23

What is Strong Scaling?

Accomplishing a given task faster

24

Strong Scaling involves BLANK processors with a BLANK size

Strong Scaling involves N processors with a fixed total problem size

25

What is an example of a Strong Scaling solution (think of cookie decoration)

If one processor takes 20 minutes to make and decorate 6 cookies, imagine parallelizing in such a way that 2 processors make the same 6 cookies in 10 minutes (though not always exactly half the time)

26

True or False, Weak vs Strong scaling states that Weak Scaling will do a set amount of work in less time, whereas Strong Scaling will do more work in a set amount of time

False,

  • Weak Scaling will do more work in a set amount of time

  • Strong Scaling will do a set amount of work in less time

27

Throughput is the measurement of BLANK per BLANK

Throughput is the measurement of actions per unit time

28

Latency is measured in BLANK and involves the BLANK to perform the task once

Latency is measured in units of time and involves the time it takes to perform the task once

29

The calculation of latency is?

Latency = time/task
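
Worked example (using the deck's cookie numbers): if one processor makes 6 cookies in 20 minutes, throughput = 6 cookies / 20 min = 0.3 cookies per minute, while the latency of a full batch = 20 minutes per batch.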

30

Speedup is a metric used to measure the BLANK of a program

Speedup is a metric to measure the effectiveness of a program

31

The calculation for program speedup is?

Speedup = sequential time/parallel time
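
Worked example: if the sequential version runs in 60 seconds and the parallel version runs in 15 seconds, speedup = 60/15 = 4x.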

32

What is Amdahl’s law?

Amdahl’s law is the mathematical equation that can be used to estimate the potential speedup for processing

33

(fun tidbit) when was Amdahl's law developed?

1967

34

Amdahl’s law’s equation is?

S_latency(s) = 1 / ((1 - P) + P/s)

35

In Amdahl’s law, the S stands for BLANK, and the P stands for BLANK

The S stands for Speedup of the parallelized portion, while the P stands for the portion of a program that is parallelizable
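
Quick worked example (illustrative numbers, not from the slides): with P = 0.9 and S = 4, S_latency = 1 / ((1 - 0.9) + 0.9/4) = 1 / 0.325 ≈ 3.08, noticeably less than 4. A minimal C sketch that just evaluates the formula:

```c
#include <stdio.h>

/* Amdahl's law: overall speedup given the parallelizable portion p
 * and the speedup s of the parallelized portion. */
double amdahl(double p, double s) {
    return 1.0 / ((1.0 - p) + p / s);
}

int main(void) {
    /* Illustrative values: 90% of the program parallelizes, s = 4. */
    printf("Speedup: %.2f\n", amdahl(0.9, 4.0));   /* prints ~3.08 */
    return 0;
}
```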

36

True or False, having access to the equation for speedup means we CAN run parallel code on a single processor

False, this does not mean we can run the parallel code on a single processor (if confused, see slide 35 of week 5). There is overhead, contention, and idle time to consider

37

Efficiency represents what?

Efficiency represents how well the resources are being utilized in a system

38

The calculation for efficiency is?

Efficiency = Speedup/(# of processors used)
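
Worked example, continuing the Amdahl numbers above: a speedup of 3.08 on 4 processors gives efficiency = 3.08/4 ≈ 0.77, i.e., the processors are roughly 77% utilized.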

39

True or False, having an increase in processors used in a program ALWAYS increases efficiency

False; the number of processors used is the denominator of the efficiency calculation, so adding processors only helps efficiency if speedup grows proportionally

40

Fill in the blanks for performance trade off examples:

  • Communication vs BLANK

  • BLANK vs Parallelism

  • Overhead vs BLANK

  • Communication vs Computation

  • Memory vs Parallelism

  • Overhead vs Parallelism

41

Communication costs can be reduced by using which 2 methods?

  1. Overlapping communication and computation

  2. Redundant computation 

42

True or False, these communication-reducing approaches add computation that costs less than the communication it replaces

True

43

Overlapping communication and computations involves what?

  • looking for computation that is independent of comms

  • executing those independent computations concurrently with the communication (see the sketch below)
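
A minimal sketch of the overlap idea, using MPI non-blocking receives (MPI is my choice of illustration, not necessarily what the course uses): rank 1 starts a receive, does computation that does not depend on the incoming data, and only blocks when the data is actually needed. Run with, e.g., mpirun -np 2.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double buf[4] = {0};
    if (rank == 0) {
        double data[4] = {1, 2, 3, 4};
        MPI_Send(data, 4, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Request req;
        /* Start the receive, then do independent work while data is in flight. */
        MPI_Irecv(buf, 4, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);
        double local = 0;
        for (int i = 0; i < 1000000; i++)   /* stand-in for independent computation */
            local += i * 0.5;
        MPI_Wait(&req, MPI_STATUS_IGNORE);  /* block only when the data is needed */
        printf("independent result %.1f, received %.1f\n", local, buf[0]);
    }
    MPI_Finalize();
    return 0;
}
```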

44

Redundant computation involves what?

  • evaluating data/values being transmitted between threads or processes

  • evaluating the cost of these computations

  • if there is a low cost of computation, threads should recalculate values on their own, rather than communicate with one another
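
A minimal pthreads sketch of this trade (my own illustration; build_table is a hypothetical stand-in for any cheap shared value): each thread rebuilds a small table privately rather than having one thread compute it and send it to the rest.

```c
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

/* Cheap to compute, so each thread builds a private copy instead of
 * receiving one from another thread (redundant computation). */
static void build_table(double table[256]) {
    for (int i = 0; i < 256; i++)
        table[i] = i * i * 0.001;
}

static void *worker(void *arg) {
    long id = (long)arg;
    double table[256];   /* private copy: no communication required */
    build_table(table);
    printf("thread %ld: table[10] = %.3f\n", id, table[10]);
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return 0;
}
```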

45

True or False, parallel processing can often be increased when memory usage is decreased

False, parallel processing can often be increased when memory usage is increased

46

What are the two concepts of Memory vs Parallel solutions?

  1. Privatization (using additional mem to break false dependencies)

  2. Padding (allocating extra mem to force variables to reside in their own cache lines) 
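
A minimal pthreads sketch of both ideas (my own illustration; the 64-byte cache line is an assumption, since line size is hardware dependent): each thread gets a private counter slot (privatization), and each slot is padded onto its own cache line so the threads do not falsely share a line.

```c
#include <pthread.h>
#include <stdio.h>

#define NTHREADS   4
#define CACHE_LINE 64   /* assumed cache-line size */

/* Padding: each counter occupies a full cache line, so one thread's
 * increments do not invalidate another thread's line (false sharing). */
struct padded_counter {
    long value;
    char pad[CACHE_LINE - sizeof(long)];
};

static struct padded_counter counts[NTHREADS];

static void *worker(void *arg) {
    long id = (long)arg;
    for (int i = 0; i < 1000000; i++)
        counts[id].value++;   /* touches only this thread's cache line */
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    long total = 0;
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(t[i], NULL);
        total += counts[i].value;   /* combine the private counters once */
    }
    printf("total = %ld\n", total);
    return 0;
}
```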

47

True or False, a processor can access its L1 cache much faster than system memory accessed over a bus

True, think about the different chip architectures

48

In terms of Overhead vs Parallelism, as the number of threads increases, so too does the BLANK and the BLANK

so too does the overhead and lock contention

49

What are 3 things to be considered, when thinking about Overhead vs Parallelism?

  1. Parallelize Overhead

  2. Load Balance vs. Overhead

  3. Granularity

50

The Parallelizing Overhead approach looks at your BLANK?

The Parallelizing Overhead approach looks at your critical sections of code

51

What is a question to ask when considering Parallelizing Overhead?

  • Is there a single shared memory location being used for final results?

52

A solution to a single shared memory would be the implementation of a BLANK style approach?

A Tree style approach, where each branch has its own shared variable that gets passed up to the parent branch (the tree trunk, if you will). This enables summation of branch calculations at the highest level while lower-level threads continue to crunch numbers (i.e., no waiting at any level)
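
A minimal pthreads sketch of such a tree-style summation (my own illustration, not the course's reference code): each node hands half of its range to a child thread, sums the other half itself, and combines the two partial sums on the way back up, so no single shared variable is ever contended.

```c
#include <pthread.h>
#include <stdio.h>

/* Each node splits its range: a child thread sums one half while the
 * parent sums the other; partial sums combine up the tree. */
struct range { const int *a; int lo, hi; long sum; };

static void *tree_sum(void *arg) {
    struct range *r = arg;
    if (r->hi - r->lo <= 1024) {               /* leaf: sum directly */
        r->sum = 0;
        for (int i = r->lo; i < r->hi; i++)
            r->sum += r->a[i];
        return NULL;
    }
    int mid = r->lo + (r->hi - r->lo) / 2;
    struct range left  = { r->a, r->lo, mid,   0 };
    struct range right = { r->a, mid,   r->hi, 0 };
    pthread_t child;
    pthread_create(&child, NULL, tree_sum, &left);  /* one branch to a child */
    tree_sum(&right);                               /* other branch in place */
    pthread_join(child, NULL);
    r->sum = left.sum + right.sum;                  /* combine at this node */
    return NULL;
}

int main(void) {
    static int a[1 << 16];
    for (int i = 0; i < (1 << 16); i++) a[i] = 1;
    struct range all = { a, 0, 1 << 16, 0 };
    tree_sum(&all);
    printf("sum = %ld (expect %d)\n", all.sum, 1 << 16);
    return 0;
}
```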

53

Load balancing is when we ask what?

are we using all of the processors/cores to the maximum efficiency, or is stuff idle?

54

True or False, it is easier to distribute smaller numbers of coarse-grained code sections than it is to distribute work for a large number of fine-grained code sections?

False,

  • It is easy to distribute a large number of fine-grained units evenly

  • It is harder to distribute a smaller number of coarser units evenly

55

Load Balance vs Overhead involves evaluating what?

Evaluating how you segment your processing tasks/data

56

Contention between Load Balance and Overhead happens when work is BLANK or BLANK

Happens when the amount of work is Irregular or Dynamically variable

57

Many trade offs discussed in week 5 are related to the BLANK of parallelism

Many of these trade offs are related to the granularity of parallelism

58

True or False, one way to increase the number of dependencies is to increase the granularity

False, one way to decrease (not increase) the number of dependencies is to increase the granularity

59

What is Batching, when discussing granularity?

Batching is a process in which work can be performed as a group, such as threads grouping their operations on shared resources.

60

Batching increases the BLANK of interactions, but reduces the BLANK of interactions, reducing BLANK overhead

Batching increases the granularity of interactions, but reduces the frequency of interactions, reducing Communications Overhead
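
A minimal pthreads sketch of batching (my own illustration; the batch size of 1000 is arbitrary, and the iteration count divides evenly by it to keep the sketch short): each thread accumulates privately and takes the shared lock once per batch, so one coarse interaction replaces a thousand fine-grained ones.

```c
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define ITERS    1000000
#define BATCH    1000   /* one lock acquisition per 1000 increments */

static long shared_total = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    long local = 0;
    for (int i = 1; i <= ITERS; i++) {
        local++;                    /* cheap private work */
        if (i % BATCH == 0) {       /* flush the whole batch at once */
            pthread_mutex_lock(&lock);
            shared_total += local;  /* one interaction covers BATCH updates */
            pthread_mutex_unlock(&lock);
            local = 0;
        }
    }
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    printf("total = %ld\n", shared_total);   /* expect NTHREADS * ITERS */
    return 0;
}
```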

61

True or False, there is NO limit to granularity

False

62

Granularity limits depend on which 2 things?

Algorithmic Characteristics and Hardware Characteristics.