Serial programs
Programs written to run on a single processor.
Parallel programs
Programs written to take advantage of the presence of multiple processors.
Task-parallelism
Partitioning the various tasks carried out in solving a problem among the cores.
Data-parallelism
Partitioning the data used in solving a problem among the cores, where each core carries out similar operations on its part of the data.
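For illustration, a minimal C sketch of data-parallelism (the array x, its length N, and the core count P are assumptions, not from the study set): each simulated core sums its own contiguous block of the data; the cores are simulated serially here for simplicity.

```c
/* A hedged sketch of data-parallel partitioning: each of P cores
   sums its own block of the array x of length N. */
#include <stdio.h>

#define N 16
#define P 4

int main(void) {
    int x[N];
    for (int i = 0; i < N; i++) x[i] = i;

    /* Simulate what each core would do: core my_rank handles a
       contiguous block of N/P elements and computes a partial sum. */
    for (int my_rank = 0; my_rank < P; my_rank++) {
        int my_first = my_rank * (N / P);
        int my_last  = my_first + (N / P);
        int my_sum   = 0;
        for (int i = my_first; i < my_last; i++)
            my_sum += x[i];
        printf("core %d: partial sum = %d\n", my_rank, my_sum);
    }
    return 0;
}
```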
Coordination
The interaction required between cores to solve a problem, involving communication, load balancing, and synchronization.
Communication
The process where one or more cores send their current data or partial results to another core.
Load balancing
Distributing work among cores so that each core has roughly the same amount of work to avoid idling.
Synchronization
Coordinating cores so they wait for each other at specific points (e.g., waiting for valid input data).
Shared-memory systems
Parallel systems where cores share access to the computer's memory.
Distributed-memory systems
Parallel systems where each core has its own private memory and cores communicate explicitly (e.g., messages).
Concurrent computing
A program in which multiple tasks can be in progress at any instant.
Parallel computing
A program in which multiple tasks cooperate closely to solve a problem.
Distributed computing
A program that may need to cooperate with other programs to solve a problem.
Von Neumann architecture
The classical computer architecture consisting of main memory, a CPU, and an interconnection between them.
Main memory
A collection of locations capable of storing both instructions and data.
Central Processing Unit (CPU)
The component responsible for executing instructions, divided into a control unit and a datapath.
Registers
Very fast storage locations inside the CPU.
Program counter
A register that stores the address of the next instruction to be executed.
Bus
A collection of parallel wires and hardware controlling access to them, used to connect CPU and memory.
Von Neumann bottleneck
The separation of memory and CPU that limits the rate at which instructions and data can be accessed.
Operating System (OS)
Software that manages hardware and software resources and controls the execution of programs.
Process
An instance of a computer program that is being executed.
Multitasking
The apparent simultaneous execution of multiple programs managed by the OS switching between them.
Thread
A "light-weight" process contained within a process that shares most resources but has its own stack and program counter.
Cache
A collection of memory locations that can be accessed in less time than main memory.
Locality
The tendency of programs to access data and instructions that are close to recently used items, either nearby in memory (spatial) or the same items again soon (temporal).
Spatial locality
Accessing memory locations that are physically near previously accessed locations (e.g., arrays).
Temporal locality
Accessing the same memory location again in the near future.
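A small C sketch contrasting the two loop orders (the array size is an assumption): C stores rows contiguously, so the row-by-row traversal has good spatial locality, while the column-by-column traversal does not.

```c
/* Good vs. poor spatial locality when traversing a 2D array in C,
   which stores each row contiguously (row-major order). */
#include <stdio.h>

#define N 1000

static double a[N][N];

int main(void) {
    double sum = 0.0;

    /* Good spatial locality: consecutive iterations touch
       consecutive memory locations. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];

    /* Poor spatial locality: consecutive iterations jump
       N doubles apart, causing many more cache misses. */
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];

    printf("%f\n", sum);
    return 0;
}
```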
Cache blocks (Cache lines)
Blocks of data/instructions transferred between main memory and cache.
Cache hit
When the CPU attempts to access data and it is found in the cache.
Cache miss
When the CPU attempts to access data and it is not found in the cache.
Write-through
A cache policy where data is written to main memory immediately when it is written to the cache.
Write-back
A cache policy where modified data is marked dirty in the cache and written to main memory only when the cache line is evicted.
Fully associative cache
A cache where a new line can be placed at any location.
Direct mapped cache
A cache where each line of memory maps to exactly one location in the cache.
n-way set associative cache
A cache where each cache line can be placed in one of n different locations.
Virtual memory
A technique allowing main memory to function as a cache for secondary storage (disk), using pages.
Page table
A table in memory that maps virtual addresses to physical addresses.
Translation-Lookaside Buffer (TLB)
A special fast cache for page table entries.
Page fault
An attempt to access a memory page that is not currently in main memory.
Instruction-Level Parallelism (ILP)
Techniques (like pipelining) allowing a single processor to execute multiple instructions simultaneously.
Pipelining
Arranging functional units in stages so different instructions can be processed simultaneously (like an assembly line).
Multiple issue
A processor design that issues and executes multiple instructions simultaneously.
Hardware multithreading
System support for rapid switching between threads to hide latency (e.g., waiting for memory).
Simultaneous Multithreading (SMT)
A variation of fine-grained multithreading where multiple threads use multiple functional units at once.
Flynn’s Taxonomy
A classification of computer architectures based on instruction and data streams (SISD, SIMD, MIMD).
SISD
Single Instruction, Single Data (standard serial von Neumann architecture).
SIMD
Single Instruction, Multiple Data (same instruction applied to multiple data items).
MIMD
Multiple Instruction, Multiple Data (autonomous processors executing independent streams).
Vector processors
Processors that can operate on arrays or vectors of data using special vector registers and instructions.
Graphics Processing Units (GPUs)
High-performance processors that process massive amounts of data using SIMD parallelism.
UMA (Uniform Memory Access)
A shared-memory system where access time to all memory locations is the same for all cores.
NUMA (Nonuniform Memory Access)
A shared-memory system where access time depends on the memory location relative to the core.
Interconnect
The hardware (wires, switches) connecting processors and memory.
Crossbar
A switched interconnect allowing simultaneous communication between different devices.
Bisection width
The minimum number of links that must be removed to split the network's nodes into two equal halves.
Bandwidth
The rate at which a link can transmit data.
Bisection bandwidth
The sum of the bandwidths of the links that must be removed to bisect the network.
Latency
The time that elapses between the source beginning to transmit and the destination starting to receive.
Cache coherence
The problem (and solutions) of ensuring multiple caches holding the same variable store the same value.
Snooping
A cache coherence protocol where cache controllers monitor the bus for updates to shared data.
Directory-based
A cache coherence protocol using a data structure (directory) to track the status of cache lines.
False sharing
When two threads access different variables that happen to be on the same cache line, forcing unnecessary memory transfers.
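A hedged Pthreads sketch of false sharing and its usual fix: two threads update different counters, and padding keeps the counters on separate cache lines (the 64-byte line size and the names are assumptions).

```c
/* Two threads update different counters; without the pad[] member the
   counters would share one cache line and the line would ping-pong
   between the threads' caches. */
#include <pthread.h>
#include <stdio.h>

#define ITERS 10000000
#define CACHE_LINE 64            /* typical cache-line size, assumed */

struct padded_counter {
    long value;
    char pad[CACHE_LINE - sizeof(long)];  /* keeps counters on separate lines */
};

static struct padded_counter counters[2];

static void *work(void *arg) {
    long id = (long)arg;
    for (long i = 0; i < ITERS; i++)
        counters[id].value++;     /* without pad[], this would falsely share */
    return NULL;
}

int main(void) {
    pthread_t t0, t1;
    pthread_create(&t0, NULL, work, (void *)0);
    pthread_create(&t1, NULL, work, (void *)1);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    printf("%ld %ld\n", counters[0].value, counters[1].value);
    return 0;
}
```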
SPMD
Single Program, Multiple Data; a program structure where a single executable behaves differently based on process rank/ID.
Race condition
When multiple threads/processes access a shared resource, at least one is a write, and the outcome depends on timing.
Critical section
A block of code that updates a shared resource and can only be executed by one thread at a time.
Mutual exclusion
The requirement that only one thread executes a critical section at a time.
Mutex
A lock object used to enforce mutual exclusion.
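A minimal Pthreads sketch tying the last four terms together: incrementing a shared sum is a critical section, and a mutex enforces mutual exclusion (the thread and iteration counts are illustrative).

```c
/* Each thread adds into a shared sum inside a critical section
   protected by a mutex, avoiding the race condition. */
#include <pthread.h>
#include <stdio.h>

#define THREADS 4
#define ITERS   100000

static long shared_sum = 0;
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

static void *add(void *arg) {
    for (int i = 0; i < ITERS; i++) {
        pthread_mutex_lock(&mutex);    /* enter critical section */
        shared_sum += 1;               /* only one thread at a time */
        pthread_mutex_unlock(&mutex);  /* leave critical section */
    }
    return NULL;
}

int main(void) {
    pthread_t t[THREADS];
    for (long i = 0; i < THREADS; i++) pthread_create(&t[i], NULL, add, NULL);
    for (long i = 0; i < THREADS; i++) pthread_join(t[i], NULL);
    printf("sum = %ld (expected %d)\n", shared_sum, THREADS * ITERS);
    return 0;
}
```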
Speedup
The ratio of serial run-time to parallel run-time: S = T_serial / T_parallel.
Efficiency
The speedup divided by the number of processes (E = S / p); effectively utilization.
Amdahl’s law
A formula stating that maximum speedup is limited by the fraction of the program that is inherently serial.
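A worked example with illustrative numbers (r denotes the inherently serial fraction; the 10% and 16-core figures are assumptions chosen for the arithmetic):

```latex
% Speedup, efficiency, and Amdahl's law.
\[
  S = \frac{T_{\text{serial}}}{T_{\text{parallel}}}, \qquad
  E = \frac{S}{p}, \qquad
  S \le \frac{1}{r + \dfrac{1-r}{p}} .
\]
% Example: r = 0.1 (10% serial) on p = 16 cores:
\[
  S \le \frac{1}{0.1 + 0.9/16} \approx 6.4, \qquad
  \text{and } S \le \frac{1}{0.1} = 10 \text{ no matter how many cores are used.}
\]
```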
Scalable
A program is scalable if efficiency can be maintained as the number of processors increases (usually by increasing problem size).
Strongly scalable
Maintaining efficiency as processors increase without increasing problem size.
Weakly scalable
Maintaining efficiency as processors increase by increasing problem size at the same rate.
MPI
Message-Passing Interface; a library of functions for C and Fortran to handle distributed memory programming.
Communicator
A collection of processes that can send messages to each other (e.g., MPI_COMM_WORLD).
Rank
A non-negative integer identifier for a process within a communicator.
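A minimal MPI sketch of communicators and ranks (the standard hello-world pattern; the variable names comm_sz and my_rank are conventions, not requirements):

```c
/* Each process reports its rank within the communicator MPI_COMM_WORLD. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int comm_sz, my_rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);  /* number of processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);  /* this process's rank  */

    printf("Hello from process %d of %d\n", my_rank, comm_sz);

    MPI_Finalize();
    return 0;
}
```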
Point-to-point communications
Communication that involves exactly two processes (e.g., MPI_Send and MPI_Recv).
Collective communications
Communication functions that involve all the processes in a communicator.
MPI_Send
A function to send a message; depending on the message size and the MPI implementation, it may buffer the message and return immediately or block until a matching receive is posted.
MPI_Recv
A blocking function used to receive a message.
Message matching
The rule that a receive matches a send if the communicator, tags, and destination/source ranks match.
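A point-to-point sketch: every nonzero rank sends a greeting to rank 0 with MPI_Send, and rank 0 receives them with MPI_Recv (the tag 0 and the 100-character buffer are assumptions).

```c
/* Nonzero ranks send; rank 0 receives the greetings in rank order. */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[]) {
    int comm_sz, my_rank;
    char msg[100];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    if (my_rank != 0) {
        sprintf(msg, "Greetings from process %d", my_rank);
        MPI_Send(msg, strlen(msg) + 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    } else {
        for (int src = 1; src < comm_sz; src++) {
            MPI_Recv(msg, 100, MPI_CHAR, src, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("%s\n", msg);
        }
    }

    MPI_Finalize();
    return 0;
}
```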
Broadcast (MPI_Bcast)
A collective communication where data from a single process is sent to all processes.
Reduction (MPI_Reduce)
A collective communication where results from all processes are combined (e.g., summed) onto a destination process.
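A short sketch combining both collectives: rank 0 broadcasts an input value, and a reduction sums the local results back onto rank 0 (the value 100 and the local computation are placeholders for real work).

```c
/* MPI_Bcast distributes n from rank 0; MPI_Reduce sums local values onto rank 0. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int my_rank, comm_sz, n = 0;
    int local_val, total = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    if (my_rank == 0) n = 100;                      /* illustrative input */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);   /* send n to everyone */

    local_val = n + my_rank;                        /* stand-in for real work */
    MPI_Reduce(&local_val, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (my_rank == 0) printf("total = %d\n", total);

    MPI_Finalize();
    return 0;
}
```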
Scatter (MPI_Scatter)
A collective function that divides an array on one process into segments and distributes them to all processes.
Gather (MPI_Gather)
A collective function that collects segments of data from all processes and reassembles them on one process.
Allgather (MPI_Allgather)
A collective function that gathers data from all processes and distributes the complete set to all processes.
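A sketch of scatter and gather with a block partition: rank 0 scatters an array, each process doubles its block, and the blocks are gathered back onto rank 0 (the block size of 4 elements is an assumption).

```c
/* Rank 0 scatters blocks of a, each process scales its block, and
   MPI_Gather reassembles the result on rank 0. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int my_rank, comm_sz;
    const int local_n = 4;                    /* elements per process */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    int n = local_n * comm_sz;
    double *a = NULL, local_a[4];

    if (my_rank == 0) {                       /* only rank 0 holds the full array */
        a = malloc(n * sizeof(double));
        for (int i = 0; i < n; i++) a[i] = i;
    }

    MPI_Scatter(a, local_n, MPI_DOUBLE, local_a, local_n, MPI_DOUBLE,
                0, MPI_COMM_WORLD);
    for (int i = 0; i < local_n; i++) local_a[i] *= 2.0;
    MPI_Gather(local_a, local_n, MPI_DOUBLE, a, local_n, MPI_DOUBLE,
               0, MPI_COMM_WORLD);

    if (my_rank == 0) {
        for (int i = 0; i < n; i++) printf("%.1f ", a[i]);
        printf("\n");
        free(a);
    }

    MPI_Finalize();
    return 0;
}
```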
Block partition
Assigning contiguous blocks of data components to processes.
Cyclic partition
Assigning data components in a round-robin fashion to processes.
Block-cyclic partition
Assigning blocks of data components in a round-robin fashion.
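A small program (sizes are assumptions) that prints which of p = 3 processes owns each of 12 components under the three schemes, using blocksize b = 2 for the block-cyclic case.

```c
/* Owner of each component under block, cyclic, and block-cyclic partitions. */
#include <stdio.h>

int main(void) {
    const int n = 12, p = 3, b = 2;
    printf("index : block cyclic block-cyclic\n");
    for (int i = 0; i < n; i++) {
        int block_owner        = i / (n / p);      /* contiguous blocks      */
        int cyclic_owner       = i % p;            /* round-robin components */
        int block_cyclic_owner = (i / b) % p;      /* round-robin blocks     */
        printf("%5d : %5d %6d %12d\n",
               i, block_owner, cyclic_owner, block_cyclic_owner);
    }
    return 0;
}
```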
Derived datatype
An MPI construct allowing the representation of arbitrary collections of data (types and locations) for transmission.
Pthreads
POSIX Threads; a standard API for thread programming on Unix-like systems.
Global variables
In Pthreads, variables declared outside functions that are accessible to all threads.
Local variables
Variables declared inside functions, private to the thread executing the function (on its stack).
Main thread
The thread started when the program begins execution; it runs main and launches the other threads.
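A minimal Pthreads sketch: the main thread creates THREAD_COUNT threads with pthread_create, each runs hello with a thread-local rank on its own stack, and the main thread joins them (the count of 4 is arbitrary).

```c
/* The main thread starts several threads and waits for them to finish. */
#include <pthread.h>
#include <stdio.h>

#define THREAD_COUNT 4

static void *hello(void *rank) {
    long my_rank = (long)rank;        /* local variable on this thread's stack */
    printf("Hello from thread %ld of %d\n", my_rank, THREAD_COUNT);
    return NULL;
}

int main(void) {
    pthread_t handles[THREAD_COUNT];

    for (long t = 0; t < THREAD_COUNT; t++)
        pthread_create(&handles[t], NULL, hello, (void *)t);

    printf("Hello from the main thread\n");

    for (long t = 0; t < THREAD_COUNT; t++)
        pthread_join(handles[t], NULL);

    return 0;
}
```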
Busy-waiting
A synchronization method where a thread repeatedly tests a condition, consuming CPU cycles until it can proceed.
Spinlock
A lock mechanism that uses busy-waiting.
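A busy-waiting sketch (an assumed flag-based example in the textbook style; production code would use atomics, since a plain shared flag is not guaranteed to behave under aggressive compiler optimization).

```c
/* The consumer spins until the producer sets the flag, burning CPU cycles. */
#include <pthread.h>
#include <stdio.h>

static volatile int flag = 0;   /* 0 means "consumer must wait" */
static int message = 0;

static void *producer(void *arg) {
    message = 42;               /* produce the data ...             */
    flag = 1;                   /* ... then allow the consumer to read */
    return NULL;
}

static void *consumer(void *arg) {
    while (flag == 0)           /* busy-wait: repeatedly test the condition */
        ;                       /* the loop body does nothing               */
    printf("consumer read %d\n", message);
    return NULL;
}

int main(void) {
    pthread_t prod, cons;
    pthread_create(&cons, NULL, consumer, NULL);
    pthread_create(&prod, NULL, producer, NULL);
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    return 0;
}
```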
Semaphore
An unsigned integer synchronization primitive with sem_wait (decrement/block) and sem_post (increment/signal) operations.
Producer-consumer synchronization
Synchronization where a consumer thread waits for a condition or data generated by a producer thread.
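The same producer-consumer pattern expressed with a POSIX semaphore instead of busy-waiting (an assumed example; sem_init, sem_wait, and sem_post are the standard POSIX calls).

```c
/* The consumer blocks in sem_wait until the producer calls sem_post. */
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

static sem_t ready;             /* initialized to 0: nothing produced yet */
static int message = 0;

static void *producer(void *arg) {
    message = 42;               /* produce the data      */
    sem_post(&ready);           /* signal: increment sem */
    return NULL;
}

static void *consumer(void *arg) {
    sem_wait(&ready);           /* block until the semaphore is positive */
    printf("consumer read %d\n", message);
    return NULL;
}

int main(void) {
    pthread_t prod, cons;
    sem_init(&ready, 0, 0);     /* shared between threads, initial value 0 */
    pthread_create(&cons, NULL, consumer, NULL);
    pthread_create(&prod, NULL, producer, NULL);
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    sem_destroy(&ready);
    return 0;
}
```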
Barrier
A synchronization point where threads block until all threads have reached that point.
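A barrier sketch using the optional POSIX pthread_barrier_t type (not available on every platform): no thread passes pthread_barrier_wait until all THREAD_COUNT threads have reached it.

```c
/* Threads print before the barrier, block until everyone arrives, then continue. */
#include <pthread.h>
#include <stdio.h>

#define THREAD_COUNT 4

static pthread_barrier_t barrier;

static void *work(void *rank) {
    long my_rank = (long)rank;
    printf("thread %ld: before barrier\n", my_rank);
    pthread_barrier_wait(&barrier);        /* block until all threads arrive */
    printf("thread %ld: after barrier\n", my_rank);
    return NULL;
}

int main(void) {
    pthread_t handles[THREAD_COUNT];
    pthread_barrier_init(&barrier, NULL, THREAD_COUNT);
    for (long t = 0; t < THREAD_COUNT; t++)
        pthread_create(&handles[t], NULL, work, (void *)t);
    for (long t = 0; t < THREAD_COUNT; t++)
        pthread_join(handles[t], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}
```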