Serial programs
Programs written to run on a single processor.
Parallel programs
Programs written to take advantage of the presence of multiple processors.
Task-parallelism
Partitioning the various tasks carried out in solving a problem among the cores.
Data-parallelism
Partitioning the data used in solving a problem among the cores, where each core carries out similar operations on its part of the data.
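For illustration, a minimal C sketch of data-parallelism (the array x, its length N, and the core count P are assumptions, not from the study set): each simulated core sums its own contiguous block of the data; the cores are simulated serially here for simplicity.

```c
/* A hedged sketch of data-parallel partitioning: each of P cores
   sums its own block of the array x of length N. */
#include <stdio.h>

#define N 16
#define P 4

int main(void) {
    int x[N];
    for (int i = 0; i < N; i++) x[i] = i;

    /* Simulate what each core would do: core my_rank handles a
       contiguous block of N/P elements and computes a partial sum. */
    for (int my_rank = 0; my_rank < P; my_rank++) {
        int my_first = my_rank * (N / P);
        int my_last  = my_first + (N / P);
        int my_sum   = 0;
        for (int i = my_first; i < my_last; i++)
            my_sum += x[i];
        printf("core %d: partial sum = %d\n", my_rank, my_sum);
    }
    return 0;
}
```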
Coordination
The interaction required between cores to solve a problem, involving communication, load balancing, and synchronization.
Communication
The process where one or more cores send their current data or partial results to another core.
Load balancing
Distributing work among cores so that each core has roughly the same amount of work to avoid idling.
Synchronization
Coordinating cores so they wait for each other at specific points (e.g., waiting for valid input data).
Shared-memory systems
Parallel systems where cores share access to the computer's memory.
Distributed-memory systems
Parallel systems where each core has its own private memory and cores communicate explicitly (e.g., messages).
Concurrent computing
A program in which multiple tasks can be in progress at any instant.
Parallel computing
A program in which multiple tasks cooperate closely to solve a problem.
Distributed computing
A program that may need to cooperate with other programs to solve a problem.
Von Neumann architecture
The classical computer architecture consisting of main memory, a CPU, and an interconnection between them.
Main memory
A collection of locations capable of storing both instructions and data.
Central Processing Unit (CPU)
The component responsible for executing instructions, divided into a control unit and a datapath.
Registers
Very fast storage locations inside the CPU.
Program counter
A register that stores the address of the next instruction to be executed.
Bus
A collection of parallel wires and hardware controlling access to them, used to connect CPU and memory.
Von Neumann bottleneck
The separation of memory and CPU that limits the rate at which instructions and data can be accessed.
Operating System (OS)
Software that manages hardware and software resources and controls the execution of programs.
Process
An instance of a computer program that is being executed.
Multitasking
The apparent simultaneous execution of multiple programs managed by the OS switching between them.
Thread
A "light-weight" process contained within a process that shares most resources but has its own stack and program counter.
Cache
A collection of memory locations that can be accessed in less time than main memory.
Locality
The tendency of programs to access data and instructions that are close to recently used items, either nearby in memory (spatial) or the same items again soon (temporal).
Spatial locality
Accessing memory locations that are physically near previously accessed locations (e.g., arrays).
Temporal locality
Accessing the same memory location again in the near future.
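A small C sketch contrasting the two loop orders (the array size is an assumption): C stores rows contiguously, so the row-by-row traversal has good spatial locality, while the column-by-column traversal does not.

```c
/* Good vs. poor spatial locality when traversing a 2D array in C,
   which stores each row contiguously (row-major order). */
#include <stdio.h>

#define N 1000

static double a[N][N];

int main(void) {
    double sum = 0.0;

    /* Good spatial locality: consecutive iterations touch
       consecutive memory locations. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];

    /* Poor spatial locality: consecutive iterations jump
       N doubles apart, causing many more cache misses. */
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];

    printf("%f\n", sum);
    return 0;
}
```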
Cache blocks (Cache lines)
Blocks of data/instructions transferred between main memory and cache.
Cache hit
When the CPU attempts to access data and it is found in the cache.
Cache miss
When the CPU attempts to access data and it is not found in the cache.
Write-through
A cache policy where data is written to main memory immediately when it is written to the cache.
Write-back
A cache policy where modified data is marked dirty in the cache and written to main memory only when the cache line is evicted.
Fully associative cache
A cache where a new line can be placed at any location.
Direct mapped cache
A cache where each line of memory maps to exactly one location in the cache.
n-way set associative cache
A cache where each cache line can be placed in one of n different locations.
Virtual memory
A technique allowing main memory to function as a cache for secondary storage (disk), using pages.
Page table
A table in memory that maps virtual addresses to physical addresses.
Translation-Lookaside Buffer (TLB)
A special fast cache for page table entries.
Page fault
An attempt to access a memory page that is not currently in main memory.
Instruction-Level Parallelism (ILP)
Techniques (like pipelining) allowing a single processor to execute multiple instructions simultaneously.
Pipelining
Arranging functional units in stages so different instructions can be processed simultaneously (like an assembly line).
Multiple issue
A processor design that issues and executes multiple instructions simultaneously.
Hardware multithreading
System support for rapid switching between threads to hide latency (e.g., waiting for memory).
Simultaneous Multithreading (SMT)
A variation of fine-grained multithreading where multiple threads use multiple functional units at once.
Flynn’s Taxonomy
A classification of computer architectures based on instruction and data streams (SISD, SIMD, MIMD).
SISD
Single Instruction, Single Data (standard serial von Neumann architecture).
SIMD
Single Instruction, Multiple Data (same instruction applied to multiple data items).
MIMD
Multiple Instruction, Multiple Data (autonomous processors executing independent streams).
Vector processors
Processors that can operate on arrays or vectors of data using special vector registers and instructions.
Graphics Processing Units (GPUs)
High-performance processors that process massive amounts of data using SIMD parallelism.
UMA (Uniform Memory Access)
A shared-memory system where access time to all memory locations is the same for all cores.
NUMA (Nonuniform Memory Access)
A shared-memory system where access time depends on the memory location relative to the core.
Interconnect
The hardware (wires, switches) connecting processors and memory.
Crossbar
A switched interconnect allowing simultaneous communication between different devices.
Bisection width
The minimum number of links that must be removed to split the network's nodes into two equal halves.
Bandwidth
The rate at which a link can transmit data.
Bisection bandwidth
The sum of the bandwidths of the links that must be removed to bisect the network.
Latency
The time that elapses between the source beginning to transmit and the destination starting to receive.
Cache coherence
The problem (and solutions) of ensuring multiple caches holding the same variable store the same value.
Snooping
A cache coherence protocol where cache controllers monitor the bus for updates to shared data.
Directory-based
A cache coherence protocol using a data structure (directory) to track the status of cache lines.
False sharing
When two threads access different variables that happen to be on the same cache line, forcing unnecessary memory transfers.
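A hedged Pthreads sketch of false sharing and its usual fix: two threads update different counters, and padding keeps the counters on separate cache lines (the 64-byte line size and the names are assumptions).

```c
/* Two threads update different counters; without the pad[] member the
   counters would share one cache line and the line would ping-pong
   between the threads' caches. */
#include <pthread.h>
#include <stdio.h>

#define ITERS 10000000
#define CACHE_LINE 64            /* typical cache-line size, assumed */

struct padded_counter {
    long value;
    char pad[CACHE_LINE - sizeof(long)];  /* keeps counters on separate lines */
};

static struct padded_counter counters[2];

static void *work(void *arg) {
    long id = (long)arg;
    for (long i = 0; i < ITERS; i++)
        counters[id].value++;     /* without pad[], this would falsely share */
    return NULL;
}

int main(void) {
    pthread_t t0, t1;
    pthread_create(&t0, NULL, work, (void *)0);
    pthread_create(&t1, NULL, work, (void *)1);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    printf("%ld %ld\n", counters[0].value, counters[1].value);
    return 0;
}
```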
SPMD
Single Program, Multiple Data; a program structure where a single executable behaves differently based on process rank/ID.
Race condition
When multiple threads/processes access a shared resource, at least one is a write, and the outcome depends on timing.
Critical section
A block of code that updates a shared resource and can only be executed by one thread at a time.
Mutual exclusion
The requirement that only one thread executes a critical section at a time.
Mutex
A lock object used to enforce mutual exclusion.
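A minimal Pthreads sketch tying the last four terms together: incrementing a shared sum is a critical section, and a mutex enforces mutual exclusion (the thread and iteration counts are illustrative).

```c
/* Each thread adds into a shared sum inside a critical section
   protected by a mutex, avoiding the race condition. */
#include <pthread.h>
#include <stdio.h>

#define THREADS 4
#define ITERS   100000

static long shared_sum = 0;
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

static void *add(void *arg) {
    for (int i = 0; i < ITERS; i++) {
        pthread_mutex_lock(&mutex);    /* enter critical section */
        shared_sum += 1;               /* only one thread at a time */
        pthread_mutex_unlock(&mutex);  /* leave critical section */
    }
    return NULL;
}

int main(void) {
    pthread_t t[THREADS];
    for (long i = 0; i < THREADS; i++) pthread_create(&t[i], NULL, add, NULL);
    for (long i = 0; i < THREADS; i++) pthread_join(t[i], NULL);
    printf("sum = %ld (expected %d)\n", shared_sum, THREADS * ITERS);
    return 0;
}
```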
Speedup
The ratio of serial run-time to parallel run-time: S = T_serial / T_parallel.
Efficiency
The speedup divided by the number of processes (E = S / p); effectively utilization.
Amdahl’s law
A formula stating that maximum speedup is limited by the fraction of the program that is inherently serial.
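A worked example with illustrative numbers (r denotes the inherently serial fraction; the 10% and 16-core figures are assumptions chosen for the arithmetic):

```latex
% Speedup, efficiency, and Amdahl's law.
\[
  S = \frac{T_{\text{serial}}}{T_{\text{parallel}}}, \qquad
  E = \frac{S}{p}, \qquad
  S \le \frac{1}{r + \dfrac{1-r}{p}} .
\]
% Example: r = 0.1 (10% serial) on p = 16 cores:
\[
  S \le \frac{1}{0.1 + 0.9/16} \approx 6.4, \qquad
  \text{and } S \le \frac{1}{0.1} = 10 \text{ no matter how many cores are used.}
\]
```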
Scalable
A program is scalable if efficiency can be maintained as the number of processors increases (usually by increasing problem size).
Strongly scalable
Maintaining efficiency as processors increase without increasing problem size.
Weakly scalable
Maintaining efficiency as processors increase by increasing problem size at the same rate.
MPI
Message-Passing Interface; a library of functions for C and Fortran to handle distributed memory programming.
Communicator
A collection of processes that can send messages to each other (e.g., MPI_COMM_WORLD).
Rank
A non-negative integer identifier for a process within a communicator.
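A minimal MPI sketch of communicators and ranks (the standard hello-world pattern; the variable names comm_sz and my_rank are conventions, not requirements):

```c
/* Each process reports its rank within the communicator MPI_COMM_WORLD. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int comm_sz, my_rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);  /* number of processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);  /* this process's rank  */

    printf("Hello from process %d of %d\n", my_rank, comm_sz);

    MPI_Finalize();
    return 0;
}
```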
Point-to-point communications
Communication that involves exactly two processes (e.g., MPI_Send and MPI_Recv).
Collective communications
Communication functions that involve all the processes in a communicator.
MPI_Send
A function to send a message; depending on the message size and the MPI implementation, it may buffer the message and return immediately or block until a matching receive is posted.
MPI_Recv
A blocking function used to receive a message.
Message matching
The rule that a receive matches a send if the communicator, tags, and destination/source ranks match.
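A point-to-point sketch: every nonzero rank sends a greeting to rank 0 with MPI_Send, and rank 0 receives them with MPI_Recv (the tag 0 and the 100-character buffer are assumptions).

```c
/* Nonzero ranks send; rank 0 receives the greetings in rank order. */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[]) {
    int comm_sz, my_rank;
    char msg[100];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    if (my_rank != 0) {
        sprintf(msg, "Greetings from process %d", my_rank);
        MPI_Send(msg, strlen(msg) + 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    } else {
        for (int src = 1; src < comm_sz; src++) {
            MPI_Recv(msg, 100, MPI_CHAR, src, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("%s\n", msg);
        }
    }

    MPI_Finalize();
    return 0;
}
```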
Broadcast (MPI_Bcast)
A collective communication where data from a single process is sent to all processes.
Reduction (MPI_Reduce)
A collective communication where results from all processes are combined (e.g., summed) onto a destination process.
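A short sketch combining both collectives: rank 0 broadcasts an input value, and a reduction sums the local results back onto rank 0 (the value 100 and the local computation are placeholders for real work).

```c
/* MPI_Bcast distributes n from rank 0; MPI_Reduce sums local values onto rank 0. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int my_rank, comm_sz, n = 0;
    int local_val, total = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    if (my_rank == 0) n = 100;                      /* illustrative input */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);   /* send n to everyone */

    local_val = n + my_rank;                        /* stand-in for real work */
    MPI_Reduce(&local_val, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (my_rank == 0) printf("total = %d\n", total);

    MPI_Finalize();
    return 0;
}
```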
Scatter (MPI_Scatter)
A collective function that divides an array on one process into segments and distributes them to all processes.
Gather (MPI_Gather)
A collective function that collects segments of data from all processes and reassembles them on one process.
Allgather (MPI_Allgather)
A collective function that gathers data from all processes and distributes the complete set to all processes.
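A sketch of scatter and gather with a block partition: rank 0 scatters an array, each process doubles its block, and the blocks are gathered back onto rank 0 (the block size of 4 elements is an assumption).

```c
/* Rank 0 scatters blocks of a, each process scales its block, and
   MPI_Gather reassembles the result on rank 0. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int my_rank, comm_sz;
    const int local_n = 4;                    /* elements per process */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

    int n = local_n * comm_sz;
    double *a = NULL, local_a[4];

    if (my_rank == 0) {                       /* only rank 0 holds the full array */
        a = malloc(n * sizeof(double));
        for (int i = 0; i < n; i++) a[i] = i;
    }

    MPI_Scatter(a, local_n, MPI_DOUBLE, local_a, local_n, MPI_DOUBLE,
                0, MPI_COMM_WORLD);
    for (int i = 0; i < local_n; i++) local_a[i] *= 2.0;
    MPI_Gather(local_a, local_n, MPI_DOUBLE, a, local_n, MPI_DOUBLE,
               0, MPI_COMM_WORLD);

    if (my_rank == 0) {
        for (int i = 0; i < n; i++) printf("%.1f ", a[i]);
        printf("\n");
        free(a);
    }

    MPI_Finalize();
    return 0;
}
```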
Block partition
Assigning contiguous blocks of data components to processes.
Cyclic partition
Assigning data components in a round-robin fashion to processes.
Block-cyclic partition
Assigning blocks of data components in a round-robin fashion.
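A small program (sizes are assumptions) that prints which of p = 3 processes owns each of 12 components under the three schemes, using blocksize b = 2 for the block-cyclic case.

```c
/* Owner of each component under block, cyclic, and block-cyclic partitions. */
#include <stdio.h>

int main(void) {
    const int n = 12, p = 3, b = 2;
    printf("index : block cyclic block-cyclic\n");
    for (int i = 0; i < n; i++) {
        int block_owner        = i / (n / p);      /* contiguous blocks      */
        int cyclic_owner       = i % p;            /* round-robin components */
        int block_cyclic_owner = (i / b) % p;      /* round-robin blocks     */
        printf("%5d : %5d %6d %12d\n",
               i, block_owner, cyclic_owner, block_cyclic_owner);
    }
    return 0;
}
```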
Derived datatype
An MPI construct allowing the representation of arbitrary collections of data (types and locations) for transmission.
Pthreads
POSIX Threads; a standard API for thread programming on Unix-like systems.
Global variables
In Pthreads, variables declared outside functions that are accessible to all threads.
Local variables
Variables declared inside functions, private to the thread executing the function (on its stack).
Main thread
The thread started when the program begins execution; it runs main and launches the other threads.
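A minimal Pthreads sketch: the main thread creates THREAD_COUNT threads with pthread_create, each runs hello with a thread-local rank on its own stack, and the main thread joins them (the count of 4 is arbitrary).

```c
/* The main thread starts several threads and waits for them to finish. */
#include <pthread.h>
#include <stdio.h>

#define THREAD_COUNT 4

static void *hello(void *rank) {
    long my_rank = (long)rank;        /* local variable on this thread's stack */
    printf("Hello from thread %ld of %d\n", my_rank, THREAD_COUNT);
    return NULL;
}

int main(void) {
    pthread_t handles[THREAD_COUNT];

    for (long t = 0; t < THREAD_COUNT; t++)
        pthread_create(&handles[t], NULL, hello, (void *)t);

    printf("Hello from the main thread\n");

    for (long t = 0; t < THREAD_COUNT; t++)
        pthread_join(handles[t], NULL);

    return 0;
}
```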
Busy-waiting
A synchronization method where a thread repeatedly tests a condition, consuming CPU cycles until it can proceed.
Spinlock
A lock mechanism that uses busy-waiting.
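A busy-waiting sketch (an assumed flag-based example in the textbook style; production code would use atomics, since a plain shared flag is not guaranteed to behave under aggressive compiler optimization).

```c
/* The consumer spins until the producer sets the flag, burning CPU cycles. */
#include <pthread.h>
#include <stdio.h>

static volatile int flag = 0;   /* 0 means "consumer must wait" */
static int message = 0;

static void *producer(void *arg) {
    message = 42;               /* produce the data ...             */
    flag = 1;                   /* ... then allow the consumer to read */
    return NULL;
}

static void *consumer(void *arg) {
    while (flag == 0)           /* busy-wait: repeatedly test the condition */
        ;                       /* the loop body does nothing               */
    printf("consumer read %d\n", message);
    return NULL;
}

int main(void) {
    pthread_t prod, cons;
    pthread_create(&cons, NULL, consumer, NULL);
    pthread_create(&prod, NULL, producer, NULL);
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    return 0;
}
```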
Semaphore
An unsigned integer synchronization primitive with sem_wait (decrement/block) and sem_post (increment/signal) operations.
Producer-consumer synchronization
Synchronization where a consumer thread waits for a condition or data generated by a producer thread.
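The same producer-consumer pattern expressed with a POSIX semaphore instead of busy-waiting (an assumed example; sem_init, sem_wait, and sem_post are the standard POSIX calls).

```c
/* The consumer blocks in sem_wait until the producer calls sem_post. */
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

static sem_t ready;             /* initialized to 0: nothing produced yet */
static int message = 0;

static void *producer(void *arg) {
    message = 42;               /* produce the data      */
    sem_post(&ready);           /* signal: increment sem */
    return NULL;
}

static void *consumer(void *arg) {
    sem_wait(&ready);           /* block until the semaphore is positive */
    printf("consumer read %d\n", message);
    return NULL;
}

int main(void) {
    pthread_t prod, cons;
    sem_init(&ready, 0, 0);     /* shared between threads, initial value 0 */
    pthread_create(&cons, NULL, consumer, NULL);
    pthread_create(&prod, NULL, producer, NULL);
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    sem_destroy(&ready);
    return 0;
}
```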
Barrier
A synchronization point where threads block until all threads have reached that point.
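A barrier sketch using the optional POSIX pthread_barrier_t type (not available on every platform): no thread passes pthread_barrier_wait until all THREAD_COUNT threads have reached it.

```c
/* Threads print before the barrier, block until everyone arrives, then continue. */
#include <pthread.h>
#include <stdio.h>

#define THREAD_COUNT 4

static pthread_barrier_t barrier;

static void *work(void *rank) {
    long my_rank = (long)rank;
    printf("thread %ld: before barrier\n", my_rank);
    pthread_barrier_wait(&barrier);        /* block until all threads arrive */
    printf("thread %ld: after barrier\n", my_rank);
    return NULL;
}

int main(void) {
    pthread_t handles[THREAD_COUNT];
    pthread_barrier_init(&barrier, NULL, THREAD_COUNT);
    for (long t = 0; t < THREAD_COUNT; t++)
        pthread_create(&handles[t], NULL, work, (void *)t);
    for (long t = 0; t < THREAD_COUNT; t++)
        pthread_join(handles[t], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}
```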