CS 2050 HPC Final Exam Vocabulary Flashcards

A comprehensive set of vocabulary cards covering High Performance Computing (HPC) fundamentals, hardware, parallel programming models, and scaling concepts.

Last updated 10:31 PM on 5/12/26

34 Terms

Moore's Law

A historical trend of exponential growth in transistor density, in which the number of transistors on an integrated circuit doubled roughly every $18$ to $24$ months.

Dennard scaling

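A principle stating that scaling voltage down in proportion to transistor size keeps electric fields constant and preserves device behavior, so power density stays roughly constant; it broke down once voltage scaling slowed, largely because leakage current limited further voltage reduction.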

Pollack's Rule

The observation that single-core processor performance increases roughly with the square root of the core's transistor count or area, so doubling the transistors yields only about $1.4\times$ the performance.
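
As a quick worked illustration of the square-root relationship, the sketch below estimates the performance ratio for a given growth in transistor count. The function name `pollack_speedup` is hypothetical, and the rule is an empirical approximation, not an exact law.

```python
import math


def pollack_speedup(transistor_ratio: float) -> float:
    """Estimated performance ratio when the transistor count grows by
    `transistor_ratio`, following the square-root rule of thumb."""
    if transistor_ratio <= 0:
        raise ValueError("transistor_ratio must be positive")
    return math.sqrt(transistor_ratio)


print(round(pollack_speedup(2.0), 3))  # 1.414
print(pollack_speedup(4.0))            # 2.0
```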

Flynn's taxonomy

A classification system for computer architectures based on instruction and data streams, including SISD, SIMD, MISD, and MIMD.

SIMD

Single Instruction, Multiple Data; a form of data-level parallelism involving lockstep execution across vector lanes.

Slurm

A cluster workload manager and job scheduler that allocates compute nodes and runs intensive tasks on them rather than on login nodes; commands include sinfo, sbatch, squeue, scancel, and srun.

DRAM

Dynamic Random Access Memory; the type of memory used for the main memory of a typical computing node.

SRAM

Static Random Access Memory; the type of memory used for high-speed caches.

Double precision

A numerical format using $64$ bits, equivalent to $8$ bytes per value.

L1 Cache

The cache closest to the ALU, typically around $30$ KB in size with a latency of roughly $4$ cycles.

L3 Cache

A larger cache level, typically around $30$ MB in size with a latency of roughly $50$ cycles.

Latency

The response time measured from the moment a data request is made until the data arrives.

Bandwidth

The rate at which data is transferred or requests are satisfied; also known as throughput.
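
To see how latency and bandwidth combine, the sketch below uses a simple linear cost model, time = latency + size / bandwidth. The model and the numbers are illustrative assumptions (they are not from the cards), and `transfer_time` is a hypothetical helper.

```python
def transfer_time(nbytes: float, latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """Estimated time to move `nbytes`: a fixed latency plus size over bandwidth."""
    return latency_s + nbytes / bandwidth_bytes_per_s


# Assumed link: 1 microsecond latency, 10 GB/s bandwidth.
small = transfer_time(8, 1e-6, 10e9)      # dominated by latency
large = transfer_time(1e9, 1e-6, 10e9)    # dominated by bandwidth
print(small, large)
```

Small messages cost roughly the latency regardless of size, while large transfers cost roughly size divided by bandwidth.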

AVX512

An instruction set with $512$-bit vector registers, capable of holding $8$ double-precision or $16$ single-precision values.
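
The lane counts follow directly from dividing the register width by the element width. A minimal sketch of that arithmetic, with a hypothetical helper name `vector_lanes`:

```python
def vector_lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements of `element_bits` that fit in one vector register."""
    if register_bits % element_bits != 0:
        raise ValueError("element width must divide the register width")
    return register_bits // element_bits


print(vector_lanes(512, 64))  # 8 double-precision lanes
print(vector_lanes(512, 32))  # 16 single-precision lanes
```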

Spatial locality

A memory access pattern where addresses near a recently accessed address are likely to be accessed soon, so data fetched together in one cache line gets used.

Temporal locality

A memory access pattern where the same data is reused within a short period.
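
Spatial and temporal locality can be illustrated with a toy cache model. The sketch below counts misses in a fully associative LRU cache; the 64-byte line size and 512-line capacity are assumptions chosen for illustration, and real caches are set-associative with hardware prefetching, so treat this as a simplified model.

```python
from collections import OrderedDict


def count_misses(addresses, line_bytes=64, num_lines=512):
    """Count misses for a byte-address trace in a toy fully associative LRU cache."""
    cache = OrderedDict()  # keys are cache-line indices, ordered by recency
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        if line in cache:
            cache.move_to_end(line)        # hit: mark line as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


n = 4096
sequential = [8 * i for i in range(n)]       # consecutive doubles (spatial locality)
strided = [64 * i for i in range(n)]         # one double per cache line (no spatial locality)
repeated = [8 * (i % 64) for i in range(n)]  # same 64 doubles reused (temporal locality)

print(count_misses(sequential))  # 512
print(count_misses(strided))     # 4096
print(count_misses(repeated))    # 8
```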

Race condition

A situation where the outcome of a program depends on the specific timing or interleaving of thread execution.

Atomic operation

An operation that appears indivisible to other threads and cannot be interrupted, preventing exposure of intermediate states.
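
A shared counter incremented by several threads is a common example of a race condition, because the increment is a read-modify-write sequence. The sketch below makes each increment appear indivisible by guarding it with a lock; this is mutual exclusion standing in for a hardware atomic instruction, and `locked_count` is a hypothetical name.

```python
import threading


def locked_count(num_threads: int = 4, increments: int = 10_000) -> int:
    """Increment a shared counter from several threads, guarding each update."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:        # only one thread at a time may read-modify-write
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


print(locked_count())  # 40000
```

Without the lock, two threads could read the same old value and one update could be lost, so the final count would depend on thread interleaving.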

MPI Rank

A unique identifier assigned to each process within an MPI communicator.

MPI_COMM_WORLD

The default MPI communicator containing all processes in the current run.

MPI_Barrier

A collective operation that forces all ranks in a communicator to wait until every rank has reached the barrier.

Amdahl's Law

A formula that predicts the speedup $S_p$ on $p$ processors from the serial fraction $f$: $S_p = \frac{1}{f + \frac{1-f}{p}}$. As $p$ grows, the speedup is bounded by $1/f$.
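
The formula can be evaluated directly. A minimal sketch, with `amdahl_speedup` as a hypothetical function name and the serial fraction of $0.05$ chosen only for illustration:

```python
def amdahl_speedup(f: float, p: int) -> float:
    """Predicted speedup on p processors for serial fraction f."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("serial fraction f must be in [0, 1]")
    if p < 1:
        raise ValueError("processor count p must be at least 1")
    return 1.0 / (f + (1.0 - f) / p)


for p in (1, 4, 16, 1024):
    print(p, round(amdahl_speedup(0.05, p), 2))
# The speedup approaches, but never exceeds, 1 / f = 20 as p grows.
```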

Strong scaling

A measure of parallel efficiency where problem size is fixed and processors are increased to reduce execution time.

Weak scaling

A measure of parallel efficiency where problem size is increased proportionally to the number of processors to maintain fixed work per processor.
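
Strong and weak scaling are usually reported as efficiencies computed from measured run times. The sketch below uses the common definitions $T_1 / (p \, T_p)$ for strong scaling and $T_1 / T_p$ for weak scaling; the function names and the timings are hypothetical.

```python
def strong_scaling_efficiency(t1: float, tp: float, p: int) -> float:
    """Fixed total problem size: ideal time on p processors is t1 / p."""
    return t1 / (p * tp)


def weak_scaling_efficiency(t1: float, tp: float) -> float:
    """Fixed work per processor: ideal time on p processors stays at t1."""
    return t1 / tp


# Hypothetical timings in seconds.
print(strong_scaling_efficiency(t1=100.0, tp=15.0, p=8))  # about 0.83
print(weak_scaling_efficiency(t1=100.0, tp=110.0))        # about 0.91
```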

NVIDIA warp

A fixed group of $32$ threads that execute together on a GPU.

__global__

A CUDA C++ qualifier for a GPU kernel: a function that runs on the device, can be launched from the host, and must return void.

__device__

A CUDA C++ qualifier for functions that execute on the GPU and can only be called from other GPU code (kernels or other device functions).

Unified Memory

A managed memory system (e.g., cudaMallocManaged) that is accessible from both CPU and GPU via implicit page migration.

GEMM

General Matrix-Matrix Multiplication; a Level 3 BLAS operation whose implementations are highly optimized for cache locality.

Block-cyclic distribution

A method of distributing matrices across nodes that balances computational load while keeping local blocks for efficient BLAS operations.
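
The index mapping behind a block-cyclic layout can be sketched in one dimension: blocks of `nb` consecutive indices are dealt out to processes round-robin. The function below is an illustrative assumption (a 1D version with a hypothetical name), not the interface of any particular library.

```python
def block_cyclic_owner(i: int, nb: int, nprocs: int):
    """Return (owning process, local index) for global index i
    in a 1D block-cyclic distribution with block size nb."""
    block = i // nb                   # which global block holds index i
    owner = block % nprocs            # blocks are dealt out round-robin
    local_block = block // nprocs     # position of that block on its owner
    return owner, local_block * nb + i % nb


owners = [block_cyclic_owner(i, nb=2, nprocs=3)[0] for i in range(10)]
print(owners)  # [0, 0, 1, 1, 2, 2, 0, 0, 1, 1]
```

Each process receives blocks from across the whole index range, which spreads the work evenly while every local piece remains a contiguous block.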

Ghost atoms

Copies of atoms belonging to neighboring ranks used in MPI molecular dynamics to compute interactions across domain boundaries.
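
The idea can be sketched in one dimension: a rank owning the interval $[lo, hi)$ needs copies of any neighboring atoms that lie within the interaction cutoff of its boundaries. The helper below is a hypothetical, non-periodic illustration that only selects which atoms would be copied; a real code would exchange them between ranks with MPI.

```python
def ghost_atoms(positions, lo, hi, cutoff):
    """Indices of atoms outside the domain [lo, hi) but within `cutoff` of it."""
    return [
        i
        for i, x in enumerate(positions)
        if (lo - cutoff <= x < lo) or (hi <= x < hi + cutoff)
    ]


positions = [0.5, 2.8, 3.1, 4.9, 5.2, 7.5]
print(ghost_atoms(positions, lo=3.0, hi=5.0, cutoff=0.5))  # [1, 4]
```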

Kokkos

A C++ performance-portability library designed to map parallel execution patterns to different backends like CUDA or OpenMP.

Operational intensity

A metric defined as floating-point operations per byte of DRAM traffic after cache filtering.
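
As a worked example, consider the vector update `y[i] = a * x[i] + y[i]` in double precision. Under the simplifying assumptions that nothing is reused from cache and that each element costs one read of `x`, one read of `y`, and one write of `y`, the intensity works out to $2$ FLOPs per $24$ bytes. The helper name is hypothetical.

```python
def operational_intensity(flops: float, dram_bytes: float) -> float:
    """Floating-point operations per byte of DRAM traffic."""
    return flops / dram_bytes


n = 1_000_000
flops = 2 * n            # one multiply and one add per element
traffic = 3 * 8 * n      # read x, read y, write y; 8 bytes per double
print(operational_intensity(flops, traffic))  # about 0.083 FLOP/byte
```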

Ridge point

The value on a roofline plot calculated as $\frac{\text{peak FLOP/s}}{\text{peak memory bandwidth}}$, representing the threshold between memory-bound and compute-bound regimes.
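
A minimal sketch of the roofline calculation, assuming a hypothetical machine with $10^{12}$ FLOP/s peak compute and $10^{11}$ bytes/s peak memory bandwidth (both numbers and both function names are illustrative):

```python
def ridge_point(peak_flops: float, peak_bandwidth: float) -> float:
    """Operational intensity (FLOP/byte) at which the two roofs meet."""
    return peak_flops / peak_bandwidth


def attainable_flops(intensity: float, peak_flops: float, peak_bandwidth: float) -> float:
    """Roofline bound: limited by memory below the ridge point, by compute above it."""
    return min(peak_flops, intensity * peak_bandwidth)


PEAK_FLOPS = 1e12      # assumed peak compute, FLOP/s
PEAK_BW = 1e11         # assumed peak memory bandwidth, bytes/s

print(ridge_point(PEAK_FLOPS, PEAK_BW))             # 10.0 FLOP/byte
print(attainable_flops(1.0, PEAK_FLOPS, PEAK_BW))   # memory-bound
print(attainable_flops(20.0, PEAK_FLOPS, PEAK_BW))  # compute-bound
```

A kernel whose operational intensity is below the ridge point is limited by memory bandwidth; above it, the kernel is limited by peak compute.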