Flashcards reviewing lecture notes on multiprocessing, parallelism, and GPU architecture.
Multiprocessor
A computer system with at least two processors, built by connecting many existing smaller computers; modern processors contain multiple cores; software must be written to work with a variable number of processors; energy efficiency is a key design issue.
Task-level parallelism
Utilizing multiple processors by running independent programs simultaneously; also known as process-level parallelism.
Parallel processing program
A single program that runs on multiple processors simultaneously.
Cluster
A set of computers connected over a network that function as a single large multiprocessor.
Multicore Microprocessor
A microprocessor containing multiple processors (“cores”) in a single integrated circuit.
Shared Memory Multiprocessors (SMPs)
Multicores that share a single physical address space.
Why must programmers today care about parallel programming?
Sequential code now means slow code; to achieve performance, programs must be parallel.
Sequential vs. Concurrent Software
A compiler is an example of sequential software (parsing, then code generation, in order); an OS is an example of concurrent software (cooperating processes, responding to I/O events).
Intel Pentium 4 vs. Intel Core i7
Pentium 4 was a uniprocessor (single core), while the Core i7 is a multicore processor.
Why is it difficult to write parallel processing programs?
A parallel program must deliver better performance or energy efficiency than a sequential one, or there is no reason to write it; historically, improvements in uniprocessor designs sped up sequential programs without any programmer involvement, so the extra effort of partitioning, load balancing, and synchronization rarely paid off.
Explain Amdahl’s Law
Describes the maximum speedup achievable from parallelizing a task, limited by the sequential portion of the task.
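As a worked formula (the standard statement of Amdahl's Law, with f the parallelizable fraction of execution time and p the number of processors):

```latex
% Amdahl's Law: speedup on p processors when a fraction f of the
% original execution time can be parallelized (1 - f stays sequential).
\[
  \text{Speedup}(p) = \frac{1}{(1 - f) + \dfrac{f}{p}}
\]
% As p grows without bound, the speedup is capped by the sequential part:
\[
  \lim_{p \to \infty} \text{Speedup}(p) = \frac{1}{1 - f}
\]
% Example: f = 0.9 (90% parallelizable) limits speedup to at most 10x,
% no matter how many processors are added.
```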
Strong Scaling
Speed-up achieved on a multiprocessor without increasing the size of the problem.
Weak Scaling
Speed-up achieved on a multiprocessor while increasing the size of the problem proportionally to the increase in the number of processors.
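The two scaling regimes above can be contrasted with a short numeric sketch using Amdahl-style accounting; the 90% parallel fraction and 10-processor count are illustrative assumptions, not values from the notes:

```python
def speedup(p: int, f: float) -> float:
    """Amdahl's Law: speedup on p processors when a fraction f
    of the work is parallelizable (1 - f remains sequential)."""
    return 1.0 / ((1.0 - f) + f / p)

# Strong scaling: fixed problem size, more processors.
# With 90% parallelizable work, 10 processors give ~5.26x, not 10x.
strong = speedup(10, 0.9)

# Weak scaling: grow the problem with the processor count.
# If the parallel portion grows 10x alongside 10 processors, the run
# takes the same time as the original on one processor, i.e. the
# larger problem is solved in the original time (high efficiency).
scaled_time = 0.1 + (0.9 * 10) / 10  # sequential part + scaled parallel part
```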
Flynn’s Taxonomy
Categorization of parallel hardware based on instruction streams and data streams: SISD, SIMD, MISD, MIMD.
SISD
Single Instruction, Single Data: Conventional uniprocessor.
SIMD
Single Instruction, Multiple Data: Operates on vectors of data.
MISD
Multiple Instruction, Single Data: Stream processor performing computations on a single data stream in a pipelined fashion.
MIMD
Multiple Instruction, Multiple Data: Separate programs that run on different processors; often programmed using SPMD (Single Program Multiple Data).
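The SPMD style can be sketched with Python's standard-library multiprocessing module: every worker process runs the same function (the same program) on a different slice of the data. The function names and data are illustrative, not from the notes:

```python
import multiprocessing as mp

def partial_sum(chunk):
    """The same program runs in every process; each process
    receives different data (Single Program, Multiple Data)."""
    return sum(chunk)

def parallel_sum(data, nprocs=4):
    """Split the data into nprocs interleaved chunks, sum each chunk
    in its own process, then reduce the partial results."""
    chunks = [data[i::nprocs] for i in range(nprocs)]
    with mp.Pool(nprocs) as pool:
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    print(parallel_sum(list(range(100))))  # prints 4950
```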
Vector Architecture
SIMD interpretation where data elements are collected from memory, operated on sequentially in registers using pipelined execution units, and results written back to memory.
Vector Registers
A key feature of vector architectures in which a set of registers each contain multiple data elements, enabling pipelined processing.
Hardware Multithreading
Increases processor utilization by switching to another thread when one thread is stalled.
Fine-Grained Multithreading
Switches between threads on each instruction; can hide throughput losses from both short and long stalls but slows down individual threads.
Coarse-Grained Multithreading
Switches threads only on expensive stalls; less likely to slow down individual threads but limited in ability to overcome throughput losses from shorter stalls.
Simultaneous Multithreading (SMT)
Uses the resources of a multiple-issue, dynamically scheduled pipelined processor to exploit thread-level parallelism and instruction-level parallelism.
Graphics Processing Unit (GPU)
Specialized processing hardware dedicated to problems common to computer graphics, containing hundreds of parallel floating-point units.
Compute Unified Device Architecture (CUDA)
Enables developers to write C-programs to execute on GPUs, making GPU capabilities accessible for non-graphical applications.
CUDA Thread
The programming primitive of CUDA; the compiler and hardware can group thousands of CUDA threads together to exploit the parallelism available in a GPU.
Multithreaded SIMD Processor
The building block of a GPU: a GPU consists of a collection of multithreaded SIMD processors, so the GPU as a whole is a MIMD machine.
SIMD Thread
The machine object that the hardware creates, manages, schedules, and executes: a thread of SIMD instructions.
GPU local memory
On-chip memory that is local to each multithreaded SIMD processor and shared by the SIMD lanes within that processor, but not between processors.
GPU memory (global memory)
Off-chip DRAM shared by the whole GPU and all thread blocks.