Thread of control
A sequence of instructions that is executed one instruction at a time, one after the other, during the execution of a program.
Thread
The set of all resources necessary to represent a single thread of control through a program's executable code.
Resources
Each thread within a process has its own:
- Unique thread ID
- Program counter (PC)
- Register set
- User stack
All threads share many process resources, but the most important are:
- the executable code, i.e. the text segment
- the heap memory, where dynamically allocated data is stored
- the data segments, containing initialized and uninitialized data
- the command line arguments, environment variables, and open files
Reasons for multi-threading
- code to handle async events can be executed by a separate thread
- threads automatically have access to the same memory address space
- on a multiprocessor architecture, many threads may run in parallel on separate cores, speeding up the computation
- performance can be improved by putting calls to system functions with expected long waits into separate threads
- by handling I/O in separate threads, the application can respond quickly, even when time-consuming operations are taking place in other threads
Concurrent
Two threads are ___ if their computations can overlap in time.
Parallel
Two threads execute in ___ if they execute at the same time on different processors.
Parallel program
Contains instruction sequences that can be executed in parallel
Concurrent program
If a parallel program is executed on a single processor, the threads within it can be interleaved in time on the processor; it therefore contains concurrency.
Challenges in Parallel Programming
Functional decomposition, load balancing, data decomposition, data dependency analysis, testing and debugging
Functional decomposition
Decomposing a single task into multiple, independent, parallel subtasks. There is no general algorithm for doing this.
Load balancing
Decomposing a problem into subtasks in such a way that each has roughly the same amount of computation, to maximize each core's utilization.
Data decomposition
Dividing the data among the subtasks to minimize IPC and divide the work equally, and to do so in a scalable way.
Data dependency analysis
When data is distributed to the various subtasks, identifying which data in each subtask depends on the computation performed by a different subtask.
Testing and debugging
Unlike sequential software, parallel programs can have errors that depend on the relative rates of execution of the subtasks. Designing tests is very difficult, and debugging when failures occur is even harder.
Data parallelism
Exists when a task can be decomposed into many subtasks that perform identical operations on different sets of data.
Task parallelism (functional parallelism)
Exists when a task can be decomposed into multiple subtasks such that each performs a different function, on the same or different data sets.
Speed-up
The ___ of a parallel program with respect to a sequential program, on a computer with N identical processors and the exact same input data, is the sequential program's running time on one processor divided by the parallel program's running time on N processors.
Amdahl's law
Every program has some fraction of operations in it that must be executed sequentially.
Serial fraction
Let f be the fraction of operations in a sequential program's computation on a given input that are inherently sequential. f is often called the ___.
Parallel fraction
Let N be the number of processing cores on the computer on which a parallel version of this same program is run. The speed-up of this parallel version using all N processors has an upper bound given by the formula speedup <= 1 / (f + (1 - f)/N). The value (1 - f) is defined as the ___; it represents the fraction of the computation that can be run in parallel.
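As a quick worked example of the bound above (the numbers are illustrative, not from the source): with f = 0.1 and N = 8, speedup <= 1 / (0.1 + 0.9/8) ≈ 4.71. A minimal C sketch of the same arithmetic:

```c
#include <stdio.h>

/* Amdahl's law: upper bound on speedup given serial fraction f
   and N processors. */
double amdahl_bound(double f, int n) {
    return 1.0 / (f + (1.0 - f) / n);
}

int main(void) {
    /* Illustrative values: 10% inherently serial code, 8 cores. */
    printf("max speedup: %.2f\n", amdahl_bound(0.1, 8));  /* ~4.71 */
    return 0;
}
```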
Multithreading models
Two different methods (threads implemented in user space and threads implemented in kernel space) by which computer systems allow programmers to create multithreaded programs.
User level threads
Threads that are managed by a user level thread library. The library contains code for creating and destroying threads, for inter-thread communication, scheduling, saving and restoring thread contexts, and all other thread management operations.
Three most commonly used thread libraries
POSIX Threads, Windows Threads, and Java Threads
Kernel level threads
Threads that are managed entirely within the kernel. There is no thread management code in user space. Creating, destroying, scheduling, coordinating, and otherwise managing threads is performed completely within kernel code. Sometimes called a lightweight process (LWP).
POSIX threads or Pthreads
One of the most common user thread libraries standardized by POSIX. Implemented in almost all modern operating systems.
pthread_exit()
Terminates the calling thread.
pthread_create()
Creates a new thread, called the child thread. The caller is the parent thread.
pthread_join()
Makes the calling thread (here, the main program) wait until the child thread terminates, e.g., by calling pthread_exit().
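A minimal sketch tying the three calls together (function and variable names are illustrative); build with e.g. gcc -pthread:

```c
#include <pthread.h>
#include <stdio.h>

/* The child thread's start routine; terminates via pthread_exit(). */
static void *child(void *arg) {
    printf("child thread: %s\n", (const char *)arg);
    pthread_exit(NULL);              /* terminate the calling thread */
}

int main(void) {
    pthread_t tid;
    /* Parent creates the child thread... */
    pthread_create(&tid, NULL, child, (void *)"hello");
    /* ...and waits until the child calls pthread_exit(). */
    pthread_join(tid, NULL);
    return 0;
}
```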
User level threads: pros
- Threads can be created very quickly, depending on the underlying implementation. Few, if any, system calls are needed.
- Switching from one thread to another is also very fast, since no context switch is required (because all threads are in user space).
- The kernel does not need to have any support for threads for user programs to be multithreaded. All threads run in the process's context.
- Because the library is in user space, code written to run against its API can be run on any computer system for which the library has been implemented.
User level threads: cons
- A process with dozens of threads may get the same amount of time on the processor as one with a single thread, depending on the implementation, so the fact that it has many threads does not give it more processor time.
- A program with multiple threads may not be able to take advantage of multiple processors, since all threads may be mapped to a single processor, depending on the implementation.
- The application programmer generally has no control over how threads are mapped to processors.
Kernel threads
Threads that run as part of the kernel, in its address space (i.e., inside kernel space). Not the same thing as kernel level threads.
Tasks
What both processes and threads are to the Linux kernel; each is represented by a task_struct. (Threads can share resources, such as their address space, whereas processes do not.)
'flags' parameter
Has many possible values that tell the kernel which resources are shared.
malloc()
In the clone() example, allocates memory for the child's stack; because stacks grow downward, the address passed to clone() is the starting address of the allocation plus the stack size.
clone()
The Linux system call that creates a new task. Called with the CLONE_VM flag, it tells the kernel that the parent's memory is to be shared with the child rather than copied.
SIGCHLD
Tells the kernel that the child will call exit() and the parent is going to wait() for the ___ signal to receive the child's termination status.
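A minimal sketch of the clone() usage the last few cards describe, assuming Linux and glibc (the stack size and function names are illustrative):

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

#define STACK_SIZE (1024 * 1024)     /* illustrative size */

static int child_fn(void *arg) {
    (void)arg;
    printf("child task sharing the parent's memory\n");
    return 0;
}

int main(void) {
    /* malloc() provides the child's stack; stacks grow downward, so
       clone() is passed the start address plus the stack size. */
    char *stack = malloc(STACK_SIZE);
    if (!stack) return 1;

    /* CLONE_VM: share the parent's memory rather than copying it.
       SIGCHLD: the parent will wait() for this signal on child exit. */
    int pid = clone(child_fn, stack + STACK_SIZE,
                    CLONE_VM | SIGCHLD, NULL);
    if (pid == -1) return 1;

    waitpid(pid, NULL, 0);           /* receive the termination status */
    free(stack);
    return 0;
}
```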
Kernel level threads: pros
- Kernel level threads from a single process can be scheduled simultaneously on multiple processors, taking advantage of the hardware.
- A thread that blocks as a result of a service request does not prevent other threads in the same process from being scheduled to run.
- The kernel can allocate more processor time to processes with larger numbers of threads.
- The kernel itself can be multithreaded.
Kernel level threads: cons
- It is slower to create kernel level threads and more work to manage them because there is more work in the kernel.
- Switching between threads in the same process requires kernel intervention and is therefore slower.
- Representing threads within the kernel requires a complete PCB.
Implementation of user level threads
If the kernel has no support for threading, the many-to-one model; if it does, the one-to-one model and the many-to-many model.
Many-to-one (M:1)
Requires no support for multithreading at the kernel level and is the most portable, but it has significant performance problems. All threads are implemented in user space within the process, which appears to the kernel to be an ordinary, single-threaded process.
Many-to-one (M:1): pros
It is portable and does not require any support from the underlying kernel.
Many-to-one (M:1): cons
- One drawback is that it requires all blocking system calls to be simulated in the library by non-blocking calls to the kernel, which slows down system calls significantly.
- A second is a consequence of the first. Some blocking system calls cannot be simulated by non-blocking system calls; as a result, when a thread makes such a call the entire process is blocked.
- A major con is that a program cannot take advantage of more than a single processor because the kernel sees it as a single-threaded process (as a single schedulable unit).
One-to-one (1:1)
Assigns a kernel level thread to each user level thread. Implementations of Pthreads in current Linux versions and Windows systems use this approach. Each thread is seen by the kernel as a separately schedulable entity.
One-to-one (1:1): pros
- It is the simplest to implement within a library.
- Provides the greatest possible concurrency because it can use as many processors as are available, up to the number of threads.
- One thread's blocking does not block other threads.
One-to-one (1:1): cons
- Creating each thread is more expensive in terms of kernel resources.
- Thread management is slower because most operations on threads require system calls.
- It is dependent on the multithreading model of the underlying kernel.
Many-to-many (M:N)
Most flexible of the models. The library creates multiple kernel level threads and schedules user level threads on top of them. Most libraries will dynamically allocate as many kernel level threads as necessary to service the user level threads that are ready to run.
Many-to-many (M:N): pros
- It does not use kernel resources for user level threads that are not actually runnable.
- The library-level scheduler can switch between threads much faster because it does not make system calls.
- It performs better than the others when user level threads synchronize with each other.
- Applications that create large numbers of threads that run only occasionally perform better on this model than on the others.
Many-to-many (M:N): cons
- It has more overhead and consumes more system resources, because scheduling takes place both in the kernel, among the kernel level threads for the process, and in the library, among the user level threads.
- User level threads that are bound to the same kernel level thread can still be blocked when the thread that is running makes a system call.
Scheduler activations
The kernel provides the process with the abstraction of virtual processors.
Virtual processors
Acts like an actual processor on which a user level thread can run, but is just a data structure, sometimes called a lightweight process (LWP).
Upcalls
Kernel events that could affect the number of runnable threads, such as blocking system calls, are communicated directly to the user process using ____, which are messages sent to the user level thread manager.
Upcall handler
Handles upcalls within the thread manager (like a signal handler). Must run on a virtual processor.
Scheduler activation use example sequence
1. An executing thread makes a blocking system call.
2. The kernel blocks the calling user level thread as well as the kernel level thread used to execute the user level thread.
3. Scheduler activation: The kernel allocates a new virtual processor to the process.
4. Upcall: The kernel notifies the user level thread manager that the user level thread blocked and that a new virtual processor is available for other threads.
5. The user process runs an upcall handler on the new virtual processor. It saves the state of the blocked thread and frees the virtual processor on which that thread was running.
6. The user level thread manager moves the other threads to the new virtual processor and schedules one of the ready threads to run on it.
Implicit threading
Refers to any of several different methods of multithreading a program in which compilers and/or libraries create and manage concurrent threads with little or no explicit guidance from the programmer.
Overcome difficulties of writing multi threaded programs
- offload some of the programming tasks from developers to compilers and run-time libraries.
- what is implicit in implicit threading is the creation and management of threads, not the specification of parallelism.
Parallelizing compilers
Analyze the code and determine automatically where fine-grained parallelism exists. (Ex: some C++ and Fortran compilers)
Well-known implicit threading systems
OpenMP, Grand Central Dispatch, Intel Threading Building Blocks, Java Concurrency
OpenMP (Open Multi-Processing)
An API for programs written in C/C++ and Fortran that may be used to explicitly specify multithreaded, shared-memory parallelism. It includes compiler directives, runtime library routines, and environment variables.
Grand Central Dispatch (GCD)
A technology developed by Apple for its macOS and iOS operating systems. Like OpenMP, it includes a run-time library, an API, and language extensions that allow developers to identify sections of code to run in parallel.
Intel Threading Building Blocks (TBB)
A template library that supports the design of multithreaded, shared-memory programs in C++.
Java Concurrency
Refers to the multithreading, concurrency, and parallelism available from the Java Virtual Machine, which is entirely thread-based. The java.util.concurrent package provides this concurrency.
Worker thread
Rather than constantly creating and deleting threads, the implementation can maintain a pool of threads, like a collection of workers sitting in a room waiting to be assigned work. When work comes in, it is assigned to an available worker; when the worker finishes, it goes back to the waiting room.
Thread pools
Used to improve the performance of implicit threading systems.
- A thread pool is initialized with a number of threads, where they wait for work.
- When a process needs a new thread to perform a task, it requests one from the thread pool.
- If there is an available thread in the pool, it is awakened and assigned to the process to execute the task.
- If the pool contains no available threads, the task is queued until one becomes free.
- Once a thread completes its task, it returns to the pool and awaits more work.
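A minimal fixed-size thread pool sketch in C with Pthreads, following the mechanics above (all names, sizes, and the bounded queue are illustrative assumptions, not from the source):

```c
#include <pthread.h>
#include <stdio.h>

#define NUM_WORKERS 4
#define QUEUE_CAP   16

typedef void (*task_fn)(void *);
typedef struct { task_fn fn; void *arg; } task_t;

static task_t queue[QUEUE_CAP];
static int head = 0, tail = 0, count = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;

/* Each worker sleeps until a task is queued, runs it, then waits again. */
static void *worker(void *unused) {
    (void)unused;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (count == 0)
            pthread_cond_wait(&nonempty, &lock);
        task_t t = queue[head];
        head = (head + 1) % QUEUE_CAP;
        count--;
        pthread_mutex_unlock(&lock);
        t.fn(t.arg);                      /* execute the task */
    }
    return NULL;
}

/* Queue a task; if the queue is full, report failure
   (a fuller implementation would block or grow the queue). */
static int pool_submit(task_fn fn, void *arg) {
    int ok = -1;
    pthread_mutex_lock(&lock);
    if (count < QUEUE_CAP) {
        queue[tail] = (task_t){ fn, arg };
        tail = (tail + 1) % QUEUE_CAP;
        count++;
        ok = 0;
        pthread_cond_signal(&nonempty);   /* wake one idle worker */
    }
    pthread_mutex_unlock(&lock);
    return ok;
}

static void hello(void *arg) { printf("task %d\n", *(int *)arg); }

int main(void) {
    pthread_t workers[NUM_WORKERS];
    static int ids[4] = { 0, 1, 2, 3 };

    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_create(&workers[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++)
        pool_submit(hello, &ids[i]);

    pthread_exit(NULL);   /* exit main thread; workers keep running */
}
```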
Benefits of thread pools
- Servicing a request with an existing thread is faster than creating a thread.
- A thread pool limits the number of threads that exist at any one point.
- Separating the task to be performed from the mechanics of creating the task means different strategies can be used for running and scheduling the task.
Fork-join model/parallelism
Threads are created by a parent thread, and their executions reach a join point after which only one thread continues.
OpenMP Overview
- Designed for multiprocessor/core, shared memory machines. The underlying architecture can be either UMA or NUMA.
- Has 3 primary API components: Compiler directives (used by programmers to define parallel regions), Runtime library routines (which extend the language with OpenMP functions), and Environment variables (used to control the behavior of OpenMP programs)
- Parallelism is exclusively through the use of threads.
- Uses fork-join model of parallel execution.
Master thread
Executes sequentially until the first parallel region is encountered.
OpenMP key features
- Defines parallel regions as blocks of code that may run in parallel.
- Programmers insert compiler directives into their code at parallel regions; these directives instruct the OpenMP run-time library to execute the region in parallel.
- Parallel regions are specified using compiler directives known as pragmas, e.g. #pragma omp parallel, which specifies that the following statement or block is to be executed by some number of threads in parallel.
- Most data within a parallel region is shared by default (because OpenMP is a shared-memory programming model).
- #pragma omp parallel shared(a) private(i) - a C/C++ pragma that specifies the start of a parallel region with a variable 'a' shared by all threads and a thread-private variable named 'i' of which each thread has a private copy.
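A minimal sketch of the shared(a) private(i) pragma from the last card (the array size and work split are illustrative); compile with a flag such as gcc -fopenmp:

```c
#include <omp.h>
#include <stdio.h>

int main(void) {
    int a[8];    /* shared: one copy seen by all threads */
    int i;       /* private: each thread gets its own copy */

    #pragma omp parallel shared(a) private(i)
    {
        /* Illustrative work split: each thread fills a strided
           slice of the shared array. */
        int tid = omp_get_thread_num();
        for (i = tid; i < 8; i += omp_get_num_threads())
            a[i] = i * i;
    }

    for (i = 0; i < 8; i++)
        printf("%d ", a[i]);
    printf("\n");
    return 0;
}
```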
OpenMP Run-time Routines
The run-time library has routines for such things as:
- setting and querying the number of threads
- querying a thread's unique identifier (thread ID)
- querying if execution is within a parallel region, and at what level
- setting and querying nested parallelism
- setting, initializing and terminating locks and nested locks
- querying clock time
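A small sketch exercising a few of these routines (the thread count is chosen arbitrarily):

```c
#include <omp.h>
#include <stdio.h>

int main(void) {
    omp_set_num_threads(4);                          /* set the thread count */
    printf("in parallel? %d\n", omp_in_parallel());  /* 0 here: serial code */

    #pragma omp parallel
    {
        if (omp_get_thread_num() == 0)               /* master thread only */
            printf("team size: %d\n", omp_get_num_threads());
    }
    return 0;
}
```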
Grand Central Dispatch (GCD) Overview
- Provides support for programs written in various languages, including C, C++, Objective-C, and Swift.
- An implementation of task parallelism based on thread pools. The library manages the threads without the programmer's involvement; the programmer specifies the tasks that can be run in parallel.
- Task can be expressed either as a function or as a block.
- Queues the designated tasks for execution and schedules them onto an available thread to run on any available processor.
- Underlying support for libdispatch is the POSIX threads library, but most of the support comes from non-POSIX compliant Apple extensions.
Block
An extension to the syntax of C, C++, and Objective-C that encapsulates code and data into a single object. A block is specified by a caret ^ inserted in front of a pair of braces { }: ^{ printf("block"); }
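A minimal GCD sketch submitting a block to a global queue, assuming macOS and clang's blocks support (the semaphore is just one way to keep main alive for the sketch):

```c
#include <dispatch/dispatch.h>
#include <stdio.h>

int main(void) {
    /* A concurrent global queue; GCD schedules submitted blocks
       onto threads from its internal pool. */
    dispatch_queue_t q =
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_semaphore_t done = dispatch_semaphore_create(0);

    /* Submit a block (the ^{ } syntax from the card above). */
    dispatch_async(q, ^{
        printf("running on a GCD worker thread\n");
        dispatch_semaphore_signal(done);
    });

    /* Keep main alive until the block has run. */
    dispatch_semaphore_wait(done, DISPATCH_TIME_FOREVER);
    return 0;
}
```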
Signal
An empty message sent to a process because an event occurred. It has a type and nothing more. May be received either synchronously or asynchronously.
Synchronous signal examples
Traps and other exceptions
Async signal examples
Timer interrupts (SIGALRM), keyboard interrupts Control-C (SIGINT), and terminal disconnections (SIGHUP)
Sequence of actions (with any type of signal)
1. A signal is generated by some event.
2. The signal is sent to a process by the kernel.
3. The signal is pending until the next step.
4. The signal is delivered to the process when the process takes some action with respect to it, which is either performing the default action, ignoring it, or catching the signal with a signal handler. The disposition of the signal is how the process behaves when the signal is delivered.
Signal handler
A function that is run when a signal is delivered. Signal handlers are registered with the kernel. If there is no user-defined signal handler, a default action is taken.
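A minimal sketch of registering a handler for SIGINT with sigaction() (the handler and flag names are illustrative):

```c
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigint = 0;

/* Only async-signal-safe work belongs in a handler, so just set a flag. */
static void on_sigint(int signo) {
    (void)signo;
    got_sigint = 1;
}

int main(void) {
    struct sigaction sa;
    sa.sa_handler = on_sigint;       /* the signal's new disposition */
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGINT, &sa, NULL);    /* register the handler */

    while (!got_sigint)
        pause();                     /* sleep until a signal is delivered */
    printf("caught SIGINT\n");
    return 0;
}
```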
Pending flag
Indicates whether a signal type is pending or not.
Blocked flag
Indicates whether a signal type is blocked or not.
Thread cancellation
Act of terminating a thread that has not yet terminated itself. With it, one thread can try to terminate another.
Canceled thread
What a thread is called once it has been terminated by cancellation.
Target thread
The thread to be canceled.
Asynchronous cancellation
A thread can cancel the target thread immediately.
Deferred cancellation
A thread attempts to cancel the target thread, but the target thread is not canceled immediately. Instead, it terminates when it reaches a point in its execution at which it is safe to do so.
Cancellation points
Calls to selected functions. When the target thread calls one of these functions while a cancellation request is pending, it terminates.
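A minimal deferred-cancellation sketch with Pthreads (the sleep() call doubles as the cancellation point; timings are arbitrary):

```c
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* The target thread: loops forever; sleep() is a cancellation point,
   so a pending cancel request takes effect there. */
static void *target(void *unused) {
    (void)unused;
    int old;
    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, &old);  /* the default */
    for (;;) {
        /* ... work that must not be interrupted ... */
        sleep(1);                    /* cancellation point */
    }
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, target, NULL);
    sleep(2);
    pthread_cancel(t);               /* request cancellation */
    pthread_join(t, NULL);           /* returns once the target terminates */
    printf("target thread canceled\n");
    return 0;
}
```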