Process creation is heavy-weight while thread creation is light-weight
Can simplify code, increase efficiency
Kernels are generally multithreaded
3.4 Single and Multithreaded Processes
(Slide title present; content not detailed in transcript beyond the heading)
3.6 Benefits
Responsiveness
May allow continued execution if part of a process is blocked, especially important for user interfaces
Resource Sharing
Threads share resources of the process; easier than shared memory or message passing
Economy
Cheaper than process creation; thread switching has lower overhead than context switching
Scalability
A thread-enabled process can take advantage of multicore architectures
3.7 Multicore Programming
Multicore or multiprocessor systems put pressure on programmers; key challenges include:
Dividing activities
Balance
Data splitting
Data dependency
Testing and debugging
Parallelism implies a system can perform more than one task simultaneously
Concurrency supports more than one task making progress
On a single processor/core, a scheduler provides concurrency
3.8 Concurrency vs. Parallelism
Concurrent execution on a single-core system
Parallelism on a multi-core system
3.9 Multicore Programming — Types of parallelism
Data parallelism
Distributes subsets of the same data across multiple cores; same operation on each
Task parallelism
Distributes threads across cores; each thread performing a unique operation
3.10 Data and Task Parallelism
Distinction between data parallelism and task parallelism
3.11 Amdahl’s Law
Identifies performance gains from adding additional cores to an application that has both serial and parallel components
S is the serial portion
N processing cores
Example: If an application is 75% parallel / 25% serial, moving from 1 to 2 cores results in speedup of 1.6 times
Speedup formula: Speedup(N)=S+N1−S1
As N approaches infinity, speedup approaches S1
Serial portion of an application has disproportionate effect on performance gained by adding additional cores
Question: Does the law take into account contemporary multicore systems?
Example numeric illustration (from figure): with different serial fractions, speedup vs. number of cores shows limiting behavior as cores increase
For an illustration, if S = 0.25 (i.e., 25% serial), then with large N the speedup is limited by 1/S = 4
3.12 (Amdahl’s Law illustration)
Graphical illustration of speedup vs. number of cores for various serial fractions (e.g., S = 0.05, 0.10, 0.50)
Ideal speedup line contrasts with actual speedups constrained by serial portion
Note: The original transcript shows a figure with values; the textual takeaway is the same as the formula above
3.13 User Threads and Kernel Threads
User threads
Management done by user-level threads library
Three primary thread libraries:
POSIX Pthreads
Windows threads
Java threads
Kernel threads
Supported by the Kernel
Examples of operating systems supporting kernel threads: Windows, Linux, etc.
Examples listed: Windows, Linux, Mac OS X, iOS, Android
3.14 User and Kernel Threads
(Slide title present; content summarized under 3.13 above)
3.15 Multithreading Models
Many-to-One
One-to-One
Many-to-Many
3.16 Many-to-One
Many user-level threads mapped to a single kernel thread
One thread blocking causes all to block
Multiple threads may not run in parallel on multicore systems because only one may be in kernel at a time
Few systems currently use this model
Examples:
Solaris Green Threads
GNU Portable Threads
3.17 One-to-One
Each user-level thread maps to a kernel thread
Creating a user-level thread creates a kernel thread
More concurrency than many-to-one
Number of threads per process sometimes restricted due to overhead
Examples: Windows, Linux
3.18 Many-to-Many Model
Allows many user level threads to be mapped to many kernel threads
Allows the operating system to create a sufficient number of kernel threads
Example: Windows with the ThreadFiber package
Otherwise not very common
3.19 Two-level Model
Similar to M:M, except that it allows a user thread to be bound to a kernel thread
3.20 Implicit Threading
Growing in popularity as numbers of threads increase
Program correctness becomes more difficult with explicit threads
Creation and management of threads done by compilers and run-time libraries rather than programmers
Five methods explored:
Thread Pools
Fork-Join
OpenMP
Grand Central Dispatch
Intel Threading Building Blocks
3.21 Thread Pools
Create a number of threads in a pool where they await work
Advantages:
Usually slightly faster to service a request with an existing thread than create a new thread
Allows the number of threads in the application(s) to be bound to the size of the pool
Separating task to be performed from mechanics of creating task allows different strategies for running tasks (e.g., tasks could be scheduled to run periodically)
Windows API supports thread pools
3.22 Fork-Join Parallelism
Multiple threads (tasks) are forked, and then joined
3.23 Threading Issues
Semantics of fork() and exec() system calls
Signal handling
Synchronous and asynchronous
Thread cancellation of target thread
Asynchronous or deferred
Thread-local storage
Scheduler Activations
3.24 Process Creation (Cont.)
UNIX examples:
fork() system call creates new process
exec() system call used after a fork() to replace the process’ memory space with a new program
Parent process calls wait() waiting for the child to terminate
3.25 Semantics of fork() and exec()
Does fork() duplicate only the calling thread or all threads?
Some UNIXes have two versions of fork
exec() usually works as normal – replaces the running process including all threads
3.26 Signal Handling
Signals are used in UNIX systems to notify a process that a particular event has occurred
A signal handler is used to process signals
Signal is generated by particular event
Signal is delivered to a process
Signal is handled by one of two signal handlers: default or user-defined
Every signal has default handler that kernel runs when handling signal
User-defined signal handler can override default
For single-threaded, signal delivered to process
3.27 Signal Handling (Cont.)
Where should a signal be delivered for multi-threaded?
Deliver the signal to the thread to which the signal applies
Deliver the signal to every thread in the process
Deliver the signal to certain threads in the process
Assign a specific thread to receive all signals for the process
3.28 Thread Cancellation
Terminating a thread before it has finished
Target thread
Two general approaches:
Asynchronous cancellation terminates the target thread immediately
Deferred cancellation allows the target thread to periodically check if it should be cancelled
3.29 Thread-Local Storage
Thread-local storage (TLS) allows each thread to have its own copy of data
Useful when you do not have control over the thread creation process (i.e., when using a thread pool)
Different from local variables
Local variables visible only during single function invocation