Untitled Notes

3.3 Motivation

Most modern applications are multithreaded
Threads run within application
Multiple tasks with the application can be implemented by separate threads
- Examples: Update display, Fetch data, Spell checking, Answer a network request
Process creation is heavy-weight while thread creation is light-weight
Can simplify code, increase efficiency
Kernels are generally multithreaded

3.4 Single and Multithreaded Processes

(Slide title present; content not detailed in transcript beyond the heading)

3.6 Benefits

Responsiveness
- May allow continued execution if part of a process is blocked, especially important for user interfaces
Resource Sharing
- Threads share resources of the process; easier than shared memory or message passing
Economy
- Cheaper than process creation; thread switching has lower overhead than context switching
Scalability
- A thread-enabled process can take advantage of multicore architectures

3.7 Multicore Programming

Multicore or multiprocessor systems put pressure on programmers; key challenges include:
- Dividing activities
- Balance
- Data splitting
- Data dependency
- Testing and debugging
Parallelism implies a system can perform more than one task simultaneously
Concurrency supports more than one task making progress
On a single processor/core, a scheduler provides concurrency

3.8 Concurrency vs. Parallelism

Concurrent execution on a single-core system
Parallelism on a multi-core system

3.9 Multicore Programming — Types of parallelism

Data parallelism
- Distributes subsets of the same data across multiple cores; same operation on each
Task parallelism
- Distributes threads across cores; each thread performing a unique operation

3.10 Data and Task Parallelism

Distinction between data parallelism and task parallelism

3.11 Amdahl’s Law

Identifies performance gains from adding additional cores to an application that has both serial and parallel components
S is the serial portion
N processing cores
Example: If an application is 75% parallel / 25% serial, moving from 1 to 2 cores results in speedup of 1.6 times
Speedup formula:
$\text{Speedup}(N) = \frac{1}{S + \frac{1 - S}{N}}$
As N approaches infinity, speedup approaches $\frac{1}{S}$
Serial portion of an application has disproportionate effect on performance gained by adding additional cores
Question: Does the law take into account contemporary multicore systems?
Example numeric illustration (from figure): with different serial fractions, speedup vs. number of cores shows limiting behavior as cores increase
For an illustration, if S = 0.25 (i.e., 25% serial), then with large N the speedup is limited by 1/S = 4

3.12 (Amdahl’s Law illustration)

Graphical illustration of speedup vs. number of cores for various serial fractions (e.g., S = 0.05, 0.10, 0.50)
Ideal speedup line contrasts with actual speedups constrained by serial portion
Note: The original transcript shows a figure with values; the textual takeaway is the same as the formula above

3.13 User Threads and Kernel Threads

User threads
- Management done by user-level threads library
Three primary thread libraries:
- POSIX Pthreads
- Windows threads
- Java threads
Kernel threads
- Supported by the Kernel
Examples of operating systems supporting kernel threads: Windows, Linux, etc.
Examples listed: Windows, Linux, Mac OS X, iOS, Android

3.14 User and Kernel Threads

(Slide title present; content summarized under 3.13 above)

3.15 Multithreading Models

Many-to-One
One-to-One
Many-to-Many

3.16 Many-to-One

Many user-level threads mapped to a single kernel thread
One thread blocking causes all to block
Multiple threads may not run in parallel on multicore systems because only one may be in kernel at a time
Few systems currently use this model
Examples:
- Solaris Green Threads
- GNU Portable Threads

3.17 One-to-One

Each user-level thread maps to a kernel thread
Creating a user-level thread creates a kernel thread
More concurrency than many-to-one
Number of threads per process sometimes restricted due to overhead
Examples: Windows, Linux

3.18 Many-to-Many Model

Allows many user level threads to be mapped to many kernel threads
Allows the operating system to create a sufficient number of kernel threads
Example: Windows with the ThreadFiber package
Otherwise not very common

3.19 Two-level Model

Similar to M:M, except that it allows a user thread to be bound to a kernel thread

3.20 Implicit Threading

Growing in popularity as numbers of threads increase
Program correctness becomes more difficult with explicit threads
Creation and management of threads done by compilers and run-time libraries rather than programmers
Five methods explored:
- Thread Pools
- Fork-Join
- OpenMP
- Grand Central Dispatch
- Intel Threading Building Blocks

3.21 Thread Pools

Create a number of threads in a pool where they await work
Advantages:
- Usually slightly faster to service a request with an existing thread than create a new thread
- Allows the number of threads in the application(s) to be bound to the size of the pool
- Separating task to be performed from mechanics of creating task allows different strategies for running tasks (e.g., tasks could be scheduled to run periodically)
Windows API supports thread pools

3.22 Fork-Join Parallelism

Multiple threads (tasks) are forked, and then joined

3.23 Threading Issues

Semantics of fork() and exec() system calls
Signal handling
- Synchronous and asynchronous
Thread cancellation of target thread
- Asynchronous or deferred
Thread-local storage
Scheduler Activations

3.24 Process Creation (Cont.)

UNIX examples:
- fork() system call creates new process
- exec() system call used after a fork() to replace the process’ memory space with a new program
- Parent process calls wait() waiting for the child to terminate

3.25 Semantics of fork() and exec()

Does fork() duplicate only the calling thread or all threads?
Some UNIXes have two versions of fork
exec() usually works as normal – replaces the running process including all threads

3.26 Signal Handling

Signals are used in UNIX systems to notify a process that a particular event has occurred
A signal handler is used to process signals
1. Signal is generated by particular event
2. Signal is delivered to a process
3. Signal is handled by one of two signal handlers: default or user-defined
Every signal has default handler that kernel runs when handling signal
User-defined signal handler can override default
For single-threaded, signal delivered to process

3.27 Signal Handling (Cont.)

Where should a signal be delivered for multi-threaded?
- Deliver the signal to the thread to which the signal applies
- Deliver the signal to every thread in the process
- Deliver the signal to certain threads in the process
- Assign a specific thread to receive all signals for the process

3.28 Thread Cancellation

Terminating a thread before it has finished
Target thread
Two general approaches:
- Asynchronous cancellation terminates the target thread immediately
- Deferred cancellation allows the target thread to periodically check if it should be cancelled

3.29 Thread-Local Storage

Thread-local storage (TLS) allows each thread to have its own copy of data
Useful when you do not have control over the thread creation process (i.e., when using a thread pool)
Different from local variables
- Local variables visible only during single function invocation
- TLS visible across function invocations
Similar to static data
TLS is unique to each thread

3.30 Operating System Examples

Windows Threads
Linux Threads

3.3 Motivation

3.4 Single and Multithreaded Processes

3.6 Benefits

3.7 Multicore Programming

3.8 Concurrency vs. Parallelism

3.9 Multicore Programming — Types of parallelism

3.10 Data and Task Parallelism

3.11 Amdahl’s Law

3.12 (Amdahl’s Law illustration)

3.13 User Threads and Kernel Threads

3.14 User and Kernel Threads

3.15 Multithreading Models

3.16 Many-to-One

3.17 One-to-One

3.18 Many-to-Many Model

3.19 Two-level Model

3.20 Implicit Threading

3.21 Thread Pools

3.22 Fork-Join Parallelism

3.23 Threading Issues

3.24 Process Creation (Cont.)

3.25 Semantics of fork() and exec()

3.26 Signal Handling

3.27 Signal Handling (Cont.)

3.28 Thread Cancellation

3.29 Thread-Local Storage

3.30 Operating System Examples

End