3.3 Motivation

  • Most modern applications are multithreaded
  • Threads run within application
  • Multiple tasks with the application can be implemented by separate threads
    • Examples: Update display, Fetch data, Spell checking, Answer a network request
  • Process creation is heavy-weight while thread creation is light-weight
  • Can simplify code, increase efficiency
  • Kernels are generally multithreaded

3.4 Single and Multithreaded Processes

  • (Slide title present; content not detailed in transcript beyond the heading)

3.6 Benefits

  • Responsiveness
    • May allow continued execution if part of a process is blocked, especially important for user interfaces
  • Resource Sharing
    • Threads share resources of the process; easier than shared memory or message passing
  • Economy
    • Cheaper than process creation; thread switching has lower overhead than context switching
  • Scalability
    • A thread-enabled process can take advantage of multicore architectures

3.7 Multicore Programming

  • Multicore or multiprocessor systems put pressure on programmers; key challenges include:
    • Dividing activities
    • Balance
    • Data splitting
    • Data dependency
    • Testing and debugging
  • Parallelism implies a system can perform more than one task simultaneously
  • Concurrency supports more than one task making progress
  • On a single processor/core, a scheduler provides concurrency

3.8 Concurrency vs. Parallelism

  • Concurrent execution on a single-core system
  • Parallelism on a multi-core system

3.9 Multicore Programming — Types of parallelism

  • Data parallelism
    • Distributes subsets of the same data across multiple cores; same operation on each
  • Task parallelism
    • Distributes threads across cores; each thread performing a unique operation

3.10 Data and Task Parallelism

  • Distinction between data parallelism and task parallelism

3.11 Amdahl’s Law

  • Identifies performance gains from adding additional cores to an application that has both serial and parallel components
  • S is the serial portion
  • N processing cores
  • Example: If an application is 75% parallel / 25% serial, moving from 1 to 2 cores results in speedup of 1.6 times
  • Speedup formula:
    Speedup(N)=1S+1SN\text{Speedup}(N) = \frac{1}{S + \frac{1 - S}{N}}
  • As N approaches infinity, speedup approaches 1S\frac{1}{S}
  • Serial portion of an application has disproportionate effect on performance gained by adding additional cores
  • Question: Does the law take into account contemporary multicore systems?
  • Example numeric illustration (from figure): with different serial fractions, speedup vs. number of cores shows limiting behavior as cores increase
  • For an illustration, if S = 0.25 (i.e., 25% serial), then with large N the speedup is limited by 1/S = 4

3.12 (Amdahl’s Law illustration)

  • Graphical illustration of speedup vs. number of cores for various serial fractions (e.g., S = 0.05, 0.10, 0.50)
  • Ideal speedup line contrasts with actual speedups constrained by serial portion
  • Note: The original transcript shows a figure with values; the textual takeaway is the same as the formula above

3.13 User Threads and Kernel Threads

  • User threads
    • Management done by user-level threads library
  • Three primary thread libraries:
    • POSIX Pthreads
    • Windows threads
    • Java threads
  • Kernel threads
    • Supported by the Kernel
  • Examples of operating systems supporting kernel threads: Windows, Linux, etc.
  • Examples listed: Windows, Linux, Mac OS X, iOS, Android

3.14 User and Kernel Threads

  • (Slide title present; content summarized under 3.13 above)

3.15 Multithreading Models

  • Many-to-One
  • One-to-One
  • Many-to-Many

3.16 Many-to-One

  • Many user-level threads mapped to a single kernel thread
  • One thread blocking causes all to block
  • Multiple threads may not run in parallel on multicore systems because only one may be in kernel at a time
  • Few systems currently use this model
  • Examples:
    • Solaris Green Threads
    • GNU Portable Threads

3.17 One-to-One

  • Each user-level thread maps to a kernel thread
  • Creating a user-level thread creates a kernel thread
  • More concurrency than many-to-one
  • Number of threads per process sometimes restricted due to overhead
  • Examples: Windows, Linux

3.18 Many-to-Many Model

  • Allows many user level threads to be mapped to many kernel threads
  • Allows the operating system to create a sufficient number of kernel threads
  • Example: Windows with the ThreadFiber package
  • Otherwise not very common

3.19 Two-level Model

  • Similar to M:M, except that it allows a user thread to be bound to a kernel thread

3.20 Implicit Threading

  • Growing in popularity as numbers of threads increase
  • Program correctness becomes more difficult with explicit threads
  • Creation and management of threads done by compilers and run-time libraries rather than programmers
  • Five methods explored:
    • Thread Pools
    • Fork-Join
    • OpenMP
    • Grand Central Dispatch
    • Intel Threading Building Blocks

3.21 Thread Pools

  • Create a number of threads in a pool where they await work
  • Advantages:
    • Usually slightly faster to service a request with an existing thread than create a new thread
    • Allows the number of threads in the application(s) to be bound to the size of the pool
    • Separating task to be performed from mechanics of creating task allows different strategies for running tasks (e.g., tasks could be scheduled to run periodically)
  • Windows API supports thread pools

3.22 Fork-Join Parallelism

  • Multiple threads (tasks) are forked, and then joined

3.23 Threading Issues

  • Semantics of fork() and exec() system calls
  • Signal handling
    • Synchronous and asynchronous
  • Thread cancellation of target thread
    • Asynchronous or deferred
  • Thread-local storage
  • Scheduler Activations

3.24 Process Creation (Cont.)

  • UNIX examples:
    • fork() system call creates new process
    • exec() system call used after a fork() to replace the process’ memory space with a new program
    • Parent process calls wait() waiting for the child to terminate

3.25 Semantics of fork() and exec()

  • Does fork() duplicate only the calling thread or all threads?
  • Some UNIXes have two versions of fork
  • exec() usually works as normal – replaces the running process including all threads

3.26 Signal Handling

  • Signals are used in UNIX systems to notify a process that a particular event has occurred
  • A signal handler is used to process signals
    1. Signal is generated by particular event
    2. Signal is delivered to a process
    3. Signal is handled by one of two signal handlers: default or user-defined
  • Every signal has default handler that kernel runs when handling signal
  • User-defined signal handler can override default
  • For single-threaded, signal delivered to process

3.27 Signal Handling (Cont.)

  • Where should a signal be delivered for multi-threaded?
    • Deliver the signal to the thread to which the signal applies
    • Deliver the signal to every thread in the process
    • Deliver the signal to certain threads in the process
    • Assign a specific thread to receive all signals for the process

3.28 Thread Cancellation

  • Terminating a thread before it has finished
  • Target thread
  • Two general approaches:
    • Asynchronous cancellation terminates the target thread immediately
    • Deferred cancellation allows the target thread to periodically check if it should be cancelled

3.29 Thread-Local Storage

  • Thread-local storage (TLS) allows each thread to have its own copy of data
  • Useful when you do not have control over the thread creation process (i.e., when using a thread pool)
  • Different from local variables
    • Local variables visible only during single function invocation
    • TLS visible across function invocations
  • Similar to static data
  • TLS is unique to each thread

3.30 Operating System Examples

  • Windows Threads
  • Linux Threads

End