Chapter 4: Threads and Concurrency
1. Introduction
1.1 Need for Concurrency
A machine may possess multiple processors.
A sequential program can only utilize one processor at any given time.
A program often requires the execution of multiple independent but related tasks.
Example: A word processor must handle formatting, conducting spell checks, saving to disk, etc.
A sequential program for this scenario would be excessively convoluted, making it hard to comprehend and modify.
1.2 Achieving Concurrency
Concurrency can be achieved through the following methods:
Utilizing multiple processes
Different processes can execute on separate processors and/or perform distinct tasks.
Processes must share resources, such as data.
Example: Processes in a word processor need to share buffer space.
Inter-process Communication (IPC) is essential.
Among IPC mechanisms, shared memory is the most efficient.
Once set up, no system calls are required for access.
Eliminates the need for data copying between buffers.
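As a sketch of shared-memory IPC, Python's `multiprocessing.shared_memory` module (available since Python 3.8) lets two processes map the same segment; after the initial setup, reads and writes need no further system calls and no copying between buffers:

```python
from multiprocessing import Process, shared_memory

def writer(name: str) -> None:
    # Attach to the existing segment by name and write directly into it:
    # no system call per access, no copy through an intermediate buffer.
    shm = shared_memory.SharedMemory(name=name)
    shm.buf[:5] = b"hello"
    shm.close()

def demo() -> bytes:
    # One process creates the segment; a second process maps and writes it.
    shm = shared_memory.SharedMemory(create=True, size=16)
    try:
        p = Process(target=writer, args=(shm.name,))
        p.start()
        p.join()
        return bytes(shm.buf[:5])   # the creator sees the child's write
    finally:
        shm.close()
        shm.unlink()
```

The segment name stands in for whatever naming scheme (e.g., POSIX `shm_open` paths) the real application would use.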
1.3 Concurrency using Processes
While process-based concurrency solves several issues, notable disadvantages include:
Necessity for explicit configuration to share memory between processes.
The creation and destruction of a process incurs significant costs.
The operating system must allocate many resources to enable execution of new processes.
The act of switching between processes (context switching) is resource-intensive.
A context switch requires saving the state of one process and loading the state of another.
This is particularly critical in uniprocessor systems, where all processes vie for the same CPU.
Such overhead leads to suboptimal resource utilization, resulting in increased execution times and longer response times for applications.
1.4 Can We Do Better?
Approaches to improve concurrency models involve:
Allowing related processes to share resources.
Sharing an address space among related execution flows while keeping unrelated processes separate.
Facilitating multiple execution flows within a single address space.
Each of these execution flows is termed a thread.
Initially, threads were referred to as tasks.
Resources Consumed by Threads:
Thread-specific resources:
Program counter
Instruction register
Condition code register
Stack pointer
Stack
General-purpose CPU registers
Process-specific resources:
Address space
Code
Data
Heap
Memory
Cache
2. Threads
2.1 Evolution of Threads: Inception
Early programming languages, including C/C++, and early kernels provided no support for threads.
Multi-threaded programming was achievable only through third-party libraries, e.g., pthreads, the threading interface defined by POSIX (Portable Operating System Interface).
At this stage, kernels had no awareness of multiple threads within a single process.
Threads were handled entirely in user space (the memory area accessible to user processes), leading to the term User Level Threads (ULT) for such implementations.
2.2 Evolution of Threads: Now
With the proliferation of threads came enhanced support for multi-threaded programming.
The demand for such support surged with the advent of multi-core machines.
Kernels have evolved to recognize multiple threads operating within a process.
Built-in support for multi-threading was introduced in C and C++ with the C11 and C++11 standards (2011).
Java has provided support for multi-threading since its inception.
In many operating systems today, threads are created, destroyed, and managed within kernel space, referred to as Kernel Level Threads (KLT).
Each thread is maintained via a Thread Control Block (TCB), akin to the Process Control Block (PCB) for processes.
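To make the abstraction concrete, a minimal sketch using Python's `threading` module: each `Thread` object is a schedulable execution flow with its own stack, while all threads share the process's address space (the runtime keeps per-thread bookkeeping analogous to a TCB).

```python
import threading

results = {}  # shared data: all threads live in one address space

def worker(tid: int, chunk: range) -> None:
    # Each thread has its own stack and registers, but shares `results`.
    results[tid] = sum(chunk)

threads = [threading.Thread(target=worker, args=(i, range(i * 10, (i + 1) * 10)))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()        # wait for each execution flow to finish

total = sum(results.values())   # equals sum(range(40))
```

Each worker writes to a distinct key, so no locking is needed here; overlapping writes to shared data would require synchronization (covered later).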
2.3 User Level Threads: Pros and Cons
Advantages:
Low Overhead:
Threads are cheaper to create, destroy, and manage, since no mode switch (user to kernel) is required.
Flexible Scheduling:
Thread scheduling can be customized to meet the specific needs of applications, enhancing performance.
Disadvantages:
No Parallelism:
Only one thread of a process can operate at any one time, limiting CPU utilization, particularly in multi-CPU environments.
Dependent Progress:
If one thread encounters a blocking operation, the entire process stalls, an inefficiency summarized as "One for All and All for One!".
2.4 Kernel Level Threads: Pros and Cons
Advantages:
Optimal CPU Utilization:
Each thread can potentially run on a separate CPU processor.
Independent Progress:
Blocking operations affecting one thread do not impede the execution of other threads.
This independence is summarized as "To Each Its Own!".
Disadvantages:
High Overhead:
The management of kernel threads incurs substantial costs from creation, destruction, and maintenance.
Mode Switch Requirement:
Every thread operation requires a system call, i.e., two mode switches (user to kernel and back), adding latency to thread management.
2.5 Thread Abstraction Models
Many-to-One:
Many user-level threads are mapped onto a single kernel entity; the process uses only user-level threads.
One-to-One:
Each user-level thread is mapped to its own kernel-level thread; the process uses only kernel-level threads.
Many-to-Many (Hybrid):
A single process employs both user-level and kernel-level threads:
M user-level threads are mapped to N kernel threads, where M ≥ N.
Applications can choose from:
Functionality of kernel-level threads (allowing concurrency and independent progress),
Performance of user-level threads,
A hybrid mix of both.
Scheduling occurs primarily in user space via a mechanism called scheduler activations; it proved difficult for both OS designers and application programmers and is now largely deprecated.
3. Important Issues
3.1 Issues when Implementing Threads
Semantics of fork and exec System Calls:
When one thread invokes fork, should the new process duplicate all threads, or only the invoking thread?
Commonly, in Unix/Linux environments, a process invokes fork, followed by the immediate invocation of exec in the newly created process.
This leads to inefficiency if all threads are copied only to be destroyed immediately after.
Many Unix/Linux implementations present variants of the fork operation to address this.
Signal Handling:
Should a signal be delivered to a specific thread or all threads of a process?
Alternatives include delivering it to:
All threads in the process.
A subset of the threads in the process.
A designated thread that receives all signals for the process.
The thread directly relevant to the signal context (e.g., illegal instruction).
The optimal approach often depends on the signal type. In Unix/Linux variants, signal handlers are set at the process level, while the signal mask is thread-specific.
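This split can be observed directly on a POSIX system; a sketch using Python's `signal` module (note that `signal.pthread_sigmask` is unavailable on Windows). Handlers are process-wide, but each thread carries its own mask, which newly created threads inherit:

```python
import signal
import threading

# The signal mask is per-thread: block SIGUSR1 in the calling (main) thread.
signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})

inherited = {}

def check_mask() -> None:
    # A new thread inherits its creator's mask. Blocking an empty set
    # returns the current mask without changing it.
    inherited["mask"] = signal.pthread_sigmask(signal.SIG_BLOCK, set())

t = threading.Thread(target=check_mask)
t.start()
t.join()
# inherited["mask"] now contains SIGUSR1, inherited from the main thread.
```

By contrast, installing a handler with `signal.signal` affects every thread in the process (and in Python may only be done from the main thread).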
Thread Cancellation:
Threads can be canceled when no longer needed.
Example: If threads are scanning different parts of a state space and a solution is found, the irrelevant threads may be terminated.
Two forms of cancellation exist:
Asynchronous:
Immediate thread cancellation.
Deferred:
The target thread is notified that it should cancel itself, via a flag that the thread checks periodically.
This is often preferred to avoid data corruption, especially if the thread is in the middle of modifying shared data.
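Deferred cancellation can be sketched with a shared flag (here a `threading.Event`) that each worker polls at safe points; the numbers and names are illustrative:

```python
import threading

stop = threading.Event()        # the shared cancellation flag
solutions = []

def scan(start: int) -> None:
    # Scan one part of a pretend state space.
    for candidate in range(start, start + 100_000):
        if stop.is_set():       # deferred cancellation point: safe to exit
            return
        if candidate == 4242:   # pretend this is the solution we wanted
            solutions.append(candidate)
            stop.set()          # ask the other scanners to cancel themselves
            return

threads = [threading.Thread(target=scan, args=(s,)) for s in (0, 100_000)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because each worker only exits at the top of its loop, it is never interrupted mid-update, which is exactly the data-corruption hazard that asynchronous cancellation risks.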
Thread-Local Storage (TLS):
Each thread may require its own distinct copy of select variables.
Advantages:
Reduces programming errors.
Can improve performance by enabling compiler-level optimizations.
TLS data is allocated similarly to static data, but each thread receives its own copy; since all threads share one address space, a copy remains reachable by another thread that holds a pointer to it.
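A sketch of TLS using Python's `threading.local`: each thread that assigns to the object gets its own private copy of the attribute, invisible to other threads.

```python
import threading

tls = threading.local()   # thread-local storage: one copy of .value per thread
tls.value = "main"        # the main thread's private copy
observed = []

def worker() -> None:
    tls.value = "worker"          # creates this thread's own copy
    observed.append(tls.value)    # sees only its own value

t = threading.Thread(target=worker)
t.start()
t.join()
# The main thread's copy is untouched by the worker's assignment:
# tls.value is still "main" here.
```

The same idea appears in C/C++ as the `thread_local` storage class and in pthreads as `pthread_key_create`/`pthread_setspecific`.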
3.2 Optimizations
Thread Pools:
Pre-create a collection of worker threads, selecting free threads to handle specific tasks as needed.
Enhances application performance particularly in instances where new tasks are generated at runtime.
Eliminates overhead from creating and destroying threads during runtime execution.
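A minimal sketch of the thread-pool pattern using Python's `concurrent.futures.ThreadPoolExecutor`, which pre-creates workers once and reuses them for every submitted task:

```python
from concurrent.futures import ThreadPoolExecutor

def handle(task: int) -> int:
    # Stand-in for real work picked up by a free worker from the pool.
    return task * task

# Workers are created once, up to max_workers; submitting a task never
# spawns an extra thread, avoiding per-task creation/destruction overhead.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(handle, range(8)))
```

Eight tasks are served by only four threads; each free worker simply picks up the next pending task.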
Grand Central Dispatch (GCD):
Apple's implementation of thread pools featuring additional optimizations aimed at facilitating more efficient management of concurrent operations.