Lecture 4 discusses thread-based parallelism in Python, including how threads work, how to create them, and thread synchronization techniques [1, 2].
Key aspects covered in the lecture:
Thread-Based Parallelism in Python:
Multithreading splits a program into concurrent, independent tasks that operate simultaneously [1].
While multithreading can speed up programs, its primary goal is concurrency [1].
Challenges include handling synchronization and deadlocks to ensure smooth program execution [2].
A thread is an independent flow of execution that runs concurrently with (and potentially in parallel to) other threads [3].
Thread programming is a shared memory paradigm where threads can share data and resources, with variables accessible to all threads within the same process via a shared space [3]. Each thread has its own program counters, registers, and stack space but shares data and resources with other threads from the same process [3, 4].
States of Threads:
A thread is created and enters a ready state [4].
When started, the OS schedules it, and it begins running [4].
If a thread waits for a condition, it enters a blocked state and returns to the ready state when unblocked [4, 5].
Once execution is complete, the thread is terminated [5].
Creating Threads:
The threading.Thread() object allows concurrent execution of multiple functions [5].
Key parameters include:
target: The callable function to be executed [5, 6].
args: A tuple of arguments passed to the target function [5, 6].
kwargs: A dictionary of keyword arguments for the target function [5, 6].
.start(): Schedules the thread for execution by the operating system [7].
.join(): Makes the program wait for the thread to complete its task before moving to the next line of code [7].
Threads can be named, although using Python's default naming is often better, especially with multiple processes [6, 8]. You can use threading.current_thread().name to retrieve the name of the currently executing thread [8, 9].
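The parameters above can be sketched in a minimal example. The download function and its URLs are hypothetical, used only to illustrate target, args, kwargs, naming, and current_thread():

```python
import threading

results = {}

def download(url, timeout=5):
    # Hypothetical worker: record which thread handled which URL.
    results[url] = (threading.current_thread().name, timeout)

# target is the callable, args a tuple, kwargs a dict of keyword arguments.
t1 = threading.Thread(target=download, args=("page-1",), kwargs={"timeout": 10})
t2 = threading.Thread(target=download, args=("page-2",), name="downloader-2")

t1.start()   # schedule each thread for execution
t2.start()
t1.join()    # wait for each thread to finish before moving on
t2.join()

print(results["page-2"][0])  # "downloader-2"
```

Note that join() is what guarantees results is fully populated before the main thread reads it.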
Threads can also be created using object-oriented programming by creating a child class that inherits from the Thread class, overriding the constructor (__init__) and the execution method (run) [9].
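A minimal sketch of the object-oriented approach; the SummingThread class and its workload are illustrative assumptions:

```python
import threading

class SummingThread(threading.Thread):
    def __init__(self, numbers):
        super().__init__()      # always call the parent constructor first
        self.numbers = numbers
        self.total = 0

    def run(self):              # invoked automatically when .start() is called
        self.total = sum(self.numbers)

t = SummingThread([1, 2, 3, 4])
t.start()
t.join()
print(t.total)  # 10
```

Calling start() (not run() directly) is what makes the run method execute in a new thread.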
Thread Synchronization:
Thread synchronization ensures threads work together smoothly by preventing them from interfering with each other when accessing shared data or resources [10].
Mechanisms include Locks, RLocks, Semaphores, Conditions, and Events [11].
Locks:
A lock is an object a thread must acquire before accessing a protected section, ensuring only one thread accesses the code at a time [12].
Threads use acquire() to request the lock and release() to relinquish it [12, 13].
Locks are useful in scenarios like e-commerce checkout systems, reservation systems, and operating systems to prevent conflicts [14, 15].
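A small sketch of the acquire()/release() pattern, using a shared ticket counter as a stand-in for the checkout/reservation scenarios above (the counter workload is an assumption for illustration):

```python
import threading

counter = 0
lock = threading.Lock()

def buy_tickets(n):
    global counter
    for _ in range(n):
        lock.acquire()        # only one thread may hold the lock at a time
        try:
            counter += 1      # protected critical section
        finally:
            lock.release()    # always release, even if the section raises

threads = [threading.Thread(target=buy_tickets, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000 — without the lock, concurrent updates could be lost
```

In practice `with lock:` is the idiomatic shorthand for the acquire/try/finally/release sequence shown here.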
RLocks (Re-entrant Locks):
An RLock allows the same thread to acquire the lock multiple times without deadlocking against itself [16].
It tracks the number of acquisitions and requires an equal number of releases to be fully released [16].
RLocks are useful in recursive functions or complex operations like undo/redo functionality in collaborative editing [17-19].
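A minimal sketch of re-entrancy, assuming a hypothetical pair of functions where one calls the other while holding the lock:

```python
import threading

rlock = threading.RLock()
log = []

def outer():
    with rlock:         # first acquisition by this thread
        log.append("outer")
        inner()         # re-acquires the same lock; a plain Lock would deadlock here

def inner():
    with rlock:         # second acquisition by the same thread
        log.append("inner")

t = threading.Thread(target=outer)
t.start()
t.join()
print(log)  # ['outer', 'inner']
```

The RLock is only fully released once the number of release operations matches the number of acquisitions, as described above.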
Semaphores:
A semaphore controls access to a common resource by multiple threads, limiting the number of concurrent threads in a section [20].
Threads must acquire a permit to enter a critical section and release it upon exit [20, 21].
Semaphores are useful for managing network connection pools and database access control [22, 23].
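A sketch of a semaphore capping concurrency, loosely modeled on the connection-pool scenario; the counters tracking peak concurrency are illustrative scaffolding, not part of the semaphore API:

```python
import threading
import time

pool = threading.Semaphore(2)       # at most 2 threads inside at once
active = 0
peak = 0
state_lock = threading.Lock()

def use_connection():
    global active, peak
    with pool:                      # acquire a permit (blocks if none are free)
        with state_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)            # simulate work with the shared resource
        with state_lock:
            active -= 1
    # the permit is released automatically when the with-block exits

threads = [threading.Thread(target=use_connection) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(peak)  # never exceeds 2, the semaphore's initial value
```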
Conditions:
A condition variable allows threads to wait for certain conditions to be met before proceeding [24].
Threads wait for a condition to become true and are notified when the state changes [24].
Conditions are ideal for producer-consumer scenarios and inter-thread communication [25, 26].
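The producer-consumer pattern mentioned above can be sketched with a Condition; the item list and counts are illustrative assumptions:

```python
import threading

condition = threading.Condition()
items = []
consumed = []

def producer():
    for i in range(3):
        with condition:
            items.append(i)
            condition.notify()     # wake one waiting consumer

def consumer():
    for _ in range(3):
        with condition:
            while not items:       # re-check the condition after waking
                condition.wait()   # releases the lock while waiting
            consumed.append(items.pop(0))

c = threading.Thread(target=consumer)
p = threading.Thread(target=producer)
c.start()
p.start()
c.join()
p.join()
print(consumed)  # [0, 1, 2]
```

The while-loop around wait() is the standard idiom: the consumer re-tests the condition each time it wakes rather than assuming it now holds.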
Events:
An event object communicates the occurrence of an event between threads, where threads wait until an event is set by another thread [26].
Threads can set (event.set()) or clear (event.clear()) an internal flag and test it with is_set(), while other threads wait (event.wait()) until the event is set [27].
Events are useful for signaling state changes or triggering actions in other threads, such as asynchronous task initialization or pause/resume operations [27-29].
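A minimal sketch of the signaling pattern, assuming a hypothetical worker that must not proceed until the main thread gives the go-ahead:

```python
import threading

ready = threading.Event()
messages = []

def worker():
    ready.wait()                   # block until another thread sets the event
    messages.append(f"event was set: {ready.is_set()}")

t = threading.Thread(target=worker)
t.start()
assert not ready.is_set()          # the internal flag starts cleared
ready.set()                        # signal the waiting thread to proceed
t.join()
print(messages)  # ['event was set: True']
ready.clear()                      # reset the flag so the event can be reused
print(ready.is_set())  # False
```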
Barriers:
A barrier is a synchronization mechanism that makes multiple threads wait until a predefined number of threads have reached the barrier point before proceeding [29].
Once all threads have called barrier.wait(), they are simultaneously released to continue execution [30].
Barriers are useful in parallel data processing, scientific computing, and multi-stage pipeline processing [31, 32].
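A sketch of a two-stage pipeline using a barrier; the "load"/"process" stage names are hypothetical, chosen to mirror the multi-stage processing use case above:

```python
import threading

barrier = threading.Barrier(3)     # release only once 3 threads have arrived
events = []
lock = threading.Lock()

def stage(worker_id):
    with lock:
        events.append(("load", worker_id))
    barrier.wait()                 # wait until every worker finishes loading
    with lock:
        events.append(("process", worker_id))

threads = [threading.Thread(target=stage, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every "load" event precedes every "process" event: the barrier
# guarantees no worker starts stage two before all finish stage one.
loads = [i for i, (phase, _) in enumerate(events) if phase == "load"]
procs = [i for i, (phase, _) in enumerate(events) if phase == "process"]
print(max(loads) < min(procs))  # True
```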
Queues:
Queues are data structures that follow the First-In-First-Out (FIFO) principle, ensuring thread safety and maintaining order [32, 33].
The queue.Queue class in Python is thread-safe and automatically handles locking [34].
Essential methods include put(item) to add an item, get() to remove an item, task_done() to indicate task completion, and join() to block until all items are processed [34, 35].
Queues are ideal for managing information flow between producer and consumer threads, as well as in web server request handling, task scheduling systems, and logging systems [36-38].
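The producer-consumer flow with queue.Queue can be sketched as follows; the doubling workload and the None sentinel used to stop the worker are illustrative conventions, not part of the Queue API:

```python
import queue
import threading

tasks = queue.Queue()       # thread-safe FIFO; locking is handled internally
processed = []

def worker():
    while True:
        item = tasks.get()          # blocks until an item is available
        if item is None:            # sentinel value: shut the worker down
            tasks.task_done()
            break
        processed.append(item * 2)  # hypothetical processing step
        tasks.task_done()           # mark this item as handled

t = threading.Thread(target=worker)
t.start()
for n in [1, 2, 3]:
    tasks.put(n)            # producer side: enqueue work items
tasks.join()                # block until every queued item is processed
tasks.put(None)             # signal the worker to exit
t.join()
print(processed)  # [2, 4, 6]
```

Each put() must eventually be matched by a task_done() call for tasks.join() to unblock.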
Choosing a Synchronization Mechanism: The lecture concludes by noting the importance of choosing the correct mechanism for each application [39].