Lecture 5: Process-Based Parallelism

Lecture 5 introduces process-based parallelism in Python, contrasting it with thread-based parallelism and detailing inter-process communication (IPC) and synchronization methods [1].

Key aspects covered in the lecture:

  • Process-Based Parallelism:

  • Processes are self-contained execution environments with their own memory space, offering greater isolation and independence compared to threads [1, 2].

  • Because processes do not share memory by default, this avoids the shared-state corruption and race conditions that threads are prone to [2].

  • Process-based parallelism is suited for distributed systems where tasks run on different machines or cores without shared memory [2].

  • Process programming follows a distributed memory paradigm, meaning variables accessible to one process are not accessible to others [3].
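
A minimal sketch of the distributed-memory point above (the `increment` function and `counter` variable are illustrative, not from the lecture): a variable modified inside a child process is not visible to the parent.

```python
from multiprocessing import Process

counter = 0  # lives in the parent's address space

def increment():
    global counter
    counter += 1  # modifies the child's private copy only
    print("child sees counter =", counter)   # -> 1

if __name__ == "__main__":
    p = Process(target=increment)
    p.start()
    p.join()
    print("parent still sees counter =", counter)  # -> 0
```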

  • Handling Processes:

  • The multiprocessing.Process object allows running multiple functions concurrently [4]; a short example follows this list.

  • Key arguments when instantiating a process [5]:

  • group: The process's group.

  • target: The callable function to be executed.

  • name: The process's name (defaults to "Process-N", where N is a sequence number).

  • args: Tuple of arguments for the target function.

  • kwargs: Dictionary of keyword arguments for the target function.

  • daemon: If set to True, the process is a daemon [5, 6]. Daemon processes are background processes that run independently of the main program flow [6]; they are abruptly stopped when the program exits, so they are suitable only for non-critical tasks [6]. Non-daemon processes must be explicitly joined using .join(), and the main program waits for them to complete before exiting [7].

  • .start(): Schedules the process for execution [8].

  • .join(): Makes the program wait for the process to finish its task [8].
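
A minimal sketch of the arguments above together with .start() and .join() (the worker function `greet` and its parameters are illustrative):

```python
from multiprocessing import Process

def greet(name, punctuation="!"):
    print(f"Hello, {name}{punctuation}")

if __name__ == "__main__":
    # target, name, args and kwargs as described above; daemon is left at False
    p = Process(target=greet, name="Greeter-1",
                args=("world",), kwargs={"punctuation": "?"})
    p.start()   # schedule the process for execution
    p.join()    # wait for the non-daemon process to finish
    print(p.name, "finished with exit code", p.exitcode)
```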

  • Terminating a Process:

  • Processes run autonomously, so stopping them requires external control from the parent program [9].

  • .is_alive(): Returns True if the process is running, False otherwise [9].

  • .terminate(): Immediately forces a process to terminate but does not perform cleanup [9, 10]. Use with caution [9, 10].

  • exitcode attribute: Holds the process's exit status [10]:

  • None: Still running.

  • 0: Exited without error.

  • > 0: An error occurred; the value is the error code.

  • < 0: The process was killed by a signal; the signal number is -1 * exitcode.

  • join() waits for the process to complete and ensures resources are cleaned up, while terminate() stops the process abruptly without waiting for completion, potentially leading to resource leaks [11, 12].
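
A short sketch of .is_alive(), .terminate() and exitcode (the `busy_loop` worker is illustrative; the exact exit code shown assumes a Unix system):

```python
import time
from multiprocessing import Process

def busy_loop():
    while True:          # never returns on its own
        time.sleep(0.1)

if __name__ == "__main__":
    p = Process(target=busy_loop)
    p.start()
    print(p.is_alive(), p.exitcode)   # True None (still running)

    p.terminate()        # abrupt stop, no cleanup performed
    p.join()             # reap the process so exitcode is populated
    print(p.is_alive(), p.exitcode)   # False -15 on Unix (killed by SIGTERM)
```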

  • Defining a Process in a Subclass:

  • Object-oriented programming can be used to create processes by making a child class that inherits from the Process class [12, 13].

  • Override the constructor (__init__) to define instance variables, and override the execution method (run), which is called when .start() is used [13].
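
A minimal sketch of subclassing Process (the `SquareWorker` class is illustrative): the constructor stores per-process state and run() holds the work that .start() triggers.

```python
from multiprocessing import Process

class SquareWorker(Process):
    def __init__(self, number):
        super().__init__()             # always initialise the base Process
        self.number = number           # per-instance state used by run()

    def run(self):                     # invoked when .start() is called
        print(f"{self.name}: {self.number}**2 = {self.number ** 2}")

if __name__ == "__main__":
    workers = [SquareWorker(n) for n in range(3)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```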

  • Exchanging Objects in Processes:

  • Queues:

  • multiprocessing.Queue is a process-safe FIFO data structure for inter-process communication [14-16].

  • put(item) adds an item to the queue, blocking if the queue is full [14, 15].

  • get() removes and returns an item, blocking if the queue is empty [15].

  • multiprocessing.Queue is designed for inter-process communication, while queue.Queue is for inter-thread communication within the same process [16].
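
A minimal producer/consumer sketch with multiprocessing.Queue (the function names and the None sentinel are illustrative choices, not part of the Queue API):

```python
from multiprocessing import Process, Queue

def producer(q):
    for item in range(3):
        q.put(item)            # blocks if the queue is full
    q.put(None)                # sentinel telling the consumer to stop

def consumer(q):
    while True:
        item = q.get()         # blocks if the queue is empty
        if item is None:
            break
        print("consumed", item)

if __name__ == "__main__":
    q = Queue()
    p = Process(target=producer, args=(q,))
    c = Process(target=consumer, args=(q,))
    p.start(); c.start()
    p.join(); c.join()
```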

  • JoinableQueue is a variant that provides task completion semantics, useful for tracking when all enqueued tasks have been processed [17].

  • task_done(): Call this method to indicate a task is completed [17].

  • join(): Blocks the caller (typically the producer) until all enqueued tasks have been marked done [17].
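
A short JoinableQueue sketch combining task_done() and join() (the daemon worker is an illustrative choice so that it is cleaned up when the main program exits):

```python
from multiprocessing import Process, JoinableQueue

def worker(q):
    while True:
        item = q.get()
        print("processed", item)
        q.task_done()          # mark this task as completed

if __name__ == "__main__":
    q = JoinableQueue()
    w = Process(target=worker, args=(q,), daemon=True)
    w.start()

    for item in range(5):
        q.put(item)
    q.join()                   # blocks until every item has been marked done
    print("all tasks processed")
```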

  • Pipes:

  • Pipes provide a two-way communication channel between two processes, consisting of two connection objects [18, 19].

  • Data is sent with send() and received with recv() [18, 19].

  • A pipe can be created as one-way (duplex=False) or two-way (duplex=True, the default) [19].
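
A minimal Pipe sketch using the default duplex connection (the `child` function and the messages are illustrative):

```python
from multiprocessing import Process, Pipe

def child(conn):
    msg = conn.recv()              # receive from the parent
    conn.send(msg.upper())         # reply on the same (duplex) connection
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()   # duplex=True by default
    p = Process(target=child, args=(child_conn,))
    p.start()
    parent_conn.send("hello")
    print(parent_conn.recv())          # -> HELLO
    p.join()
```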

  • Process Synchronization:

  • Similar to thread synchronization, using mechanisms like Locks, RLocks, Conditions, Events, Semaphores, and Barriers [20, 21].

  • RLocks:

  • Similar to thread RLocks but designed for managing locks between processes [21].

  • Useful when functions called by a process need to acquire an already held lock [21].

  • acquire() locks the RLock, incrementing the lock count if already held [21, 22].

  • release() decreases the lock count, releasing the lock when the count reaches zero [22].
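
A minimal RLock sketch showing the same process acquiring the lock twice (the nested `with` blocks stand in for a function calling another function that needs the same lock; names are illustrative):

```python
from multiprocessing import Process, RLock

def update(lock, label):
    with lock:            # first acquire: lock count becomes 1
        with lock:        # re-acquired by the same process: count becomes 2
            print(label, "inside the critical section")
    # both releases happen on exit: count is back to 0, others may acquire

if __name__ == "__main__":
    lock = RLock()
    procs = [Process(target=update, args=(lock, f"worker-{i}")) for i in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```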

  • Condition:

  • A synchronization primitive used to communicate that a particular state or condition in the application has been reached [23].

  • Allows processes to wait for a signal from another process indicating a certain condition has been met [23].

  • A process acquires the Condition object for exclusive access to a shared resource; calling wait() then releases the Condition and blocks until another process signals (via notify()/notify_all()) that the condition has been met [23, 24].
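
A minimal Condition sketch (the waiter/notifier split and the sleep are illustrative; robust code would guard wait() with a checked predicate, e.g. via wait_for()):

```python
import time
from multiprocessing import Process, Condition

def waiter(cond):
    with cond:                 # acquire the condition's underlying lock
        cond.wait()            # release it and block until notified
        print("condition met, resuming work")

def notifier(cond):
    time.sleep(0.5)            # pretend to reach the awaited state
    with cond:
        cond.notify_all()      # wake every waiting process

if __name__ == "__main__":
    cond = Condition()
    w = Process(target=waiter, args=(cond,))
    n = Process(target=notifier, args=(cond,))
    w.start(); n.start()
    w.join(); n.join()
```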

  • Event:

  • An Event manages an internal flag that processes can set or clear [25].

  • event.set() sets the flag to True, and event.clear() resets it to False [25].

  • Processes can wait for the flag to be true using event.wait(), which blocks until the flag is true [25, 26].
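
A minimal Event sketch with set(), wait() and clear() (the worker function and the timing are illustrative):

```python
import time
from multiprocessing import Process, Event

def worker(ready):
    print("worker waiting for the flag...")
    ready.wait()               # blocks until the internal flag becomes True
    print("flag is set, worker proceeds")

if __name__ == "__main__":
    ready = Event()            # internal flag starts as False
    p = Process(target=worker, args=(ready,))
    p.start()
    time.sleep(0.5)
    ready.set()                # flip the flag to True, releasing the waiter
    p.join()
    ready.clear()              # flag back to False for potential reuse
```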

  • Pools:

  • The multiprocessing.Pool class provides a way to parallelize tasks across multiple processes, abstracting away process management details [27, 28].

  • Work can be submitted to a pool synchronously or asynchronously [28].

  • Synchronous calls (e.g., apply) block until the task completes before the next one is submitted, offering predictable execution [28].

  • Asynchronous calls (e.g., apply_async) return immediately and let tasks run concurrently, improving resource utilization [29].

  • Major functions (a short example follows this list) [30]:

  • apply(func, args=(), kwds={}): Runs func with the given arguments in a single worker process and blocks until the result is ready [30].

  • map(func, iterable): Applies a given function to each item in an iterable and returns a list of the results, executing the function calls concurrently across multiple processes [31].

  • starmap(func, iterable): Like map(), but each item of the iterable is a tuple of arguments that is unpacked into the call (func(*item)), again executed concurrently across multiple processes [32].
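
A short Pool sketch covering the functions above plus an asynchronous variant (the worker functions `square` and `add` are illustrative):

```python
from multiprocessing import Pool

def square(x):
    return x * x

def add(a, b):
    return a + b

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        print(pool.apply(square, args=(3,)))        # 9, blocks until done
        print(pool.map(square, [1, 2, 3, 4]))       # [1, 4, 9, 16]
        print(pool.starmap(add, [(1, 2), (3, 4)]))  # [3, 7] (tuples unpacked)

        # asynchronous variant: returns immediately with a result handle
        async_result = pool.apply_async(square, args=(5,))
        print(async_result.get())                   # 25
```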

  • Introduction to multiprocessing.Manager():

  • Manager() creates data that can be shared between different processes, supporting shared lists, dictionaries, etc. [33].

  • The Manager controls a server process, running separately from the main program, that holds the shared objects [33]. Other processes access them through proxy objects that forward operations to that server [33].
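
A minimal Manager sketch with a shared dict and list accessed through proxies (the `record` function is illustrative):

```python
from multiprocessing import Process, Manager

def record(shared_dict, shared_list, i):
    shared_dict[i] = i * i      # the proxy forwards this to the manager server
    shared_list.append(i)

if __name__ == "__main__":
    with Manager() as manager:  # starts the manager's server process
        d = manager.dict()      # proxy objects, not plain dict/list
        lst = manager.list()
        procs = [Process(target=record, args=(d, lst, i)) for i in range(3)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(dict(d), list(lst))   # e.g. {0: 0, 1: 1, 2: 4} [0, 1, 2] (order may vary)
```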

  • Introduction to concurrent.futures:

  • A high-level module for asynchronous task execution, providing a simple interface for managing threads and processes [34, 35].

  • Offers ThreadPoolExecutor and ProcessPoolExecutor classes [35].

  • Uses Futures to represent the result of an asynchronous computation [35].
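
A minimal concurrent.futures sketch with ProcessPoolExecutor and Futures (the `cube` function is illustrative):

```python
from concurrent.futures import ProcessPoolExecutor, as_completed

def cube(x):
    return x ** 3

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as executor:
        # submit() returns a Future representing the pending result
        futures = [executor.submit(cube, n) for n in range(5)]
        for fut in as_completed(futures):
            print(fut.result())             # results arrive as they complete

        # map() is a higher-level shortcut that yields results in input order
        print(list(executor.map(cube, range(5))))
```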