Lecture 5: Process-Based Parallelism

Lecture 5 introduces process-based parallelism in Python, contrasting it with thread-based parallelism and detailing inter-process communication (IPC) and synchronization methods [1].

Key aspects covered in the lecture:

  • Process-Based Parallelism:

  • Processes are self-contained execution environments with their own memory space, offering greater isolation and independence compared to threads [1, 2].

  • Because processes do not share memory by default, this avoids the shared-state corruption and race conditions that threads are prone to [2].

  • Process-based parallelism is suited for distributed systems where tasks run on different machines or cores without shared memory [2].

  • Process programming follows a distributed memory paradigm, meaning variables accessible to one process are not accessible to others [3].
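
A minimal sketch of the distributed-memory point above (the `increment` function and `counter` variable are illustrative, not from the lecture): a variable modified inside a child process is not visible to the parent.

```python
from multiprocessing import Process

counter = 0  # lives in the parent's address space

def increment():
    global counter
    counter += 1  # modifies the child's private copy only
    print("child sees counter =", counter)   # -> 1

if __name__ == "__main__":
    p = Process(target=increment)
    p.start()
    p.join()
    print("parent still sees counter =", counter)  # -> 0
```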

  • Handling Processes:

  • The multiprocessing.Process object allows running multiple functions concurrently [4]; a short example follows this list.

  • Key arguments when instantiating a process [5]:

  • group: The process's group.

  • target: The callable function to be executed.

  • name: The process's name (defaults to "Process-N", where N is a sequence number).

  • args: Tuple of arguments for the target function.

  • kwargs: Dictionary of keyword arguments for the target function.

  • daemon: If set to True, the process is a daemon [5, 6]. Daemon processes are background processes that run independently of the main program flow [6]; they are abruptly stopped when the program exits, so they are suitable only for non-critical tasks [6]. Non-daemon processes must be explicitly joined using .join(), and the main program waits for them to complete before exiting [7].

  • .start(): Schedules the process for execution [8].

  • .join(): Makes the program wait for the process to finish its task [8].
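
A minimal sketch of the arguments above together with .start() and .join() (the worker function `greet` and its parameters are illustrative):

```python
from multiprocessing import Process

def greet(name, punctuation="!"):
    print(f"Hello, {name}{punctuation}")

if __name__ == "__main__":
    # target, name, args and kwargs as described above; daemon is left at False
    p = Process(target=greet, name="Greeter-1",
                args=("world",), kwargs={"punctuation": "?"})
    p.start()   # schedule the process for execution
    p.join()    # wait for the non-daemon process to finish
    print(p.name, "finished with exit code", p.exitcode)
```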

  • Terminating a Process:

  • Processes run autonomously, so stopping them requires external control from the parent program [9].

  • .is_alive(): Returns True if the process is running, False otherwise [9].

  • .terminate(): Immediately forces a process to terminate but does not perform cleanup [9, 10]. Use with caution [9, 10].

  • exitcode attribute: Holds the process's exit status [10]:

  • None: Still running.

  • 0: Exited without error.

  • > 0: An error occurred; the value is the error code.

  • < 0: The process was killed by a signal; the signal number is -1 * exitcode.

  • join() waits for the process to complete and ensures resources are cleaned up, while terminate() stops the process abruptly without waiting for completion, potentially leading to resource leaks [11, 12].
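
A short sketch of .is_alive(), .terminate() and exitcode (the `busy_loop` worker is illustrative; the exact exit code shown assumes a Unix system):

```python
import time
from multiprocessing import Process

def busy_loop():
    while True:          # never returns on its own
        time.sleep(0.1)

if __name__ == "__main__":
    p = Process(target=busy_loop)
    p.start()
    print(p.is_alive(), p.exitcode)   # True None (still running)

    p.terminate()        # abrupt stop, no cleanup performed
    p.join()             # reap the process so exitcode is populated
    print(p.is_alive(), p.exitcode)   # False -15 on Unix (killed by SIGTERM)
```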

  • Defining a Process in a Subclass:

  • Object-oriented programming can be used to create processes by making a child class that inherits from the Process class [12, 13].

  • Override the constructor (__init__) to define instance variables, and override the execution method (run), which is called when .start() is used [13].
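
A minimal sketch of subclassing Process (the `SquareWorker` class is illustrative): the constructor stores per-process state and run() holds the work that .start() triggers.

```python
from multiprocessing import Process

class SquareWorker(Process):
    def __init__(self, number):
        super().__init__()             # always initialise the base Process
        self.number = number           # per-instance state used by run()

    def run(self):                     # invoked when .start() is called
        print(f"{self.name}: {self.number}**2 = {self.number ** 2}")

if __name__ == "__main__":
    workers = [SquareWorker(n) for n in range(3)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```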

  • Exchanging Objects in Processes:

  • Queues:

  • multiprocessing.Queue is a process-safe FIFO data structure for inter-process communication [14-16].

  • put(item) adds an item to the queue, blocking if the queue is full [14, 15].

  • get() removes and returns an item, blocking if the queue is empty [15].

  • multiprocessing.Queue is designed for inter-process communication, while queue.Queue is for inter-thread communication within the same process [16].
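
A minimal producer/consumer sketch with multiprocessing.Queue (the function names and the None sentinel are illustrative choices, not part of the Queue API):

```python
from multiprocessing import Process, Queue

def producer(q):
    for item in range(3):
        q.put(item)            # blocks if the queue is full
    q.put(None)                # sentinel telling the consumer to stop

def consumer(q):
    while True:
        item = q.get()         # blocks if the queue is empty
        if item is None:
            break
        print("consumed", item)

if __name__ == "__main__":
    q = Queue()
    p = Process(target=producer, args=(q,))
    c = Process(target=consumer, args=(q,))
    p.start(); c.start()
    p.join(); c.join()
```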

  • JoinableQueue is a variant that provides task completion semantics, useful for tracking when all enqueued tasks have been processed [17].

  • task_done(): Call this method to indicate a task is completed [17].

  • join(): Blocks the caller (typically the producer) until all enqueued tasks have been marked done [17].
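
A short JoinableQueue sketch combining task_done() and join() (the daemon worker is an illustrative choice so that it is cleaned up when the main program exits):

```python
from multiprocessing import Process, JoinableQueue

def worker(q):
    while True:
        item = q.get()
        print("processed", item)
        q.task_done()          # mark this task as completed

if __name__ == "__main__":
    q = JoinableQueue()
    w = Process(target=worker, args=(q,), daemon=True)
    w.start()

    for item in range(5):
        q.put(item)
    q.join()                   # blocks until every item has been marked done
    print("all tasks processed")
```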

  • Pipes:

  • Pipes provide a two-way communication channel between two processes, consisting of two connection objects [18, 19].

  • Data is sent with send() and received with recv() [18, 19].

  • A pipe can be created as one-way (duplex=False) or two-way (duplex=True, the default) [19].
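
A minimal Pipe sketch using the default duplex connection (the `child` function and the messages are illustrative):

```python
from multiprocessing import Process, Pipe

def child(conn):
    msg = conn.recv()              # receive from the parent
    conn.send(msg.upper())         # reply on the same (duplex) connection
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()   # duplex=True by default
    p = Process(target=child, args=(child_conn,))
    p.start()
    parent_conn.send("hello")
    print(parent_conn.recv())          # -> HELLO
    p.join()
```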

  • Process Synchronization:

  • Similar to thread synchronization, using mechanisms like Locks, RLocks, Conditions, Events, Semaphores, and Barriers [20, 21].

  • RLocks:

  • Similar to thread RLocks but designed for managing locks between processes [21].

  • Useful when functions called by a process need to acquire an already held lock [21].

  • acquire() locks the RLock, incrementing the lock count if already held [21, 22].

  • release() decreases the lock count, releasing the lock when the count reaches zero [22].
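
A minimal RLock sketch showing the same process acquiring the lock twice (the nested `with` blocks stand in for a function calling another function that needs the same lock; names are illustrative):

```python
from multiprocessing import Process, RLock

def update(lock, label):
    with lock:            # first acquire: lock count becomes 1
        with lock:        # re-acquired by the same process: count becomes 2
            print(label, "inside the critical section")
    # both releases happen on exit: count is back to 0, others may acquire

if __name__ == "__main__":
    lock = RLock()
    procs = [Process(target=update, args=(lock, f"worker-{i}")) for i in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```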

  • Condition:

  • A synchronization primitive used to communicate that a particular state or condition in the application has been reached [23].

  • Allows processes to wait for a signal from another process indicating a certain condition has been met [23].

  • A process acquires the Condition object for exclusive access to a shared resource; calling wait() then releases the Condition and blocks until another process signals (via notify()/notify_all()) that the condition has been met [23, 24].
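
A minimal Condition sketch (the waiter/notifier split and the sleep are illustrative; robust code would guard wait() with a checked predicate, e.g. via wait_for()):

```python
import time
from multiprocessing import Process, Condition

def waiter(cond):
    with cond:                 # acquire the condition's underlying lock
        cond.wait()            # release it and block until notified
        print("condition met, resuming work")

def notifier(cond):
    time.sleep(0.5)            # pretend to reach the awaited state
    with cond:
        cond.notify_all()      # wake every waiting process

if __name__ == "__main__":
    cond = Condition()
    w = Process(target=waiter, args=(cond,))
    n = Process(target=notifier, args=(cond,))
    w.start(); n.start()
    w.join(); n.join()
```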

  • Event:

  • An Event manages an internal flag that processes can set or clear [25].

  • event.set() sets the flag to True, and event.clear() resets it to False [25].

  • Processes can wait for the flag to be true using event.wait(), which blocks until the flag is true [25, 26].
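
A minimal Event sketch with set(), wait() and clear() (the worker function and the timing are illustrative):

```python
import time
from multiprocessing import Process, Event

def worker(ready):
    print("worker waiting for the flag...")
    ready.wait()               # blocks until the internal flag becomes True
    print("flag is set, worker proceeds")

if __name__ == "__main__":
    ready = Event()            # internal flag starts as False
    p = Process(target=worker, args=(ready,))
    p.start()
    time.sleep(0.5)
    ready.set()                # flip the flag to True, releasing the waiter
    p.join()
    ready.clear()              # flag back to False for potential reuse
```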

  • Pools:

  • The multiprocessing.Pool class provides a way to parallelize tasks across multiple processes, abstracting away process management details [27, 28].

  • Work can be submitted to a pool synchronously or asynchronously [28].

  • Synchronous calls (e.g., apply) block until the task completes before the next one is submitted, offering predictable execution [28].

  • Asynchronous calls (e.g., apply_async) return immediately and let tasks run concurrently, improving resource utilization [29].

  • Major functions (a short example follows this list) [30]:

  • apply(func, args=(), kwds={}): Runs func with the given arguments in a single worker process and blocks until the result is ready [30].

  • map(func, iterable): Applies a given function to each item in an iterable and returns a list of the results, executing the function calls concurrently across multiple processes [31].

  • starmap(func, iterable): Like map(), but each item of the iterable is a tuple of arguments that is unpacked into the call (func(*item)), again executed concurrently across multiple processes [32].
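
A short Pool sketch covering the functions above plus an asynchronous variant (the worker functions `square` and `add` are illustrative):

```python
from multiprocessing import Pool

def square(x):
    return x * x

def add(a, b):
    return a + b

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        print(pool.apply(square, args=(3,)))        # 9, blocks until done
        print(pool.map(square, [1, 2, 3, 4]))       # [1, 4, 9, 16]
        print(pool.starmap(add, [(1, 2), (3, 4)]))  # [3, 7] (tuples unpacked)

        # asynchronous variant: returns immediately with a result handle
        async_result = pool.apply_async(square, args=(5,))
        print(async_result.get())                   # 25
```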

  • Introduction to multiprocessing.Manager():

  • Manager() creates data that can be shared between different processes, supporting shared lists, dictionaries, etc. [33].

  • The Manager controls a server process, running separately from the main program, that holds the shared objects [33]. Other processes access them through proxy objects that forward operations to that server [33].
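
A minimal Manager sketch with a shared dict and list accessed through proxies (the `record` function is illustrative):

```python
from multiprocessing import Process, Manager

def record(shared_dict, shared_list, i):
    shared_dict[i] = i * i      # the proxy forwards this to the manager server
    shared_list.append(i)

if __name__ == "__main__":
    with Manager() as manager:  # starts the manager's server process
        d = manager.dict()      # proxy objects, not plain dict/list
        lst = manager.list()
        procs = [Process(target=record, args=(d, lst, i)) for i in range(3)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(dict(d), list(lst))   # e.g. {0: 0, 1: 1, 2: 4} [0, 1, 2] (order may vary)
```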

  • Introduction to concurrent.futures:

  • A high-level module for asynchronous task execution, providing a simple interface for managing threads and processes [34, 35].

  • Offers ThreadPoolExecutor and ProcessPoolExecutor classes [35].

  • Uses Futures to represent the result of an asynchronous computation [35].
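
A minimal concurrent.futures sketch with ProcessPoolExecutor and Futures (the `cube` function is illustrative):

```python
from concurrent.futures import ProcessPoolExecutor, as_completed

def cube(x):
    return x ** 3

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as executor:
        # submit() returns a Future representing the pending result
        futures = [executor.submit(cube, n) for n in range(5)]
        for fut in as_completed(futures):
            print(fut.result())             # results arrive as they complete

        # map() is a higher-level shortcut that yields results in input order
        print(list(executor.map(cube, range(5))))
```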