Concurrent Processes & Parallel Processing – Comprehensive Study Notes
Analogy: KFC Drive-Through & Information Retrieval
The lecture opens with a real-world analogy to illustrate synchronised multiprocessing.
You (customer) interact with two visible “processors”:
Order clerk (Processor 1) at the first window.
Cashier (Processor 4) at the second window.
Behind the scenes two more processors operate:
Cook (Processor 3) – retrieves raw chicken (data) from secondary storage and cooks it (data retrieval from disk).
Bagger (Processor 2) – assembles the cooked chicken and prepares it for delivery (query processing / data formatting).
Key take-aways from the analogy
Multiple processors cooperate in a strictly defined order, handing work off just as threads or CPUs share tasks in a computer system.
Highlights the need for synchronisation so each stage receives input when ready and hands off results without conflict or delay.
Demonstrates end-to-end latency: if any single stage (e.g., the cook) is slow, the whole pipeline stalls behind it.
What is Parallel Processing (Multiprocessing)
Definition: Two or more processors operate in unison, executing instructions simultaneously.
Minimum requirement: two processors.
A dedicated Processor Manager (often the operating system kernel) must coordinate activity and synchronise CPU interactions.
Core goals
Enhance throughput – complete more work per unit time.
Increase computing power – tackle larger or more complex problems.
Primary benefits when well-designed
Increased reliability
If one processor fails, others can detect the failure and take over tasks.
OS reallocates resources so remaining CPUs avoid overload.
Faster processing through true concurrency.
Trade-offs / challenges
Added architectural complexity.
Need to determine:
How CPUs are interconnected (bus, crossbar, network, etc.).
How their interaction is orchestrated (scheduling, communication, cache coherency).
Flexibility ↑ → design difficulty ↑.
CPU Allocation Strategies
Allocate an entire CPU to each job (coarse-grained).
Allocate a CPU to each working set (medium-grained, e.g. the cached pages of a process).
Pipeline single instruction execution by subdividing it so parts run in parallel (fine-grained; superscalar, vector, SIMD).
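The fine-grained, pipelined strategy can be sketched as stage threads connected by queues: each stage works on a different item at the same moment, much as the subdivided steps of an instruction cycle overlap. This is a minimal model (the stage functions and item values here are illustrative, not from the lecture):

```python
import queue
import threading

def run_pipeline(items, stages):
    """Run each item through a chain of stage functions, one thread
    per stage, connected by FIFO queues (a coarse model of pipelining)."""
    qs = [queue.Queue() for _ in range(len(stages) + 1)]

    def worker(fn, q_in, q_out):
        while True:
            item = q_in.get()
            if item is None:          # sentinel: propagate shutdown
                q_out.put(None)
                return
            q_out.put(fn(item))

    threads = [threading.Thread(target=worker, args=(fn, qs[i], qs[i + 1]))
               for i, fn in enumerate(stages)]
    for t in threads:
        t.start()
    for item in items:
        qs[0].put(item)
    qs[0].put(None)                   # no more work
    results = []
    while (out := qs[-1].get()) is not None:
        results.append(out)
    for t in threads:
        t.join()
    return results

# While stage 2 adds 1 to the first item, stage 1 is already doubling
# the second item.
doubled_plus_one = run_pipeline([1, 2, 3], [lambda x: x * 2, lambda x: x + 1])
```

Because each stage is a single thread draining a FIFO queue, output order matches input order, so `doubled_plus_one` is `[3, 5, 7]`.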
Typical Multiprocessing Configurations
Three canonical hardware/OS architectures:
Master/Slave (Asymmetric)
Loosely Coupled (Distributed)
Symmetric (SMP)
Master / Slave Configuration
Overview
Essentially a single-processor system augmented with additional slave processors.
The master CPU performs all OS duties: job scheduling, I/O management, memory management, interrupt handling.
Slaves execute user programs or specialised back-end tasks.
Best suited to workloads naturally partitioned into front-end vs. back-end phases (e.g., transaction entry + batch processing).
Advantages
Simplicity of design and implementation.
Disadvantages & limitations
Overall reliability only marginally better than a uniprocessor (single point of failure at the master).
Risk of poor resource utilisation; slaves may idle while master is saturated.
Interrupt storm – master must service interrupts from all slaves, causing overhead.
Conceptual diagram
Central master connected to main memory & I/O.
Multiple slaves access resources solely through the master.
Loosely Coupled Configuration (Distributed / Multicomputer)
Composition
Several complete computer systems, each with its own CPU, memory, I/O devices, and (often) its own copy of the OS.
Interconnected via a high-speed network or bus.
Characteristics
Called "loosely coupled" because each processor retains control of its private resources.
Communication achieved through message passing, RPC, or shared-nothing protocols.
Job Flow
Upon job arrival, the job scheduling algorithm assigns it to the processor with the lightest load.
Job remains on that processor until completion → minimises data migration cost.
Scheduler objectives
Balance system load, maximise resource use, respect locality, minimise communication.
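The "lightest load" scheduling rule above can be sketched in a few lines; the processor names and cost units here are hypothetical:

```python
def assign_job(loads, job_cost):
    """Assign a job to the processor with the lightest current load
    (a minimal model of the loosely coupled scheduling rule).
    `loads` maps processor id -> total work currently assigned."""
    target = min(loads, key=loads.get)   # pick the least-loaded CPU
    loads[target] += job_cost            # job stays here until completion
    return target

loads = {"cpu0": 4, "cpu1": 1, "cpu2": 3}
chosen = assign_job(loads, 2)            # cpu1 is lightest, so it wins
```

Keeping the job on `chosen` until completion is what minimises data migration cost: the job's working data never has to cross the network to another node.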
Symmetric Configuration (Symmetric Multiprocessing – SMP)
All processors are identical and have equal access to shared main memory and all I/O devices.
Advantages over loosely coupled systems
Higher reliability (no single master bottleneck).
Resource efficiency – memory & disk shared, reducing redundancy.
Load balancing achievable because any CPU can execute any process.
Graceful degradation – if one CPU fails, performance decreases proportionally rather than catastrophically.
Implementation challenges
Requires tight synchronisation to prevent race conditions & deadlocks.
Process scheduling is decentralised – every CPU may run the scheduler.
Single copy of the OS must maintain global data structures (process table, file table) in shared memory.
With universal I/O access, contention for shared resources grows → needs locking & arbitration algorithms.
Process Synchronisation Software
Purpose: ensure that when one process uses a resource, others are excluded until safe to share.
Critical concepts
Critical Region / Critical Section – portion of code that must execute atomically with respect to some shared resource.
Locking – mechanism that blocks other processes from entering the critical region.
Consequences of poor synchronisation
Deadlock – two or more jobs wait indefinitely for resources.
Starvation – a job waits indefinitely even though resources become free intermittently.
Typical Critical Section Scenario
Processor finishes its current job.
Needs to pick another job:
Consult READY list.
Retrieve PCB & job data.
Advance READY pointer.
Multiple CPUs doing this simultaneously → risk of inconsistent list unless protected.
Similar hazards surface with: memory/page tables, I/O tables, DB records, etc.
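The READY-list hazard can be made concrete with a sketch in which several "CPUs" (threads) each try to dequeue the next job. The lock makes the retrieve-and-advance step atomic; without it, two threads could race on the same list entry. Job names and the use of Python threads are illustrative:

```python
import threading
from collections import deque

ready = deque(["job1", "job2", "job3", "job4"])   # READY list
ready_lock = threading.Lock()
dispatched = []

def pick_next_job():
    """Retrieve the next PCB and advance the READY pointer atomically.
    The lock guards the critical section; without it, two CPUs could
    dispatch the same job or corrupt the list."""
    with ready_lock:
        if ready:
            job = ready.popleft()
            dispatched.append(job)

threads = [threading.Thread(target=pick_next_job) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the lock in place, every job is dispatched exactly once, no matter how the four threads interleave.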
Locking Mechanisms
1. Test-and-Set (TS)
Hardware-level, indivisible (atomic) instruction.
Tests a key bit stored at a designated memory location.
If key == 0 (free), sets it to 1 (busy) and returns “free”.
Else returns “busy” without altering.
Workflow
Before entering critical region, process executes TS.
If return == free → enter section, key now busy.
On exit → reset key to 0.
If return == busy → process loops, repeatedly executing TS ("busy waiting").
Pros
Simple, minimal hardware support, quick (one cycle).
Cons
Starvation – which waiting process wins is arbitrary (whichever happens to retry at the right instant), so an unlucky process may wait indefinitely.
Busy waiting – CPU cycles wasted spinning while doing no productive work.
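A spinlock built on test-and-set can be sketched as follows. Python exposes no raw TS instruction, so the non-blocking `acquire(blocking=False)` of a `threading.Lock` stands in for the indivisible test-and-set; the busy-wait loop and key reset mirror the workflow above:

```python
import threading

class SpinLock:
    """Busy-waiting lock modelled on test-and-set.
    acquire(blocking=False) plays the role of the atomic TS
    instruction: it tests and claims the key in one indivisible step."""
    def __init__(self):
        self._key = threading.Lock()

    def acquire(self):
        # Loop ("busy waiting") until TS reports the key was free.
        while not self._key.acquire(blocking=False):
            pass                      # spinning wastes CPU cycles

    def release(self):
        self._key.release()           # reset key to free

counter = 0
lock = SpinLock()

def bump(n):
    global counter
    for _ in range(n):
        lock.acquire()
        counter += 1                  # critical section
        lock.release()

threads = [threading.Thread(target=bump, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Four threads each add 1000, and because every increment sits inside the critical section the final count is exactly 4000.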
2. WAIT and SIGNAL (a.k.a. Blocking Locks)
Augment TS to eliminate busy waiting via OS scheduler.
Two mutually exclusive operations:
WAIT
Executed when TS indicates resource busy.
Changes PCB state from RUNNING to BLOCKED.
Adds PCB to the queue associated with the lock.
Scheduler selects a different READY process → CPU cycles reused productively.
SIGNAL
Executed when a process exits its critical section.
Sets key to free, inspects waiting queue.
Dequeues one PCB, changes its state to READY.
Eliminates wasteful spinning, but requires OS cooperation and context-switch overhead.
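The WAIT/SIGNAL behaviour can be sketched with a `threading.Condition`, which gives us the OS cooperation the notes describe: a waiter sleeps instead of spinning, and SIGNAL wakes one queued waiter. The class and job names are illustrative:

```python
import threading

class BlockingLock:
    """WAIT/SIGNAL-style lock: a process finding the key busy is put
    to sleep on the lock's queue; SIGNAL frees the key and wakes one."""
    def __init__(self):
        self._cond = threading.Condition()
        self._busy = False

    def wait(self):
        with self._cond:
            while self._busy:
                self._cond.wait()     # state RUNNING -> BLOCKED; CPU reused
            self._busy = True         # key now busy

    def signal(self):
        with self._cond:
            self._busy = False        # key -> free
            self._cond.notify()       # dequeue one waiter -> READY

log = []
lock = BlockingLock()

def job(name):
    lock.wait()
    log.append(name)                  # critical section
    lock.signal()

threads = [threading.Thread(target=job, args=(f"job{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

No thread ever spins: a blocked thread consumes no CPU until `notify()` moves it back to the ready state.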
3. Semaphores
Definition: a non-negative integer variable acting as a signalling mechanism.
Railroad analogy: semaphore flag indicates whether track (resource) is clear.
Two atomic operations
P (proberen, "to test" / WAIT) – if the semaphore is positive, decrement it; otherwise block the process.
V (verhogen, "to increment" / SIGNAL) – if any process is blocked on the semaphore, unblock one; otherwise increment it.
Flexible: a binary semaphore (values 0–1) protects a single resource; a counting semaphore tracks a pool of multiple identical resources.
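A counting semaphore guarding a pool of identical resources can be sketched with `threading.Semaphore`, whose `acquire` and `release` correspond to P and V. The three-printer pool and the sleep that simulates printing are assumptions for illustration:

```python
import threading
import time

printers = threading.Semaphore(3)     # counting semaphore: 3 printers
in_use = 0
peak = 0
guard = threading.Lock()              # protects the two counters

def print_job():
    global in_use, peak
    printers.acquire()                # P: take a printer or block
    with guard:
        in_use += 1
        peak = max(peak, in_use)
    time.sleep(0.01)                  # simulate printing
    with guard:
        in_use -= 1
    printers.release()                # V: return the printer

threads = [threading.Thread(target=print_job) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Ten jobs contend for the pool, but the semaphore guarantees that no more than three printers are ever in use at once, and all are returned when the jobs finish.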
Real-Life Example of Busy Waiting (Group Discussion Prompt)
Classic illustration: Trying to place a phone call but repeatedly hitting redial while the line is busy. Each redial attempt is akin to a test-and-set loop, consuming your attention (resource) without productive outcome until the line frees.