Concurrent Processes & Parallel Processing – Comprehensive Study Notes
Analogy: KFC Drive-Through & Information Retrieval
The lecture opens with a real-world analogy to illustrate synchronised multiprocessing.
You (customer) interact with two visible “processors”:
Order clerk (Processor 1) at the first window.
Cashier (Processor 4) at the second window.
Behind the scenes two more processors operate:
Cook (Processor 3) – retrieves raw chicken (data) from secondary storage and cooks it (data retrieval from disk).
Bagger (Processor 2) – assembles the cooked chicken and prepares it for delivery (query processing / data formatting).
Key take-aways from the analogy
Multiple processors cooperate in a strictly defined order, handing work off just as threads or CPUs share tasks in a computer system.
Highlights the need for synchronisation so each stage receives input when ready and hands off results without conflict or delay.
Demonstrates end-to-end latency: if any single stage (e.g., the cook) is slow, the whole pipeline stalls behind it.
What is Parallel Processing (Multiprocessing)
Definition: Two or more processors operate in unison, executing instructions simultaneously.
Minimum requirement: two processors.
A dedicated Processor Manager (often the operating system kernel) must coordinate activity and synchronise CPU interactions.
Core goals
Enhance throughput – complete more work per unit time.
Increase computing power – tackle larger or more complex problems.
Primary benefits when well-designed
Increased reliability
If one processor fails, others can detect the failure and take over tasks.
OS reallocates resources so remaining CPUs avoid overload.
Faster processing through true concurrency.
Trade-offs / challenges
Added architectural complexity.
Need to determine:
How CPUs are interconnected (bus, crossbar, network, etc.).
How their interaction is orchestrated (scheduling, communication, cache coherency).
Flexibility ↑ → design difficulty ↑.
CPU Allocation Strategies
Allocate an entire CPU to each job (coarse-grained).
Allocate a CPU to each working set (medium-grained, e.g. the cached pages of a process).
Pipeline single instruction execution by subdividing it so parts run in parallel (fine-grained; superscalar, vector, SIMD).
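The fine-grained, pipelined strategy can be sketched as stage threads connected by queues: each stage works on a different item at the same moment, much as the subdivided steps of an instruction cycle overlap. This is a minimal model (the stage functions and item values here are illustrative, not from the lecture):

```python
import queue
import threading

def run_pipeline(items, stages):
    """Run each item through a chain of stage functions, one thread
    per stage, connected by FIFO queues (a coarse model of pipelining)."""
    qs = [queue.Queue() for _ in range(len(stages) + 1)]

    def worker(fn, q_in, q_out):
        while True:
            item = q_in.get()
            if item is None:          # sentinel: propagate shutdown
                q_out.put(None)
                return
            q_out.put(fn(item))

    threads = [threading.Thread(target=worker, args=(fn, qs[i], qs[i + 1]))
               for i, fn in enumerate(stages)]
    for t in threads:
        t.start()
    for item in items:
        qs[0].put(item)
    qs[0].put(None)                   # no more work
    results = []
    while (out := qs[-1].get()) is not None:
        results.append(out)
    for t in threads:
        t.join()
    return results

# While stage 2 adds 1 to the first item, stage 1 is already doubling
# the second item.
doubled_plus_one = run_pipeline([1, 2, 3], [lambda x: x * 2, lambda x: x + 1])
```

Because each stage is a single thread draining a FIFO queue, output order matches input order, so `doubled_plus_one` is `[3, 5, 7]`.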
Typical Multiprocessing Configurations
Three canonical hardware/OS architectures:
Master/Slave (Asymmetric)
Loosely Coupled (Distributed)
Symmetric (SMP)
Master / Slave Configuration
Overview
Essentially a single-processor system augmented with additional slave processors.
The master CPU performs all OS duties: job scheduling, I/O management, memory management, interrupt handling.
Slaves execute user programs or specialised back-end tasks.
Best suited to workloads naturally partitioned into front-end vs. back-end phases (e.g., transaction entry + batch processing).
Advantages
Simplicity of design and implementation.
Disadvantages & limitations
Overall reliability only marginally better than a uniprocessor (single point of failure at the master).
Risk of poor resource utilisation; slaves may idle while master is saturated.
Interrupt storm – master must service interrupts from all slaves, causing overhead.
Conceptual diagram
Central master connected to main memory & I/O.
Multiple slaves access resources solely through the master.
Loosely Coupled Configuration (Distributed / Multicomputer)
Composition
Several complete computer systems, each with its own CPU, memory, I/O devices, and (often) its own copy of the OS.
Interconnected via a high-speed network or bus.
Characteristics
Called "loosely coupled" because each processor retains control of its private resources.
Communication achieved through message passing, RPC, or shared-nothing protocols.
Job Flow
Upon job arrival, the job scheduling algorithm assigns it to the processor with the lightest load.
Job remains on that processor until completion → minimises data migration cost.
Scheduler objectives
Balance system load, maximise resource use, respect locality, minimise communication.
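The "lightest load" scheduling rule above can be sketched in a few lines; the processor names and cost units here are hypothetical:

```python
def assign_job(loads, job_cost):
    """Assign a job to the processor with the lightest current load
    (a minimal model of the loosely coupled scheduling rule).
    `loads` maps processor id -> total work currently assigned."""
    target = min(loads, key=loads.get)   # pick the least-loaded CPU
    loads[target] += job_cost            # job stays here until completion
    return target

loads = {"cpu0": 4, "cpu1": 1, "cpu2": 3}
chosen = assign_job(loads, 2)            # cpu1 is lightest, so it wins
```

Keeping the job on `chosen` until completion is what minimises data migration cost: the job's working data never has to cross the network to another node.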
Symmetric Configuration (Symmetric Multiprocessing – SMP)
All processors are identical and have equal access to shared main memory and all I/O devices.
Advantages over loosely coupled systems
Higher reliability (no single master bottleneck).
Resource efficiency – memory & disk shared, reducing redundancy.
Load balancing achievable because any CPU can execute any process.
Graceful degradation – if one CPU fails, performance decreases proportionally rather than catastrophically.
Implementation challenges
Requires tight synchronisation to prevent race conditions & deadlocks.
Process scheduling is decentralised – every CPU may run the scheduler.
Single copy of the OS must maintain global data structures (process table, file table) in shared memory.
With universal I/O access, contention for shared resources grows → needs locking & arbitration algorithms.
Process Synchronisation Software
Purpose: ensure that when one process uses a resource, others are excluded until safe to share.
Critical concepts
Critical Region / Critical Section – portion of code that must execute atomically with respect to some shared resource.
Locking – mechanism that blocks other processes from entering the critical region.
Consequences of poor synchronisation
Deadlock – two or more jobs wait indefinitely for resources.
Starvation – a job waits indefinitely even though resources become free intermittently.
Typical Critical Section Scenario
Processor finishes its current job.
Needs to pick another job:
Consult READY list.
Retrieve PCB & job data.
Advance READY pointer.
Multiple CPUs doing this simultaneously → risk of inconsistent list unless protected.
Similar hazards surface with: memory/page tables, I/O tables, DB records, etc.
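The READY-list hazard can be made concrete with a sketch in which several "CPUs" (threads) each try to dequeue the next job. The lock makes the retrieve-and-advance step atomic; without it, two threads could race on the same list entry. Job names and the use of Python threads are illustrative:

```python
import threading
from collections import deque

ready = deque(["job1", "job2", "job3", "job4"])   # READY list
ready_lock = threading.Lock()
dispatched = []

def pick_next_job():
    """Retrieve the next PCB and advance the READY pointer atomically.
    The lock guards the critical section; without it, two CPUs could
    dispatch the same job or corrupt the list."""
    with ready_lock:
        if ready:
            job = ready.popleft()
            dispatched.append(job)

threads = [threading.Thread(target=pick_next_job) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the lock in place, every job is dispatched exactly once, no matter how the four threads interleave.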
Locking Mechanisms
1. Test-and-Set (TS)
Hardware-level, indivisible (atomic) instruction.
Tests a key bit stored at a designated memory location.
If key == 0 (free), sets it to 1 (busy) and returns “free”.
Else returns “busy” without altering.
Workflow
Before entering critical region, process executes TS.
If return == free → enter section, key now busy.
On exit → reset key to 0.
If return == busy → process loops, repeatedly executing TS ("busy waiting").
Pros
Simple, minimal hardware support, quick (one cycle).
Cons
Starvation – which waiting process wins is arbitrary (whichever happens to retry at the right instant), so an unlucky process may wait indefinitely.
Busy waiting – CPU cycles wasted spinning while doing no productive work.
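A spinlock built on test-and-set can be sketched as follows. Python exposes no raw TS instruction, so the non-blocking `acquire(blocking=False)` of a `threading.Lock` stands in for the indivisible test-and-set; the busy-wait loop and key reset mirror the workflow above:

```python
import threading

class SpinLock:
    """Busy-waiting lock modelled on test-and-set.
    acquire(blocking=False) plays the role of the atomic TS
    instruction: it tests and claims the key in one indivisible step."""
    def __init__(self):
        self._key = threading.Lock()

    def acquire(self):
        # Loop ("busy waiting") until TS reports the key was free.
        while not self._key.acquire(blocking=False):
            pass                      # spinning wastes CPU cycles

    def release(self):
        self._key.release()           # reset key to free

counter = 0
lock = SpinLock()

def bump(n):
    global counter
    for _ in range(n):
        lock.acquire()
        counter += 1                  # critical section
        lock.release()

threads = [threading.Thread(target=bump, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Four threads each add 1000, and because every increment sits inside the critical section the final count is exactly 4000.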
2. WAIT and SIGNAL (a.k.a. Blocking Locks)
Augment TS to eliminate busy waiting via OS scheduler.
Two mutually exclusive operations:
WAIT
Executed when TS indicates resource busy.
Changes PCB state from RUNNING to BLOCKED.
Adds PCB to the queue associated with the lock.
Scheduler selects a different READY process → CPU cycles reused productively.
SIGNAL
Executed when a process exits its critical section.
Sets key to free, inspects waiting queue.
Dequeues one PCB, changes its state to READY.
Eliminates wasteful spinning, but requires OS cooperation and context-switch overhead.
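The WAIT/SIGNAL behaviour can be sketched with a `threading.Condition`, which gives us the OS cooperation the notes describe: a waiter sleeps instead of spinning, and SIGNAL wakes one queued waiter. The class and job names are illustrative:

```python
import threading

class BlockingLock:
    """WAIT/SIGNAL-style lock: a process finding the key busy is put
    to sleep on the lock's queue; SIGNAL frees the key and wakes one."""
    def __init__(self):
        self._cond = threading.Condition()
        self._busy = False

    def wait(self):
        with self._cond:
            while self._busy:
                self._cond.wait()     # state RUNNING -> BLOCKED; CPU reused
            self._busy = True         # key now busy

    def signal(self):
        with self._cond:
            self._busy = False        # key -> free
            self._cond.notify()       # dequeue one waiter -> READY

log = []
lock = BlockingLock()

def job(name):
    lock.wait()
    log.append(name)                  # critical section
    lock.signal()

threads = [threading.Thread(target=job, args=(f"job{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

No thread ever spins: a blocked thread consumes no CPU until `notify()` moves it back to the ready state.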
3. Semaphores
Definition: a non-negative integer variable acting as a signalling mechanism.
Railroad analogy: semaphore flag indicates whether track (resource) is clear.
Two atomic operations
P (proberen, "to test" / WAIT) – if the semaphore is positive, decrement it; otherwise block the process.
V (verhogen, "to increment" / SIGNAL) – if any process is blocked on the semaphore, unblock one; otherwise increment it.
Flexible: a binary semaphore (values 0–1) protects a single resource; a counting semaphore tracks a pool of multiple identical resources.
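A counting semaphore guarding a pool of identical resources can be sketched with `threading.Semaphore`, whose `acquire` and `release` correspond to P and V. The three-printer pool and the sleep that simulates printing are assumptions for illustration:

```python
import threading
import time

printers = threading.Semaphore(3)     # counting semaphore: 3 printers
in_use = 0
peak = 0
guard = threading.Lock()              # protects the two counters

def print_job():
    global in_use, peak
    printers.acquire()                # P: take a printer or block
    with guard:
        in_use += 1
        peak = max(peak, in_use)
    time.sleep(0.01)                  # simulate printing
    with guard:
        in_use -= 1
    printers.release()                # V: return the printer

threads = [threading.Thread(target=print_job) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Ten jobs contend for the pool, but the semaphore guarantees that no more than three printers are ever in use at once, and all are returned when the jobs finish.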
Real-Life Example of Busy Waiting (Group Discussion Prompt)
Classic illustration: Trying to place a phone call but repeatedly hitting redial while the line is busy. Each redial attempt is akin to a test-and-set loop, consuming your attention (resource) without productive outcome until the line frees.