Distributed & Parallel Systems – Foundations, Deadlocks, and Architectural Overview

Chapter 1 – Introduction

• The course will place a strong emphasis on distributed systems, because they are significantly harder to design and reason about than either serial or conventional parallel computers.
• Parallel computers are often described as “many CPUs wired together and sharing memory.”
• In distributed systems, the CPUs do not share a single physical memory; instead they communicate over a network and must cope with latency, partial failures, and the absence of a global clock.
• Key motivation for studying the topic:
• Transparency, concurrency, fault tolerance, data replication, and other desirable properties are harder to attain in distributed environments.
• Classic OS-level concerns (e.g., deadlocks) resurface, but solutions that work for a single-machine or shared-memory parallel computer do not generalize trivially.
• Simple hardware analogy: “adding another 8 GB of RAM” works for a single box, but not for geographically separate machines—the transparency breaks down.

Chapter 2 – Deadlocks in Distributed vs. Parallel Systems

• Basic definition: A deadlock exists when each process in a set is waiting for an event that can be caused only by another process in the same set.
• Canonical parallel example (single shared memory or bus):

  1. Eight processes P1 … P8 hold resources R1 … R8 in a circular-wait pattern.

  2. No resource can be reclaimed unless one process relinquishes the resource it already holds.

  3. If no process voluntarily releases its resource, the OS must pre-empt or kill at least one process to recover.
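On a single machine, the OS can detect this situation by inspecting a wait-for graph, where an edge P → Q means "P waits for a resource held by Q". A minimal sketch (illustrative code, not from the lecture) using depth-first search for cycle detection:

```python
# Sketch of single-machine deadlock detection: build a wait-for graph
# (edge P -> Q means "P waits for a resource held by Q") and look for a cycle.

def has_cycle(wait_for):
    """Detect a cycle in a wait-for graph given as {process: [processes it waits for]}."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on current path / done
    color = {p: WHITE for p in wait_for}

    def visit(p):
        color[p] = GREY
        for q in wait_for.get(p, []):
            if color.get(q, WHITE) == GREY:               # back edge => cycle
                return True
            if color.get(q, WHITE) == WHITE and visit(q):
                return True
        color[p] = BLACK
        return False

    return any(color[p] == WHITE and visit(p) for p in wait_for)

# Eight processes in a circular wait: P1 -> P2 -> ... -> P8 -> P1
ring = {f"P{i}": [f"P{i % 8 + 1}"] for i in range(1, 9)}
print(has_cycle(ring))   # True: deadlock
```

If any process in the ring releases its resource (its outgoing edge disappears), the cycle is broken and `has_cycle` returns False.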
    The distributed variant is more subtle:
    • Example: Two branches of a bank run separate processes; each branch can lock its own database records.
    • Process A (Branch 1) holds lock L1 and waits for L2; Process B (Branch 2) holds L2 and waits for L1.
    • Because the locks are at different physical sites, detecting the cycle requires global knowledge that is not instantly available.
    • Implications:
    • Need for distributed deadlock detection, prevention, or avoidance algorithms.
    • Communication delays, partial failures, and lack of a single scheduler make the design space challenging.
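One classic approach mentioned in the appendix is edge chasing (the Chandy–Misra–Haas scheme): a blocked process sends a probe along its wait-for edges, each blocked receiver forwards it, and a deadlock is declared if the probe ever returns to its initiator. A simplified single-process simulation of that idea, with an illustrative probe format of (initiator, sender, receiver):

```python
# Simplified simulation of the edge-chasing (Chandy-Misra-Haas) idea:
# a blocked process sends a probe; each blocked receiver forwards it along
# its own wait-for edges. If a probe returns to its initiator, a cross-site
# cycle (distributed deadlock) exists. Message format here is illustrative.

def chase(initiator, wait_for):
    """Return True if a probe started by `initiator` ever returns to it."""
    frontier = [(initiator, initiator, q) for q in wait_for.get(initiator, [])]
    seen = set()
    while frontier:
        init, sender, receiver = frontier.pop()
        if receiver == init:
            return True                          # probe came back: deadlock
        if (sender, receiver) in seen:
            continue                             # forward each edge only once
        seen.add((sender, receiver))
        for nxt in wait_for.get(receiver, []):   # receiver is blocked too
            frontier.append((init, receiver, nxt))
    return False

# Branch 1's process A holds L1 and waits for L2 (held at Branch 2 by B);
# B holds L2 and waits for L1 -> cross-site cycle A -> B -> A.
print(chase("A", {"A": ["B"], "B": ["A"]}))   # True
```

In a real system each site only knows its local wait-for edges, so the probes are network messages; the simulation above collapses that into one loop to show the logic.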

Chapter 3 – General Challenges in Distributed Systems

• Every classical OS problem (deadlock, mutual exclusion, scheduling, synchronization) becomes qualitatively harder:
Clock Synchronization: Without a global clock, ordering events becomes non-trivial; algorithms such as Lamport timestamps or vector clocks are used.
Fault Tolerance: Nodes or links may fail independently; algorithms must detect and recover.
Consistency Models: How up-to-date must each node’s data be? Choices range from strong consistency to eventual consistency.
• The syllabus will cover these topics step-by-step.
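The Lamport-timestamp idea mentioned above can be sketched in a few lines: each process keeps a counter, incremented on every local event, and on receiving a message the clock jumps to max(local, received) + 1, so any event that causally follows another gets a larger timestamp. A minimal illustration (class and method names are assumptions, not from the lecture):

```python
# Minimal sketch of a Lamport logical clock: increment on local events,
# and on receive set the clock to max(local, received) + 1 so causally
# later events always carry larger timestamps.

class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):                 # local event or message send
        self.time += 1
        return self.time

    def receive(self, msg_time):    # merge in the sender's timestamp
        self.time = max(self.time, msg_time) + 1
        return self.time

p, q = LamportClock(), LamportClock()
t_send = p.tick()          # p sends a message stamped 1
q.tick(); q.tick()         # q has two unrelated local events (clock = 2)
t_recv = q.receive(t_send)
print(t_send, t_recv)      # prints "1 3": the receive is ordered after the send
```

Note that Lamport clocks give only a partial guarantee (causally related events are ordered; the converse does not hold), which is what motivates vector clocks.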

Chapter 4 – Recap of Previous Three Lectures

• Defined and contrasted serial, parallel, and distributed computing models.
• Taxonomy of computer architectures (Flynn’s):
SISD – Single Instruction, Single Data (traditional serial machine).
SIMD – Single Instruction, Multiple Data (vector processors, GPUs).
MISD – Multiple Instruction, Single Data (rare).
MIMD – Multiple Instruction, Multiple Data (general parallel & distributed systems).
• Explored memory–processor coupling:
Tightly coupled: Shared global memory with hardware cache coherence; fast interconnect.
Loosely coupled: Separate memories, message passing; forms the basis of most distributed systems.
• Communication infrastructure in parallel machines:
Switch-based, crossbar, bus, and time-division multiplexing architectures were compared on cost, latency, and scalability.

Chapter 5 – Preview of Upcoming Material

• Next class will formalize the definition of a distributed system and enumerate its characteristics: transparency, scalability, openness, reliability, security.
• Comparative study:
• Advantages of distributed over centralized: fault isolation, geographic distribution, resource sharing, price/performance.
• Disadvantages: higher design complexity, latency, security exposure, difficulty of global state observation.
• Hardware taxonomy for distributed systems will also be introduced (e.g., clusters, grids, edge/fog, peer-to-peer).

Chapter 6 – Course Logistics & Study Advice

• Instructor plans to mix theory with “interesting problems” and practice sessions.
• Strong recommendation: continuous learning—do not cram the entire syllabus at semester’s end.
• Approach the subject “like a story”: understand motivations, examples, and interconnections to retain concepts better.

Appendix – Key Terms & Concepts (Quick Reference)

Deadlock: Circular wait, mutual exclusion, hold-and-wait, no pre-emption.
Distributed Deadlock Detection: Global wait-for graph, edge chasing, probe algorithms.
Clock Synchronization: NTP, Cristian’s algorithm, Berkeley algorithm, Lamport & vector clocks.
Tightly vs. Loosely Coupled: Shared-memory vs. message-passing distinction.
Switch-based interconnect: Crossbar, multistage networks, cost O(N^2) vs. O(N log N) trade-offs.