Distributed & Parallel Systems – Foundations, Deadlocks, and Architectural Overview
Chapter 1 – Introduction
• The course will place a strong emphasis on distributed systems, because they are significantly harder to design and reason about than either serial or conventional parallel computers.
• Parallel computers are often described as “many CPUs wired together and sharing memory.”
• In distributed systems, the CPUs do not share a single physical memory; instead they communicate over a network and must cope with latency, partial failures, and the absence of a global clock.
• Key motivation for studying the topic:
• Transparency, concurrency, fault tolerance, data replication, and other desirable properties are harder to attain in distributed environments.
• Classic OS-level concerns (e.g., deadlocks) resurface, but solutions that work for a single-machine or shared-memory parallel computer do not generalize trivially.
• Simple hardware analogy: “adding another 8 GB of RAM” works for a single box, but not for geographically separate machines—the transparency breaks down.
Chapter 2 – Deadlocks in Distributed vs. Parallel Systems
• Basic definition: A deadlock exists when a set of processes are each waiting for events that can be caused only by another process in the same set.
• Canonical parallel example (single shared memory or bus):
• 8 processes hold resources in a circular-wait pattern.
• No resource can be reclaimed unless one process relinquishes the resource it already holds.
• If no process voluntarily releases its resource, the OS must pre-empt or kill at least one process to recover.
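The circular-wait pattern above can be checked mechanically with cycle detection over a wait-for graph (an edge P → Q means P is waiting for a resource held by Q). A minimal sketch, assuming the graph is a plain dict; the process names are illustrative:

```python
def has_deadlock(wait_for):
    """Return True if the wait-for graph {process: set of processes it waits on}
    contains a cycle, i.e. a circular wait."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on current path / done
    color = {p: WHITE for p in wait_for}

    def visit(p):
        color[p] = GREY
        for q in wait_for.get(p, ()):
            if color.get(q, WHITE) == GREY:
                return True               # back edge: cycle found => deadlock
            if color.get(q, WHITE) == WHITE and visit(q):
                return True
        color[p] = BLACK
        return False

    return any(color[p] == WHITE and visit(p) for p in wait_for)

# The 8-process example: each process waits on the next, closing the ring.
ring = {f"P{i}": {f"P{(i + 1) % 8}"} for i in range(8)}
```

With `ring`, `has_deadlock` reports a deadlock; break any single edge of the ring and it reports none, which is exactly why killing one process recovers the system.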
• Distributed variant is more subtle:
• Example: Two branches of a bank run separate processes; each branch can lock its own database records.
• Process A (Branch 1) holds the lock on its local record and waits for Branch 2's record; Process B (Branch 2) holds the lock on its local record and waits for Branch 1's record.
• Because the locks are at different physical sites, detecting the cycle requires global knowledge that is not instantly available.
• Implications:
• Need for distributed deadlock detection, prevention, or avoidance algorithms.
• Communication delays, partial failures, and lack of a single scheduler make the design space challenging.
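One classic answer to this design challenge is edge chasing: a blocked process sends a probe along wait-for edges, and if the probe ever returns to its initiator, a cycle exists. A much-simplified, single-machine simulation of that idea (real edge-chasing algorithms such as Chandy–Misra–Haas send probes as network messages between sites; the dict-based graph here only stands in for that):

```python
def probe_detects_deadlock(wait_for, initiator):
    """Simulate forwarding a probe along wait-for edges.
    Returns True if the probe reaches its own initiator (a cycle)."""
    seen = set()
    frontier = [initiator]
    while frontier:
        p = frontier.pop()
        for q in wait_for.get(p, ()):
            if q == initiator:
                return True        # probe came back to its sender: deadlock
            if q not in seen:
                seen.add(q)
                frontier.append(q)
    return False

# The two-branch bank example: A waits on B's record, B waits on A's.
edges = {"A": {"B"}, "B": {"A"}}
```

In a real distributed setting each hop of the `frontier` loop would be a probe message between sites, which is precisely where communication delays and partial failures complicate matters.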
Chapter 3 – General Challenges in Distributed Systems
• Every classical OS problem (deadlock, mutual exclusion, scheduling, synchronization) becomes qualitatively harder:
• Clock Synchronization: Without a global clock, ordering events becomes non-trivial; algorithms such as Lamport timestamps or vector clocks are used.
• Fault Tolerance: Nodes or links may fail independently; algorithms must detect and recover.
• Consistency Models: How up-to-date must each node’s data be? Choices range from strong consistency to eventual consistency.
• The syllabus will cover these topics step-by-step.
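Of the topics above, Lamport timestamps are simple enough to sketch now: every process keeps a counter that ticks on each local event, and on receiving a message it jumps past the sender's stamp. A minimal sketch (class and method names are ours, not from the lecture):

```python
class LamportClock:
    """Lamport logical clock: local events increment the counter;
    receiving a message sets it to max(local, message_stamp) + 1."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event or message send: advance and return the timestamp."""
        self.time += 1
        return self.time

    def receive(self, msg_time):
        """Message arrival: jump past the sender's stamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t_send = a.tick()     # a stamps and sends a message
b.receive(t_send)     # b's clock now exceeds the sender's stamp
```

This gives a consistent event ordering without any global clock; vector clocks extend the same idea to also detect which events are concurrent.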
Chapter 4 – Recap of Previous Three Lectures
• Defined and contrasted serial, parallel, and distributed computing models.
• Taxonomy of computer architectures (Flynn’s):
• SISD – Single Instruction, Single Data (traditional serial machine).
• SIMD – Single Instruction, Multiple Data (vector processors, GPUs).
• MISD – Multiple Instruction, Single Data (rare in practice).
• MIMD – Multiple Instruction, Multiple Data (general parallel & distributed systems).
• Explored memory–processor coupling:
• Tightly coupled: Shared global memory with hardware cache coherence; fast interconnect.
• Loosely coupled: Separate memories, message passing; forms the basis of most distributed systems.
• Communication infrastructure in parallel machines:
• Switch-based, crossbar, bus, and time-division multiplexing architectures were compared on cost, latency, and scalability.
Chapter 5 – Preview of Upcoming Material
• Next class will formalize the definition of a distributed system and enumerate its characteristics: transparency, scalability, openness, reliability, security.
• Comparative study:
• Advantages of distributed over centralized: fault isolation, geographic distribution, resource sharing, price/performance.
• Disadvantages: higher design complexity, latency, security exposure, difficulty of global state observation.
• Hardware taxonomy for distributed systems will also be introduced (e.g., clusters, grids, edge/fog, peer-to-peer).
Chapter 6 – Course Logistics & Study Advice
• Instructor plans to mix theory with “interesting problems” and practice sessions.
• Strong recommendation: continuous learning—do not cram the entire syllabus at semester’s end.
• Approach the subject “like a story”: understand motivations, examples, and interconnections to retain concepts better.
Appendix – Key Terms & Concepts (Quick Reference)
• Deadlock: The four Coffman conditions – mutual exclusion, hold-and-wait, no pre-emption, circular wait.
• Distributed Deadlock Detection: Global wait-for graph, edge chasing, probe algorithms.
• Clock Synchronization: NTP, Cristian’s algorithm, Berkeley algorithm, Lamport & vector clocks.
• Tightly vs. Loosely Coupled: Shared-memory vs. message-passing distinction.
• Switch-based interconnect: Crossbar, multistage networks, cost vs. performance trade-offs.