Distributed Systems Course Notes

Course Information

Course Title: Distributed System
Course Code: PCC-ADS-308G
Type: Professional Core Course
Credits: 3
Semester: 6th

Evaluation

  • Class Work: 25 Marks

  • Exam: 75 Marks

  • Total: 100 Marks

  • Exam Duration: 3 Hours

Exam Structure

  • Total Questions: 9

  • Question 1: Compulsory, 8 short-answer questions, 20 marks.

  • Units I-IV: Two 20-mark questions from each unit.

  • Attempt: Question 1 + one question from each unit (total 5 questions).

Course Outcomes

  1. Analyze popular distributed systems (e.g., P2P).

  2. Understand Shared Memory Techniques.

  3. Knowledge of file access.

  4. Knowledge of Synchronization and Deadlock.

UNIT-I: Introduction to Distributed Systems

Introduction

A distributed system comprises independent computers that appear as a single coherent system to its users. The nodes collaborate, communicate via a network, and coordinate activities to achieve a common goal by sharing resources, data, and tasks.

Centralized vs. Distributed Systems

| Feature | Centralized System | Distributed System |
| --- | --- | --- |
| Data & Resources | Single central place (server) | Dispersed over several servers/locations |
| Maintenance | Easy | More challenging due to numerous interaction points |
| Scalability | Limited; potential bottleneck if the server malfunctions | Better scalability; system can function even with component failure |
| Reliability | Single point of failure | Higher reliability due to redundancy |
| Security | Easier to secure | More difficult to secure |

Distributed Computing Definition

A system where software components are spread across different computers but operate as a single entity. It can involve various configurations, such as mainframes, personal computers, workstations, and minicomputers.

Cloud Computing Principles

Resources (hardware, software, data) are shared with varying levels of openness and concurrency. This sharing allows data to be processed simultaneously on multiple processors and improves fault tolerance, enabling quicker recovery from system failures.

Advantages of Distributed Systems

Organizations use distributed computing to manage data generation and meet increasing application performance demands, providing easier scalability.

  • Data Volume Growth: Simpler to add hardware to a distributed system compared to upgrading/replacing a centralized system.

  • System Types:

    • Cohesive system: Each machine is owned by the customer; results are routed from one source.

    • System with end-users: Each node has an end-user; the distributed system helps in sharing resources/communication.

Benefits of a Multi-Computer Model

  • Improved Scalability: Scale-out architecture for easy hardware addition as load increases.

  • Enhanced Performance: Parallelism for the divide-and-conquer approach.

  • Cost-Effectiveness: Minimizes latency, improves response time and throughput, and uses low-cost commodity hardware, with redundancy to guard against data loss.

Architecture of Distributed Systems

Cloud-based software, accessible through the internet, forms the backbone. Components and connectors are arranged for easy communication. Components are replaceable, reusable modules with well-defined interfaces. Connectors are communication links that mediate coordination between components.

Software Architecture

Logical organization of software components and their interactions.

  1. Layered Architecture: Modular approach in which components are separated into layers, each using the services of the layer below it (e.g., the OSI model).

  2. Object-Based Architecture: Loosely coupled objects interacting through interfaces (connectors). Interactions occur through method calls, including remote ones such as Remote Procedure Calls (RPC), Java RMI, Web Services, and REST API calls.

  3. Data-Centered Architecture: Employs a central data repository (active or passive). Producers (businesses) provide data to the common data store, from which consumers (individuals) request data; utilizes persistent storage such as SQL databases.

  4. Event-Based Architecture: Communication is managed entirely through events. Components are loosely coupled, which allows for easy modification. Examples include publisher-subscriber systems and enterprise service buses.

System Architecture

System-level architecture focuses on the entire system and placement of components across multiple machines. Key architectures are:

  1. Client-Server Architecture: A server handles all work processes, while clients interact with services and other resources on the remote server. Centralized security is typical, with usernames/passwords stored in a secure database, ensuring stability.

  2. Peer-to-Peer (P2P) Architecture: Operates without central control; nodes function as both clients and servers. Nodes can register with a centralized lookup server or broadcast service requests to other nodes. P2P networks fall into three categories: structured (predefined data structure), unstructured (random neighbor selection), and hybrid (orderly assignment of unique functions).

Key Components of a Distributed System

  1. Primary System Controller: Manages server requests and installs executive and mailbox services.

  2. Secondary Controller: Regulates server processing requests and translation load.

  3. User-Interface Client: Offers system information (not in clustered environments).

  4. System Datastore: Sole data storage for shared data, typically on a disk vault.

  5. Database: A relational database storing all data, shared among multiple users.

Examples of Distributed Systems

  1. Networks: Ethernets and LANs, peer-to-peer networks, e-mail, and the internet.

  2. Telecommunication Networks: Telephone and cellular networks and VoIP systems.

  3. Real-Time Systems: Used across airline, ride-sharing, logistics, financial trading, MMOGs, and e-commerce industries emphasizing prompt data conveyance.

  4. Parallel Processors: Splits tasks among multiple processors.

  5. Distributed Database Systems: Databases spread across multiple servers/regions.

  6. Distributed Artificial Intelligence: Approaches include complex learning algorithms and decision-making, requiring large computational data sets.

Real-World Examples
  • Video-rendering systems

  • Scientific computing

  • Airline and hotel reservation

  • Cryptocurrency processors

  • P2P file-sharing

  • Multiplayer video games

  • E-learning applications

  • Distributed supply chains like Amazon

Characteristics of Distributed Systems

  • Resource Sharing: Ability to use hardware, software, or data anywhere in the system.

  • Openness: Extensibility and improvements in the system.

  • Concurrency: Naturally present in distributed systems, since separate users at remote locations work on the system at the same time.

  • Scalability: Ability to increase the system's scale to accommodate more users and improve responsiveness.

  • Fault Tolerance: Reliability even with hardware/software failures.

  • Transparency: Hiding complexity from users and applications.

Advantages of Distributed Systems

  • Scalability: Easy growth via adding more nodes.

  • Reliability & Fault Tolerance: Service continuity in case of failure.

  • Performance: Workloads spread across multiple nodes.

  • Resource Sharing: Increased efficiency and reduced costs.

  • Geographical Distribution: Global service delivery.

Disadvantages of Distributed Systems

  • Lack of relevant software.

  • Security vulnerabilities due to easy data access.

  • Network saturation may cause data transfer issues.

  • Complex database management.

  • Potential network overload.

Goals of Distributed Systems

  • Scalability: Efficient scaling as demand increases (horizontal or vertical).

  • Fault Tolerance: Continuous functioning even with component failures (replication, redundancy, checkpointing).

  • Availability: High uptime and accessibility, handled through replication.

  • Consistency: Ensuring agreement on system state across all nodes.

  • Concurrency: Allowing multiple tasks to be processed concurrently.

  • Transparency: Hiding complexity from users (location, access, replication transparency).

  • Security: Protecting against unauthorized access and maintaining data integrity.

  • Efficiency: Efficient resource management, minimal delays.

Hardware and Software Concepts

Hardware Concepts
  1. Nodes: Individual computing machines.

  2. Communication Network: LAN, WAN, Internet.

  3. Storage Systems: NAS, DFS, cloud storage systems.

  4. Processing Units: CPUs, GPUs, accelerators.

  5. Synchronization Hardware: Clock synchronization mechanisms.

  6. Fault Detection and Recovery Hardware: Redundant components and monitoring.

Software Concepts
  1. Distributed Operating System (DOS): Manages resources across machines.

  2. Middleware: Software between OS and application layers for communication, data management, and security.

  3. Communication Protocols: HTTP/HTTPS, RPC, Message Queueing (AMQP), RESTful APIs, gRPC.

  4. Distributed File Systems: HDFS, GFS, Ceph.

  5. Concurrency and Synchronization: Locks, semaphores, distributed algorithms.

  6. Distributed Algorithms: Leader election, consistency protocols.

  7. Data Replication and Consistency: Handling replication, conflict resolution, and consistency models.

  8. Security Mechanisms: Encryption, authentication, authorization, secure communication protocols.

  9. Virtualization and Containerization: Docker, Kubernetes, VMware.

Design Issues in Distributed Systems

  • Scalability, Reliability, Availability, Consistency, Latency, Load Balancing, Security

  • Architectural Design Patterns, Communication Issues, Data Management

Detailed Breakdown of Key Design Issues
  1. Scalability

    • Handling increased load without performance degradation.

    • Ensuring performance across geographically dispersed locations.

    • Strategies: Horizontal scaling, vertical scaling, sharding.

  2. Reliability

    • Fault tolerance using duplicate components for failover.

    • Automated switching to standby systems (Failover Mechanisms).

    • Data Replication: Storing data copies on multiple nodes.

    • Consensus Algorithms: Ensuring consistency among replicated data (e.g., Paxos, Raft).

  3. Availability

    • Minimize downtime via High Availability Architectures.

    • Monitoring and alerting.

    • Load Balancers: Distributing requests across multiple servers.

    • Geographic Redundancy: Deploying servers in different geographic locations.

  4. Consistency

    • Different Data Consistency Models, strong and eventual consistency.

    • Understanding trade-off between consistency, availability, and partition tolerance.

  5. Latency

    • Sources: Network & Processing Delays.

    • Minimization Techniques: Caching, data compression.

  6. Load Balancing

    • Load Distribution Methods: Round Robin, Least Connections.

    • Dynamic vs. Static Approaches.

  7. Security

    • Authentication: Identity Verification and Access control.

    • Encryption: Data encryption and HTTPS, SSL/TLS for secure communications.

  8. Architectural Design Patterns

    • Client-Server: Centralized Servers.

    • Peer-to-Peer: Decentralized Network.

    • Microservices: Breaking complex applications to simpler, independent services.

  9. Communication Issues

    • TCP versus UDP

    • Message Passing vs. Shared Memory

    • Synchronous vs. Asynchronous Communication

  10. Data Management

    • Horizontal and Vertical Partitioning of Data

    • Master-Slave, Multi-Master Replication

    • Handling Distributed Transactions, e.g., with Two-Phase Commit or blockchain-based approaches.
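
The two load distribution methods named under Load Balancing can be sketched as follows. This is a minimal in-process illustration, not a production balancer; the class names and the server names (node-a, node-b, node-c) are assumptions made for the example.

```python
import itertools
from typing import Dict, List


class RoundRobinBalancer:
    """Hands out servers in a fixed rotating order."""

    def __init__(self, servers: List[str]) -> None:
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Hands out the server that currently has the fewest active connections."""

    def __init__(self, servers: List[str]) -> None:
        self.active: Dict[str, int] = {s: 0 for s in servers}

    def pick(self) -> str:
        # min() returns the first server with the smallest count,
        # so ties are broken by the order the servers were listed in.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        # Called when a connection to `server` finishes.
        self.active[server] -= 1


servers = ["node-a", "node-b", "node-c"]  # hypothetical server names

rr = RoundRobinBalancer(servers)
rr_picks = [rr.pick() for _ in range(4)]

lc = LeastConnectionsBalancer(servers)
first = lc.pick()
second = lc.pick()
lc.release(first)      # the first connection closes
third = lc.pick()      # the freed server is the least loaded again
```

Round robin ignores how busy each server is, which makes it a static approach; least connections reacts to current load, which makes it a dynamic one.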

Communication in Distributed Systems

Communication in distributed systems is based on message passing due to the absence of shared memory. When process A communicates with process B, the two exchange messages, which requires standardized, agreed-upon protocols to prevent errors.

ISO OSI Model

To reduce the complexity of communication, the International Organization for Standardization (ISO) developed the Open Systems Interconnection Reference Model (ISO OSI), which identifies the various levels of communication and gives them standard names.

Layers and Protocols

Layered protocols manage communication issues; connection-oriented protocols establish a connection before data exchange, while connectionless protocols transmit messages without prior setup. The OSI model divides communication into seven layers:

  • Physical: Transmits bits.

  • Data Link: Detects and corrects transmission errors.

  • Network: Routes messages.

  • Transport: Provides reliable communication.

  • Session: Manages dialogues and provides synchronization.

  • Presentation: Interprets data.

  • Application: Offers miscellaneous protocols for common activities.

How Nodes Communicate in Distributed Systems

Nodes communicate through message passing, remote procedure calls, shared memory, or sockets. These mechanisms enable data exchange and coordinated action across the system.

Communication Models

| Model | Description | Use Cases |
| --- | --- | --- |
| Message Passing | Nodes communicate by sending messages over a channel, which can be direct (point-to-point) or indirect (via brokers). Messages can be synchronous or asynchronous. | Loosely coupled systems over networks. |
| Remote Procedure Call | Allows one program to execute code remotely as if it were local, providing a familiar programming interface. | Client-server architectures; subject to latency and reliability issues. |
| Publish-Subscribe | Decouples message publishers from subscribers, allowing asynchronous, event-driven communication suited to scaling distributed systems. | Messaging systems, IoT platforms, event-driven architectures. |
| Socket Programming | A low-level interface for bidirectional communication between processes running on different hosts over a network. | Networked applications and distributed systems. |
| Shared Memory | Multiple processes or threads share an address space and communicate by reading and writing shared memory, which requires synchronization. | Tightly coupled systems on multicore processors, where data races and consistency issues are common. |
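
As an illustration of the publish-subscribe model, here is a minimal in-process sketch. The Broker class and the topic names are hypothetical; a real broker would run as a separate service and deliver messages over the network.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class Broker:
    """In-process stand-in for a message broker (illustration only)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> int:
        """Deliver message to every handler on topic; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


received_a: List[str] = []
received_b: List[str] = []

broker = Broker()
broker.subscribe("sensor/temp", received_a.append)
broker.subscribe("sensor/temp", received_b.append)

# The publisher only names a topic; it does not know who is subscribed.
delivered = broker.publish("sensor/temp", "21.5C")
unrouted = broker.publish("sensor/humidity", "40%")   # no subscribers on this topic
```

The publisher and the subscribers never reference each other directly, which is the decoupling the table describes.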

Communication Protocols
  • TCP: Transmission Control Protocol is a reliable, connection-oriented protocol for data integrity and sequencing.

  • UDP: User Datagram Protocol is a lightweight, connectionless protocol for low-latency data delivery.

  • HTTP: Hypertext Transfer Protocol is used for transferring hypertext documents on the web.

  • SMTP: Simple Mail Transfer Protocol is used to send and relay email messages between mail servers; it is not secure on its own and is commonly combined with TLS.

  • FTP: File Transfer Protocol manages file transfers.

  • RPC: Remote Procedure Call allows programs to execute procedures or functions on remote servers transparently.
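
To make the connection-oriented nature of TCP concrete, the sketch below runs a one-shot server and a client over the loopback interface using Python's standard socket module. The helper name serve_once and the upper-casing behaviour are assumptions chosen for the example.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Accept one connection and send back whatever arrives, upper-cased."""
    conn, _addr = listener.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


# Bind to port 0 so the OS picks a free port on the loopback interface.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

server_thread = threading.Thread(target=serve_once, args=(listener,))
server_thread.start()

# Client side: the connection is established first, then data is exchanged.
with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
    client.sendall(b"hello")
    reply = client.recv(1024)

server_thread.join()
listener.close()
```

With UDP there would be no accept or connect step: each datagram is sent on its own, with no delivery or ordering guarantee.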

Distributed System Architecture in ATMs

Adopting distributed systems in Automated Teller Machines (ATMs) provides reliability, scalability, and efficiency. It helps achieve:

  • Fault Tolerance and Redundancy

  • Improved Scalability

  • Enhanced Availability

  • Decentralized Processing

  • Adaptability to Network Variability

  • Consistency and Data Integrity

  • Enhanced Security Measures

Group Communication in Distributed Systems

Efficient group communication is crucial for coordinating activities among multiple nodes or entities. This involves reliable communication such as exchanging data to ensure all participants are informed.

Communication Types in Distributed Systems
  • Unicast: One sender to a specific recipient

  • Multicast: One sender to multiple receivers

  • Broadcast: One sender to all nodes in the network

Reliable Multicast Protocols for Group Communication
  • FIFO Ordering: Messages delivered in the order they were sent.

  • Causal Ordering: Preserves causal relationships between messages.

  • Total Order and Atomicity: Ensures all members receive messages in the same order, and that each message is delivered to all members or to none.
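
FIFO ordering is commonly implemented with per-sender sequence numbers and a hold-back queue. The sketch below assumes that approach; the FifoReceiver class and the sender names are hypothetical.

```python
from typing import Dict, List, Tuple


class FifoReceiver:
    """Delivers each sender's messages in send order using sequence numbers."""

    def __init__(self) -> None:
        self._next_seq: Dict[str, int] = {}           # next expected number per sender
        self._held: Dict[Tuple[str, int], str] = {}   # messages that arrived too early
        self.delivered: List[str] = []

    def receive(self, sender: str, seq: int, payload: str) -> None:
        expected = self._next_seq.get(sender, 0)
        if seq < expected:
            return  # duplicate of a message that was already delivered
        self._held[(sender, seq)] = payload
        # Deliver as many consecutive messages from this sender as are available.
        while (sender, expected) in self._held:
            self.delivered.append(self._held.pop((sender, expected)))
            expected += 1
        self._next_seq[sender] = expected


rx = FifoReceiver()
rx.receive("p1", 1, "p1-second")   # arrives early, so it is held back
rx.receive("p2", 0, "p2-first")
rx.receive("p1", 0, "p1-first")    # releases the held message from p1
```

FIFO ordering constrains only messages from the same sender; causal and total ordering need extra information such as vector clocks or a sequencer.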

Scalability for Group Communication

Scalability in group communication means handling growing numbers of nodes, messages, and participants efficiently, using partitioning and sharding, load balancing, and replication and caching.

Key Challenges of Group Communication in Distributed Systems

The key challenges are maintaining system reliability, scalability, concurrency, consistency, and fault tolerance.

Remote Procedure Call (RPC) in Distributed Systems

RPC is a protocol used to execute a procedure on a remote server as if it were a local call. RPC abstracts network communication and thus enables interactions between remote nodes. Key concepts include:

  • Simplified Communication via abstraction

  • Enhanced Modularity and Service Reusability

  • Facilitates Distributed Computing, through Inter-Process Communication and Resource Sharing

RPC Architecture Overview

An RPC system is built from a client and a server, client and server stubs that marshal and unmarshal data, a communication layer that carries the messages, and an RPC protocol.
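
The role of the stubs can be sketched in a few lines. This example assumes JSON as the wire format and calls the server stub directly instead of using a network; the add procedure, the REGISTRY table, and the stub function names are hypothetical.

```python
import json
from typing import Any, Callable, Dict


# A procedure that the "server" exposes (hypothetical example).
def add(a: int, b: int) -> int:
    return a + b


REGISTRY: Dict[str, Callable[..., Any]] = {"add": add}


def server_stub(request_bytes: bytes) -> bytes:
    """Unmarshal the request, call the procedure, marshal the result."""
    request = json.loads(request_bytes.decode("utf-8"))
    result = REGISTRY[request["method"]](*request["params"])
    return json.dumps({"result": result}).encode("utf-8")


def client_stub(method: str, *params: Any) -> Any:
    """Marshal the call, hand it to the transport, and unmarshal the reply."""
    request_bytes = json.dumps({"method": method, "params": list(params)}).encode("utf-8")
    # A real system would send request_bytes over the network here;
    # this sketch calls the server stub directly.
    reply_bytes = server_stub(request_bytes)
    return json.loads(reply_bytes.decode("utf-8"))["result"]


total = client_stub("add", 2, 3)   # looks like a local call to the caller
```

The caller sees an ordinary function call; the marshalling and transport are hidden inside the stubs.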

Types of Remote Procedure Call (RPC) in Distributed Systems

  • Synchronous: Client waits for server response.

  • Asynchronous: Client doesn't wait; handles responses later.

  • One-Way RPC: Client sends a request without expecting a reply. Used when the return value or an acknowledgement from the service is not needed at that stage.

  • Callback RPC: Server calls back the client with the result.

  • Batch RPC: Client sends multiple requests in a batch.

Middleware in Distributed Systems

Middleware is software that facilitates communication and data management and integrates diverse services. It:

  • Facilitates communication and seamless interaction between components.

  • Supports integration of services and applications.

  • Enhances scalability through load balancing.

  • Ensures data consistency and integrity.

Major Types of Middleware

  • Communication Middleware: Provides facilities for communication between processes.

  • Database Middleware: Provides a connectivity layer that manages transactions and connections.

  • Transaction Middleware: Manages distributed transactions and guarantees atomicity.

  • Application Middleware: Implements an enterprise service bus (ESB) that integrates varied applications and services.

Benefits of Middleware

  • Enhanced Communication and Integration

  • Improved Scalability and Flexibility

  • Transaction Management with guaranteed Atomicity

Challenges of Middleware

  • Processing and communication overheads, and additional complexity.

  • Difficult integration and potential vulnerability to security issues.

Distributed Operating System (DOS)

A distributed operating system is a type of OS that uses several central processors to serve multiple real-time applications and users. It consists of nodes connected by LAN or WAN links. It presents a virtual-machine abstraction and shares I/O devices and computing resources among the nodes.

Types of DOS

The main types are client-server systems; peer-to-peer systems, in which duties are divided evenly among the nodes; middleware, which provides interoperability among different operating systems; and multi-tiered systems.

Features of DOS

Features include openness, scalability, resource sharing, flexibility, transparency, heterogeneity, and fault tolerance.

Applications of DOS

Applications range from network applications and parallel computation to telecommunication networks, real-time process control, and sensor networks.

Distributed Operating Systems: Advantages and Disadvantages

Advantages
  • Resource redistribution, data availability, and independent operation of individual systems.

  • Increased data exchange between nodes.

Disadvantages
  • Limitations in scheduling and data access.

  • Security is harder to implement, while expectations for it are higher.

UNIT-II: Synchronization in Distributed Systems

Clock Synchronization

Clock synchronization is the process of keeping the clocks of the computers in a distributed system in agreement. It is important for data integrity, event sequencing, fault detection, security, and the coordination of asynchronous processes. Physical clocks are synchronized to the UTC standard, and logical clocks are used alongside them.

Types of Clock Synchronization
  • Physical clock synchronization

  • Logical clock synchronization

  • Mutual exclusion coordination across nodes
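
A logical clock orders events without reading wall-clock time. The notes do not name a specific scheme, so the sketch below assumes Lamport's logical clock, the standard textbook example; the class and method names are illustrative.

```python
class LamportClock:
    """Counter that advances on every event and jumps ahead on message receipt."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Record a local event."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Sending is an event; the returned value is attached to the message."""
        return self.tick()

    def receive(self, msg_time: int) -> int:
        """On receipt, move past both the local time and the sender's timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time


p, q = LamportClock(), LamportClock()
p.tick()                      # p is at 1
p.tick()                      # p is at 2
stamp = p.send()              # p is at 3 and the message carries 3
q.tick()                      # q is at 1
q_after = q.receive(stamp)    # q jumps to max(1, 3) + 1
```

The receive rule guarantees that a message's receipt always carries a larger timestamp than its sending, which preserves the happened-before order.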

Techniques of Clock Synchronization

Techniques include the Network Time Protocol (NTP), the Precision Time Protocol (PTP), the Berkeley algorithm, and decentralized algorithms for managing clock drift.

Practical uses include financial transactions, cloud computing, and databases, where coordinated operations ensure integrity.
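
The core averaging step of the Berkeley algorithm can be sketched as below. This simplified version assumes the master already holds every node's clock reading; a real implementation would also estimate network delay and discard outliers. The function name, the node names, and the sample readings are made up for the example, and worker names must differ from "master".

```python
from typing import Dict


def berkeley_adjustments(master_time: float, worker_times: Dict[str, float]) -> Dict[str, float]:
    """Return the correction each node, including the master, should apply."""
    all_times = {"master": master_time, **worker_times}
    average = sum(all_times.values()) / len(all_times)
    # Each node receives an offset rather than an absolute time.
    return {node: average - reading for node, reading in all_times.items()}


# Clock readings in seconds (illustrative values).
adjust = berkeley_adjustments(180.0, {"w1": 185.0, "w2": 175.0, "w3": 200.0})
```

Here the average is 185.0, so the master moves forward by 5, w2 by 10, w3 moves back by 15, and w1 stays put. No node is treated as having the true time; the group converges on its own average.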

Challenges

Challenges include information being dispersed across nodes, decisions being made on local information only, preventing failures, and coping with time discrepancies and uncertainty.

Mutual Exclusion (Mutex)
A concurrency-control mechanism used to prevent race conditions: no process may enter its critical section while another process is in its own critical section.

Algorithm requirements include freedom from starvation, fairness, and fault tolerance. With message passing, mutual exclusion is implemented using token-based, non-token-based, and quorum-based approaches.
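
A token-based scheme can be illustrated with a simulated token ring. The sketch below assumes processes numbered 0 to n-1 arranged in a ring, with a single token circulating among them; the function name and parameters are hypothetical.

```python
from typing import List, Set


def token_ring_schedule(n: int, wanting: Set[int], start: int = 0) -> List[int]:
    """Return the order in which processes enter the critical section.

    The token visits processes in ring order starting at `start`. A process
    enters only while it holds the token, so at most one process is ever
    inside the critical section.
    """
    if any(p < 0 or p >= n for p in wanting):
        raise ValueError("process ids must be in range(n)")
    order: List[int] = []
    pending = set(wanting)
    holder = start
    while pending:
        if holder in pending:
            order.append(holder)      # holder enters, then leaves, its critical section
            pending.remove(holder)
        holder = (holder + 1) % n     # pass the token to the next process in the ring
    return order


entry_order = token_ring_schedule(n=5, wanting={3, 1, 4}, start=2)
```

Every waiting process is served within one trip of the token around the ring, which gives freedom from starvation; the weakness is that a lost token or crashed holder stalls everyone.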

Election Algorithms

Election algorithms are protocols that elect one process from a group of processes to act as coordinator. Two common examples are the Bully algorithm and the Ring algorithm, which runs on a ring configuration.
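
The outcome of a Bully election can be sketched as follows. This is a simplification under stated assumptions: messages are not modelled, processes have unique integer ids, and one responding higher process takes over at each step instead of all of them. The function name and the example ids are illustrative.

```python
from typing import Iterable, Set


def bully_election(initiator: int, alive: Iterable[int]) -> int:
    """Return the coordinator chosen when `initiator` starts an election.

    A process sends ELECTION to every higher-numbered process. If a live one
    answers, it takes over the election. A process that gets no answer
    announces itself, so the winner is the highest-numbered live process.
    """
    alive_set: Set[int] = set(alive)
    if initiator not in alive_set:
        raise ValueError("initiator must be alive")
    current = initiator
    while True:
        higher = [p for p in alive_set if p > current]
        if not higher:
            return current          # nobody higher answered
        current = min(higher)       # a higher process answered and continues


# Process 7, the old coordinator, has crashed and is absent from the live set.
coordinator = bully_election(initiator=2, alive=[1, 2, 3, 5, 6])
```

The highest-numbered live process always wins, which is where the algorithm gets its name.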

Atomic Transactions

Atomic transactions ensure that a change spanning several sites is applied in a coordinated way: at every site or at none. They maintain the ACID properties: Atomicity (all or nothing), Consistency (the system moves between valid states), Isolation (transactions that run concurrently behave as if access were serialized), and Durability (committed changes persist). In a distributed transaction, a transaction manager coordinates the sites and decides whether to commit or roll back.

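Phases of Atomic Commitment

Atomic commitment consists of a prepare phase followed by a decision to commit or cancel the transaction. In one-phase commit, the coordinator simply tells the participants to commit. In two-phase commit, the coordinator commits only if all participants vote to commit and aborts if any participant cannot. The participants (workers) acknowledge the result to the coordinator, so integrity is preserved.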
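
A two-phase commit round can be sketched as below. The sketch runs in one process and ignores timeouts, crashes, and logging, which a real coordinator must handle; the Participant class and the names db1 and db2 are hypothetical.

```python
from typing import List


class Participant:
    """A site taking part in the transaction (illustration only)."""

    def __init__(self, name: str, can_commit: bool) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self) -> bool:
        """Phase 1: vote yes only if the local work can be made permanent."""
        self.state = "ready" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, decision: str) -> str:
        """Phase 2: apply the coordinator's decision and acknowledge it."""
        self.state = "committed" if decision == "commit" else "aborted"
        return "ack"


def two_phase_commit(participants: List[Participant]) -> str:
    # Phase 1 (prepare): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): commit only if every vote was yes.
    decision = "commit" if all(votes) else "abort"
    for p in participants:
        p.finish(decision)
    return decision


ok_run = two_phase_commit([Participant("db1", True), Participant("db2", True)])

failed_group = [Participant("db1", True), Participant("db2", False)]
failed_run = two_phase_commit(failed_group)
failed_states = [p.state for p in failed_group]
```

A single "no" vote aborts the transaction at every site, including those that voted yes, which is what keeps the outcome atomic.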

Deadlocks in Distributed Systems

A deadlock occurs when a set of processes is blocked because each process holds a resource while waiting for resources held by the others. It makes resource sharing harder and comes in two varieties: resource deadlock and communication deadlock.

Deadlock Detection Techniques

These include centralized and distributed wait-for-graph strategies, along with the Banker's algorithm and resource-allocation graphs. The aim is to detect or eliminate circular wait.
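
Circular wait shows up as a cycle in the wait-for graph, so a centralized detector can be sketched as a cycle check. The example assumes the graph has already been collected at one site as a dictionary mapping each process to the processes it is waiting for; the function name and the process names are illustrative.

```python
from typing import Dict, List


def has_deadlock(wait_for: Dict[str, List[str]]) -> bool:
    """Return True if the wait-for graph contains a cycle (circular wait)."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    nodes = set(wait_for) | {t for targets in wait_for.values() for t in targets}
    colour = {node: WHITE for node in nodes}

    def visit(node: str) -> bool:
        colour[node] = GREY
        for nxt in wait_for.get(node, []):
            if colour[nxt] == GREY:
                return True        # reached a process already on this path
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in nodes)


# P1 waits for P2, P2 waits for P3, and P3 waits for P1.
cyclic = has_deadlock({"P1": ["P2"], "P2": ["P3"], "P3": ["P1"]})
# The same chain without the last edge has no circular wait.
acyclic = has_deadlock({"P1": ["P2"], "P2": ["P3"]})
```

In a distributed detector no single site holds the whole graph, so the same cycle check has to be carried out by passing messages along the edges.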