
Lecture 24 - Distributed File Systems 2

Security in Distributed Systems

Traditional Security Management

  • Security traditionally managed by operating systems.

  • Common methods include:

    • File permissions, with privileged operations gated behind root access (e.g., via sudo).

Challenges of Security in Distributed Systems

  • In distributed systems, applications take on security responsibilities:

    • Identification and Authentication: Users authenticate through applications instead of relying purely on OS-level security.

    • Access Control: Applications must implement fine-grained control for user actions.

    • Data Protection:

      • Encryption: Protects data in transit and at rest.

      • Tamper Detection: Ensures data integrity and origin verification.

    • Audit Trail: Comprehensive logs of user actions for security and compliance purposes.
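As an illustration of the tamper-detection point above, here is a minimal sketch using a keyed hash (HMAC-SHA256) from Python's standard library. The lecture does not name a specific mechanism, so HMAC is an assumed choice, and the key and messages are hypothetical. The sender attaches a tag computed with a shared secret; the receiver recomputes it, and any change to the message or the use of a different key makes verification fail.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> bytes:
    """Return an HMAC-SHA256 tag for the message under the shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time.

    False means the message was altered or the tag was made with another key.
    """
    return hmac.compare_digest(sign(key, message), tag)


if __name__ == "__main__":
    key = b"shared-secret-for-illustration"  # hypothetical key, not a real secret
    message = b"transfer 10 to alice"        # hypothetical message
    tag = sign(key, message)
    print(verify(key, message, tag))                  # untouched message
    print(verify(key, b"transfer 99 to alice", tag))  # tampered message
```

This covers integrity and origin only among parties that hold the shared key; it does not hide the message contents, which is the job of encryption.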

Changing Environment for Security

  • Unlike standalone systems, distributed systems cannot be simply turned off.

  • Operations occur over public networks and remotely managed services, raising trust issues:

    • Trust in third-party services is often low, for example because they may be running without current security updates.

    • Security becomes challenging without trust in individual components.

Cryptography as a Solution

  • Essential for protecting stored data in the cloud and securing communications.

  • Users struggle with security measures like multi-factor authentication due to inconvenience.

Scalability in Distributed Systems

  • Systems must be designed to handle growth, starting from a few users to potentially millions.

    • Failure management must be built into the design from the beginning.

Programming Differences in Distributed Systems

Algorithms and Environment

  • Distributed programming differs significantly from single-machine programming due to:

    • Varied programming languages across machines.

    • The need for cohesive APIs and protocols for system interaction.

Service Models in Distributed Systems

Centralized Model

  • Single system or a cluster connected directly by wire (not truly distributed).

  • Legacy systems (e.g., traditional time-sharing systems).

  • Limited scalability.

Client-Server Model

  • Clients send requests to a server that holds data.

  • Direct client-to-server communication; clients do not exchange messages directly.

Multi-Tier Architecture

  • Design seeks to manage complexity by splitting functionalities into multiple layers:

    1. Client interface (UI and user interactions).

    2. Middle tier (managing requests and transactions).

    3. Back end (database and core processing).

Microservice Model

  • Autonomous services with clear interfaces for interaction, allowing for independent functionality and scalability.

Peer-to-Peer Model

  • All machines are peers without designated servers, allowing for robust communication.

  • Examples: BitTorrent, Skype.

Hybrid Model

  • Combines aspects of different service models to meet complex system requirements.

Cloud Computing Overview

  • Cloud computing is largely synonymous with distributed systems; the term is mainly a marketing one.

  • Types of services:

    • Software as a Service: e.g., Google Apps, Salesforce.

    • Platform as a Service: e.g., AWS services that provide managed runtimes and databases.

    • Infrastructure as a Service: e.g., virtual machines for running workloads.

Intercomputer Communication

Design Approaches

  1. Circuit Switching: Dedicated path established for communication (used in traditional phone systems); synchronous.

  2. Packet Switching: Data is sent in packets over shared networks without a dedicated path; asynchronous and more adaptable.

OSI Reference Model

  • Organizes communication protocols into layers, including:

    • Network Layer: IP.

    • Transport Layer: TCP and UDP.

    • Presentation Layer: Handles data formats, serialization, etc.

Data Transmission through Layers

  • On the sender, each layer wraps the data with its own header; at the destination, the headers are stripped off in reverse order.
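The encapsulation idea can be sketched with a toy model. The layer names and the bracketed text headers below are made up for illustration; real headers are binary structures defined by each protocol. The point is only the ordering: the sender adds headers going down the stack, so the lowest layer's header ends up outermost, and the receiver removes them from the outside in.

```python
# Layers in the order the sender passes through them (top of the stack first).
# Hypothetical names and header format, for illustration only.
LAYERS = ["transport", "network", "link"]


def encapsulate(payload: bytes) -> bytes:
    """Prepend one header per layer; the last layer's header ends up outermost."""
    frame = payload
    for layer in LAYERS:
        frame = f"[{layer}]".encode() + frame
    return frame


def decapsulate(frame: bytes) -> bytes:
    """Strip headers from the outside in, i.e. in reverse order of encapsulation."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode()
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame


if __name__ == "__main__":
    wire = encapsulate(b"hello")
    print(wire)
    print(decapsulate(wire))
```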

IPv4 vs. IPv6

  • IPv4 uses 32-bit addresses, allowing ~4.3 billion addresses (limited).

  • IPv6 uses 128-bit addresses, massively expanding the address space to support far more devices.
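The address-space arithmetic can be checked directly. This is a small sketch using Python's standard ipaddress module; the sample addresses come from the ranges reserved for documentation and are not taken from the lecture.

```python
import ipaddress

# Address-space sizes follow directly from the address width in bits.
ipv4_total = 2 ** 32    # 4,294,967,296 addresses
ipv6_total = 2 ** 128


def address_bits(addr: str) -> int:
    """Return the width in bits of an IPv4 or IPv6 address string."""
    return ipaddress.ip_address(addr).max_prefixlen


if __name__ == "__main__":
    print(address_bits("192.0.2.1"))    # an IPv4 documentation address
    print(address_bits("2001:db8::1"))  # an IPv6 documentation address
    print(ipv6_total // ipv4_total == 2 ** 96)
```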

TCP vs. UDP

TCP (Transmission Control Protocol)

  • Reliable, connection-oriented, ensures ordered delivery.

  • Suitable for applications needing data integrity, e.g., web browsing, file transfers.

UDP (User Datagram Protocol)

  • Unreliable (no delivery or ordering guarantees) and connectionless; suitable for time-sensitive applications, e.g., online gaming, video streaming.
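To make "connectionless" concrete, here is a minimal sketch that sends a single UDP datagram over the loopback interface using Python's socket module. The function name and port choice are assumptions for illustration. There is no handshake and no acknowledgement: the sender just addresses a datagram to the receiver. On loopback the datagram will normally arrive, but across a real network it could be lost, duplicated, or reordered, and the application would have to cope.

```python
import socket


def udp_roundtrip(message: bytes, timeout: float = 2.0) -> bytes:
    """Send one datagram to a local receiver and return what arrived."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        receiver.settimeout(timeout)      # do not wait forever if it is lost
        # No connect/accept step: the destination is given per datagram.
        sender.sendto(message, receiver.getsockname())
        data, _sender_addr = receiver.recvfrom(4096)
        return data


if __name__ == "__main__":
    print(udp_roundtrip(b"ping"))
```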

Protocols Overview

  • Protocols facilitate communication between machines using different languages and applications.

  • They ensure that both sides agree on the structure and timing of requests.

Software Interaction Models

Socket Programming

  • Sockets are the common low-level interface to network communication, but they burden the programmer with many complex details.
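A minimal sketch of those details, assuming a toy TCP service on the loopback interface that upper-cases whatever it receives (the service and function names are hypothetical). Even this tiny exchange needs explicit bind, listen, accept, connect, send, and receive steps, plus a thread so the server and client can run in one process.

```python
import socket
import threading


def serve_once(server_sock: socket.socket) -> None:
    """Accept one connection, read one small message, reply in upper case."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(4096)  # enough for the tiny messages in this sketch
        conn.sendall(data.upper())


def tcp_roundtrip(message: bytes) -> bytes:
    """Run a one-shot server on loopback and send it a single message."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_sock:
        server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a port
        server_sock.listen(1)
        server_sock.settimeout(2.0)         # avoid hanging in accept()
        worker = threading.Thread(target=serve_once, args=(server_sock,))
        worker.start()
        address = server_sock.getsockname()
        with socket.create_connection(address, timeout=2.0) as client:
            client.sendall(message)
            reply = client.recv(4096)
        worker.join()
        return reply


if __name__ == "__main__":
    print(tcp_roundtrip(b"hello"))
```

A real service would also need message framing, since TCP delivers a byte stream and a single recv call is not guaranteed to return a whole message.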

Stateful vs Stateless Services

  • Stateful: The server maintains client state, which makes access feel closer to local access but wastes server resources if clients disconnect.

  • Stateless: The server keeps no client state; each request is self-contained and therefore must carry more data.
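The difference can be sketched with two toy file servers, both hypothetical and running in-process with no networking. The stateful one remembers each client's read position, so requests are short but the server holds an entry per client until it is cleaned up. The stateless one keeps nothing, so every request must carry the offset itself.

```python
DATA = b"abcdefghij"  # stand-in for a file held by the server


class StatefulServer:
    """Remembers each client's read position between requests."""

    def __init__(self) -> None:
        self.offsets = {}  # client_id -> next byte offset to read

    def open(self, client_id: str) -> None:
        self.offsets[client_id] = 0

    def read(self, client_id: str, count: int) -> bytes:
        start = self.offsets[client_id]
        chunk = DATA[start:start + count]
        self.offsets[client_id] = start + len(chunk)
        return chunk


class StatelessServer:
    """Keeps nothing per client; the request says exactly what to read."""

    def read(self, offset: int, count: int) -> bytes:
        return DATA[offset:offset + count]


if __name__ == "__main__":
    stateful = StatefulServer()
    stateful.open("client-1")
    print(stateful.read("client-1", 4), stateful.read("client-1", 4))

    stateless = StatelessServer()
    print(stateless.read(0, 4), stateless.read(4, 4))
```

One consequence worth noting: a repeated stateless request returns the same bytes, so a client can safely retry after a timeout, whereas retrying a stateful read advances the position a second time.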

Remote Procedure Calls (RPC)

  • Simplifies remote function execution by making it look like a local call. Stub code that handles the complexities of communication is generated automatically.
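The stub idea can be sketched by hand. In a real RPC framework the stubs are generated from an interface definition; here they are written manually, JSON is an assumed wire format, and the "network" is just a direct function call, so all names below are hypothetical. The client stub marshals the method name and arguments into bytes, the server side unmarshals them, calls the real function, and marshals the result back.

```python
import json


def add(a: int, b: int) -> int:
    """An ordinary function that lives on the server."""
    return a + b


PROCEDURES = {"add": add}  # hypothetical registry of remotely callable functions


def server_dispatch(request_bytes: bytes) -> bytes:
    """Server stub: unmarshal the request, call the function, marshal the reply."""
    request = json.loads(request_bytes)
    func = PROCEDURES.get(request["method"])
    if func is None:
        response = {"error": f"unknown method {request['method']}"}
    else:
        response = {"result": func(*request["params"])}
    return json.dumps(response).encode()


class ClientStub:
    """Client stub: turns attribute calls into marshalled requests."""

    def __init__(self, transport):
        self._transport = transport  # callable taking bytes, returning bytes

    def __getattr__(self, name):
        def remote_call(*args):
            request = json.dumps({"method": name, "params": list(args)})
            response = json.loads(self._transport(request.encode()))
            if "error" in response:
                raise RuntimeError(response["error"])
            return response["result"]
        return remote_call


if __name__ == "__main__":
    stub = ClientStub(server_dispatch)  # in-process transport for illustration
    print(stub.add(2, 3))               # reads like a local call
```

Swapping the in-process transport for one that sends the bytes over a socket would leave the calling code unchanged, which is the convenience RPC aims for. It is also where the hidden costs come from, since a call that looks local can now fail or stall in ways a local call cannot.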

Benefits and Challenges of RPC

  • Benefits: Simplifies programming, reduces complexity, allows focus on core application logic.

  • Challenges: Less control over communication details; the abstraction adds performance overhead and complicates error handling.