Lecture 25 - Distributed File Systems 3 - Design Choices

Overview of Socket Communications

  • Sockets: Endpoints of a communication channel, used to send and receive messages between processes on different machines.

  • Remote Procedure Calls (RPCs): Let a program call a function that runs on a remote machine as if it were a local function call.
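
The two ideas above can be sketched together in a few lines. This is a minimal illustration, not a real RPC framework: it assumes a JSON message format, a hypothetical `PROCEDURES` table, and a hypothetical `remote_add` stub, and it runs both endpoints in one process using `socket.socketpair()` instead of two machines.

```python
import json
import socket

# Two connected endpoints. In a real system they would be on different machines.
client_sock, server_sock = socket.socketpair()


def add(a, b):
    return a + b


# Hypothetical table of functions the "server" is willing to run for clients.
PROCEDURES = {"add": add}


def send_msg(sock, obj):
    # One JSON document per line is the (assumed) wire format.
    sock.sendall(json.dumps(obj).encode() + b"\n")


def recv_msg(sock):
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return json.loads(buf)


def serve_one(sock):
    # Server side: unpack the request, run the procedure, send the result back.
    request = recv_msg(sock)
    result = PROCEDURES[request["proc"]](*request["args"])
    send_msg(sock, {"result": result})


def remote_add(a, b):
    # Client stub: to the caller this looks like an ordinary local function.
    send_msg(client_sock, {"proc": "add", "args": [a, b]})
    serve_one(server_sock)  # stands in for the server's own request loop
    return recv_msg(client_sock)["result"]


print(remote_add(2, 3))  # prints 5
```

The caller of `remote_add` never touches the socket directly, which is the point of an RPC stub.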

Network Accessible File Systems

Definition and Purpose

  • Lets clients access remote files as if they were local files. Such a system is referred to variously as Network Attached Storage (NAS), a Network File System (NFS), or a Distributed File System.

  • Components of a network file service:

    • Remote File Server: The hardware where the files are stored.

    • Network Protocol: Facilitates file operations (read/write).

    • Remote File Client: Acts as a file system driver that plugs into the client's virtual file system (VFS).

Virtual File System (VFS)

  • An abstraction layer allowing access to different file systems (e.g., ext4 for Linux, FAT for USB drives).

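  • Defines a common set of file system methods (e.g., open, read, write) that each file system implements, so applications use the same calls regardless of the underlying file system.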
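
As a rough sketch of the abstraction, the code below routes each path to whichever file system is mounted at the longest matching mount point. The class names (`FileSystem`, `InMemoryFS`, `VFS`) and the in-memory storage are illustrative assumptions, not how a kernel VFS is actually implemented.

```python
from abc import ABC, abstractmethod


class FileSystem(ABC):
    """Common interface that every concrete file system must implement."""

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...


class InMemoryFS(FileSystem):
    """Stand-in for a real file system such as ext4 or FAT."""

    def __init__(self, label):
        self.label = label
        self.files = {}

    def read(self, path):
        return self.files[path]

    def write(self, path, data):
        self.files[path] = data


class VFS:
    def __init__(self):
        self.mounts = {}  # mount point -> file system

    def mount(self, mount_point, fs):
        self.mounts[mount_point.rstrip("/") or "/"] = fs

    def _resolve(self, path):
        # The longest mount point that is a prefix of the path wins.
        best = None
        for mp in self.mounts:
            prefix = mp if mp.endswith("/") else mp + "/"
            if path == mp or path.startswith(prefix):
                if best is None or len(mp) > len(best):
                    best = mp
        if best is None:
            raise FileNotFoundError(path)
        relative = path if best == "/" else path[len(best):]
        return self.mounts[best], relative or "/"

    def read(self, path):
        fs, relative = self._resolve(path)
        return fs.read(relative)

    def write(self, path, data):
        fs, relative = self._resolve(path)
        fs.write(relative, data)


root = InMemoryFS("ext4-like")
usb = InMemoryFS("fat-like")
vfs = VFS()
vfs.mount("/", root)
vfs.mount("/mnt/usb", usb)

vfs.write("/home/a.txt", b"on root")
vfs.write("/mnt/usb/photo.jpg", b"on usb")
print(sorted(root.files), sorted(usb.files))  # ['/home/a.txt'] ['/photo.jpg']
```

A remote file system client fits the same pattern: it would be one more `FileSystem` subclass whose methods forward requests to a server.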

Assignment Context

  • Reference to Assignment 2 involving the fields library and the creation of a mount point.

  • Mount point located in the VFS to facilitate file access.

Remote File Systems on Client Side

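  • The client side is implemented much like a mount: a Remote File System is attached at a mount point in the VFS, extending the same abstraction to files stored on another machine.

  • Its file system methods redirect client calls to the remote files, so applications see the same interface as for local files.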

Design Approaches to Remote File Systems

File Service Model

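  • Upload/Download Model: The entire file is downloaded to the client, modified locally, and uploaded back. The client must have enough space for the whole file, and keeping copies consistent becomes harder when several clients hold the same file.

  • Remote Access Model: The file stays on the server and the client sends each operation (open, read, write) over the network, so only the bytes being read or written are transmitted. The server handles all file operations, which is more efficient for small changes.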
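
To make the trade-off concrete, the sketch below applies the same 5-byte edit to a 1,000,000-byte file under each model and counts payload bytes crossing the network. The `Server` class and its method names are hypothetical, and protocol headers are ignored.

```python
class Server:
    def __init__(self, files):
        self.files = {name: bytearray(data) for name, data in files.items()}
        self.bytes_transferred = 0  # payload bytes only, headers ignored

    # Operations used by the upload/download model
    def download(self, name):
        data = bytes(self.files[name])
        self.bytes_transferred += len(data)
        return data

    def upload(self, name, data):
        self.files[name] = bytearray(data)
        self.bytes_transferred += len(data)

    # Operation used by the remote access model
    def write_at(self, name, offset, data):
        self.files[name][offset:offset + len(data)] = data
        self.bytes_transferred += len(data)


def edit_upload_download(server, name, offset, patch):
    local = bytearray(server.download(name))   # whole file comes to the client
    local[offset:offset + len(patch)] = patch  # edit the local copy
    server.upload(name, bytes(local))          # whole file goes back


def edit_remote_access(server, name, offset, patch):
    server.write_at(name, offset, patch)       # only the changed bytes travel


original = b"x" * 1_000_000
a = Server({"f": original})
edit_upload_download(a, "f", 100, b"hello")
b = Server({"f": original})
edit_remote_access(b, "f", 100, b"hello")

print(a.bytes_transferred, b.bytes_transferred)  # 2000000 5
```

Both servers end with identical file contents; the difference is only in how much data moved. The balance shifts toward upload/download when a client reads or rewrites most of the file repeatedly, since later accesses are local.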

Semantics of File Sharing

  • Sequential Semantics: Any written byte is immediately accessible to all clients.

  • Session Semantics: Changes are visible only to the modifying client while the file is open; other clients see them once the file is closed and they open it again.
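
The difference between the two semantics can be shown with a toy model. The classes below (`SequentialFile`, `SessionServer`, `Session`) are illustrative names only, and the model assumes append-only writes and a last-close-wins rule for simplicity.

```python
class SequentialFile:
    """Sequential semantics: one shared copy, every write is seen at once."""

    def __init__(self, data=b""):
        self.data = bytearray(data)

    def write(self, data):
        self.data += data

    def read(self):
        return bytes(self.data)


class SessionServer:
    """Session semantics: each open() hands out a private working copy."""

    def __init__(self, data=b""):
        self.data = bytes(data)

    def open(self):
        return Session(self)


class Session:
    def __init__(self, server):
        self.server = server
        self.local = bytearray(server.data)  # snapshot taken at open time

    def write(self, data):
        self.local += data  # changes stay local until close

    def read(self):
        return bytes(self.local)

    def close(self):
        self.server.data = bytes(self.local)  # last close wins


# Sequential: a second reader of the same file sees the write immediately.
shared = SequentialFile(b"v1")
shared.write(b"+A")
seq_view = shared.read()

# Session: client B opened before A's change and keeps its old snapshot.
server = SessionServer(b"v1")
sess_a = server.open()
sess_b = server.open()
sess_a.write(b"+A")
before_close = sess_b.read()
sess_a.close()
after_close = sess_b.read()
sess_c = server.open()  # opened after A closed
fresh_view = sess_c.read()

print(seq_view, before_close, after_close, fresh_view)
# b'v1+A' b'v1' b'v1' b'v1+A'
```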

Stateful vs. Stateless Design

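  • Stateful: Server maintains per-client file state (open files, current offsets, permissions, etc.). Requests are shorter and repeated requests perform better, but the state consumes server memory and is lost if the server crashes.

  • Stateless: Each request carries all the information the server needs (file identifier, offset, byte count), so the client maintains the state. The server needs no memory for client state and recovers from crashes easily, but it cannot track open files or locks on behalf of clients.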
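
A small sketch of the two designs, assuming a single hypothetical file table `FILES` and a simulated `crash_and_restart` method. It shows where the read offset lives in each design and what a server restart does to it.

```python
FILES = {"notes.txt": b"abcdefghij"}


class StatefulServer:
    def __init__(self):
        self.open_files = {}  # fd -> [name, offset], held in server memory
        self.next_fd = 0

    def open(self, name):
        fd = self.next_fd
        self.next_fd += 1
        self.open_files[fd] = [name, 0]
        return fd

    def read(self, fd, count):
        name, offset = self.open_files[fd]
        data = FILES[name][offset:offset + count]
        self.open_files[fd][1] = offset + len(data)  # server advances offset
        return data

    def crash_and_restart(self):
        self.open_files.clear()  # in-memory state does not survive


class StatelessServer:
    def read(self, name, offset, count):
        # Every request is self-contained: name, offset, and count.
        return FILES[name][offset:offset + count]

    def crash_and_restart(self):
        pass  # no per-client state to lose


stateful = StatefulServer()
fd = stateful.open("notes.txt")
first = stateful.read(fd, 4)        # b'abcd'
stateful.crash_and_restart()
try:
    stateful.read(fd, 4)
    stateful_survived = True
except KeyError:
    stateful_survived = False       # the server no longer knows this fd

stateless = StatelessServer()
offset = 0                          # the client tracks its own position
chunk1 = stateless.read("notes.txt", offset, 4)
offset += len(chunk1)
stateless.crash_and_restart()
chunk2 = stateless.read("notes.txt", offset, 4)

print(first, stateful_survived, chunk1, chunk2)
# b'abcd' False b'abcd' b'efgh'
```

Note the cost on the stateless side: every request repeats the file name and offset, and the server has nowhere to record locks or open counts.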

Caching Strategies

  • Caching can be done at four levels: server's disk, server's memory, client's memory, or client's disk.

  • Issues include consistency between client-side cache and server data.

Caching Approaches

  1. Write Through: Immediate updates to the server on write.

  2. Delayed Write: Caching changes to be sent back periodically.

  3. Write on Close: Changes sent back only upon closing the file.

  4. Centralized Control: Server tracks which clients have each file open, so it can coordinate access to files that are shared.
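
The first three approaches differ mainly in when cached writes reach the server. The sketch below counts server messages for seven one-byte writes under each policy. `CachedFile` and `flush_every` are hypothetical, and the delayed-write policy is approximated by flushing after a fixed number of writes where a real client would use a timer. Centralized control is not modeled.

```python
class CachedFile:
    def __init__(self, policy, flush_every=3):
        self.policy = policy            # "write_through", "delayed", "write_on_close"
        self.flush_every = flush_every  # stand-in for a periodic flush timer
        self.dirty = []                 # writes held in the client cache
        self.server_log = []            # one entry per message sent to the server

    def write(self, data):
        self.dirty.append(data)
        if self.policy == "write_through":
            self._flush()
        elif self.policy == "delayed" and len(self.dirty) >= self.flush_every:
            self._flush()

    def close(self):
        self._flush()  # every policy pushes remaining dirty data on close

    def _flush(self):
        if self.dirty:
            self.server_log.append(b"".join(self.dirty))
            self.dirty.clear()


def run(policy):
    f = CachedFile(policy)
    for _ in range(7):
        f.write(b"x")
    f.close()
    return len(f.server_log)


for policy in ("write_through", "delayed", "write_on_close"):
    print(policy, run(policy))
# write_through 7
# delayed 3
# write_on_close 1
```

Fewer messages means less network traffic, but also a longer window during which other clients can read stale data from the server.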

Design Choices Summary

  • The discussion covers various design elements such as transport protocols (UDP vs TCP), stateful vs stateless systems, remote access models, cache strategies, and semantic models.

Case Study: Network File System (NFS)

Design Goals

  • Developed by Sun Microsystems in 1984 to let any machine act as a client, a server, or both, and to support diskless workstations.

  • Goals emphasized access transparency, robustness against failure, and performance.

NFS Design Decisions

  • Transport Protocol: Initially UDP for speed; later versions moved toward TCP for reliability.

  • Caching & State: Initially stateless; because the server did not track which clients had cached a file, client caches could become inconsistent.

  • File Sizes: Later versions raised file size limits to keep up with advances in technology and use cases.

Overview of NFS Versions

  • NFS v1: Stateless, basic features, challenges with file consistency.

  • NFS v2: Introduced client-side logging and improved performance with nonvolatile memory support.

  • NFS v3: Supported larger file sizes (64-bit sizes and offsets) and added a commit operation so clients can confirm that earlier writes reached stable storage.

  • NFS v4: Transitioned to a fully stateful system and adopted TCP for reliability.
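
The commit operation mentioned for NFS v3 can be sketched as follows. This is a simplified model loosely based on the protocol's unstable writes and write verifier, not the actual wire protocol: the `Server` and `Client` classes, the integer verifier, and the dictionary "disk" are all assumptions made for illustration.

```python
import itertools


class Server:
    _boot_ids = itertools.count(1)

    def __init__(self):
        self.disk = {}     # stable storage, survives a reboot
        self.memory = {}   # buffered writes, lost on reboot
        self.verifier = next(self._boot_ids)  # changes on every reboot

    def write_unstable(self, offset, data):
        self.memory[offset] = data
        return self.verifier

    def commit(self):
        self.disk.update(self.memory)
        self.memory.clear()
        return self.verifier

    def reboot(self):
        self.memory.clear()
        self.verifier = next(self._boot_ids)


class Client:
    def __init__(self, server):
        self.server = server
        self.pending = {}  # offset -> (data, verifier seen at write time)

    def write(self, offset, data):
        verifier = self.server.write_unstable(offset, data)
        self.pending[offset] = (data, verifier)

    def commit(self):
        while self.pending:
            verifier = self.server.commit()
            # A write made under a different verifier may have been lost.
            lost = {off: d for off, (d, seen) in self.pending.items()
                    if seen != verifier}
            self.pending.clear()
            for off, d in lost.items():
                self.write(off, d)  # resend, then commit again


server = Server()
client = Client(server)
client.write(0, b"aa")
server.reboot()            # the buffered write at offset 0 is lost
client.write(2, b"bb")
client.commit()            # detects the verifier change and resends offset 0

print(sorted(server.disk.items()))  # [(0, b'aa'), (2, b'bb')]
```

The client keeps its own copy of uncommitted data until the commit succeeds, which is what lets the server buffer writes without risking silent data loss.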

Case Study: Andrew File System (AFS)

Design Goals and Variables

  • Developed at CMU, aiming for scalability and supporting thousands of clients.

  • Designs based on assumptions about file sizes, access patterns, and caching strategies.

Key Features of AFS

  • Client caches entire files and manages consistency upon closing files.

  • Differs from early NFS in its use of TCP, a stateful server, and session semantics for consistency and updates.

Conclusion

  • A comparison between NFS and AFS highlights the strengths and weaknesses of design choices in remote file systems.

  • Discussions highlight how centralized caching can prevent issues with file consistency and performance, while stateless designs can create inefficiencies.

  • Final notes on adapting designs based on user requirements, file access patterns, and system resource availability.