Module 8 - File Systems

Review: Mass Storage

Disk drives are the major secondary-storage I/O devices on most computers.
Disk scheduling algorithms can improve the effective bandwidth, the average response time, and the variance in response time.
Disks are frequently made redundant via RAID algorithms for the amount of storage required on large systems.
Tertiary storage is built from disk and tape drives that uses removable media.

Chapter 13: File-System Interface

File Concept
Access Methods
Directory Structure
File System Mounting
File Sharing
Protection

File Concept

Contiguous logical address space abstracts from the physical properties of its storage devices.
Types:
- Data
  - Numeric
  - Character
  - Binary
- Program

Building a File System

File System: Layer of OS that transforms blocks of disks (or other block devices) into Files, Directories, etc.
File System Components
- Disk Management: collecting disk blocks into files
- Naming: Interface to find files by name, not by blocks
- Protection: Layers to keep data secure
- Reliability/Durability: Keeping of files durable despite crashes, media failures, attacks, etc.
User vs. System View of a File
- User’s view:
  - Durable Data Structures
- System’s view (system call interface):
  - Collection of Bytes
- System’s view (inside OS):
  - Collection of blocks (a block is a logical transfer unit, while a sector is the physical transfer unit)

Translating from User to System View

What happens if the user says: give me bytes 2—12?
- Fetch block corresponding to those bytes
- Return just the correct portion of the block
What about: write bytes 2—12?
- Fetch block => Modify portion => Write out Block
Everything inside File System is in whole size blocks
- For example, getc(), putc()  buffers something like 4096 bytes, even if the interface is one byte at a time
From now on, a file is a collection of blocks in the File System.

File Attributes

Name – only information kept in human-readable form.
Type – needed for systems that support different types.
Location – pointer to file location on device.
Size – current file size.
Protection – controls who can do reading, writing, executing.
Time, date, and user identification – data for protection, security, and usage monitoring.
Information about files is kept in the directory structure, which is maintained on the disk.

File Types - Name, Extension

executable
- usual extension: exe, com, bin or none
- function: read to run machine-language program
object
- usual extension: obj, o
- compiled, machine language, not linked
source code
- usual extension: c, cc, java, pas, asm, a
- source code in various languages
batch
- usual extension: bat, sh
- commands to the command interpreter
text
- usual extension: txt, doc
- textual data, documents
word processor
- usual extension: wp, tex, rrf, doc
- various word-processor formats
library
- usual extension: lib, a, so, dll
- libraries of routines for programmers
print or view
- usual extension: mpeg, mov, rm
- ASCII or binary file in a format for printing or viewing
archive
- usual extension: arc, zip, tar
- related files grouped into one file, sometimes compressed, for archiving or storage
multimedia
- usual extension: mpeg, mov, rm
- binary file containing audio or A/V information

Designing the File System: Access Patterns

How do users access files?
Need to know the type of access patterns the user is likely to throw at the system.
Sequential Access: bytes read in order (“give me the next X bytes, then give me next, etc.”)
- Almost all file access are of this flavor.
Random Access (Direct): read/write element out of the middle of the array (“give me bytes i—j”)
- Less frequent, but still important. For example, virtual memory backing file: page of memory stored in file.
- Want this to be fast – don’t want to have to read all bytes to get to the middle of the file.

Direct Access

sequential access
- reset: cp = 0;
- read next: read cp; cp = cp+1;
- write next: write cp; cp = cp+1;
- cp is a variable that defines the current position

Designing the File System: Access Patterns

Content-based Access: (“find me 100 bytes starting with KUBI”)
- Example: employee records – once you find the bytes, increase my salary by a factor of 2
Many systems don’t provide this; instead, databases are built on top of disk access to index content (requires efficient random access)

Directory Structure

A collection of nodes containing information about all files.
Both the directory structure and the files reside on disk.
Backups of these two structures are kept on tapes.

Directory Structure (Cont.)

Disks are split into one or more partitions.
Each partition contains information about the files within it. This information is kept in entries in a device directory or volume table of contents.
- record name, location, size, and type

Organize the Directory (Logically) to Obtain

Efficiency – locating a file quickly.
Naming – convenient to users.
- Two users can have the same name for different files.
- The same file can have several different names.
Grouping – logical grouping of files by properties, (e.g., all Java programs, all games, …)

Single-Level Directory

A single directory for all users.
- Naming problem: unique file name for all users.
- Grouping problem: hard to group for a user, and keeping track of hundreds of files is a big problem.

Two-Level Directory

Separate directory for each user.
- Path name: a user name and a file name
- Can have the same file name for different users
- Efficient searching
- No grouping capability

Tree-Structured Directories

Efficient searching
Grouping Capability
Current directory (working directory)
- cd /spell/mail/prog
- type list
Microsoft Windows family of OS maintains an extended two-level directory structure, with devices and volumes assigned drive letters.

Acyclic-Graph Directories

Have shared subdirectories and files.

Acyclic-Graph Directories (Cont.)

A tree structure prohibits the sharing of files or directories
An acyclic graph allows directories to have shared subdirectories and files.
An acyclic graph, that is, a graph with no cycles, is a natural generalization of the tree-structured directory scheme.
Implementation:
- (UNIX) to create a new directory entry called a link, a pointer to another file or subdirectory.
- Duplicate all information in both sharing directories.

Acyclic-Graph Directories (Cont.)

Two different names (aliasing) may refer to the same file.
If dict deletes list  dangling pointer. Solutions:
- If a system where sharing is implemented by a symbolic link
  - The deletion of a link does not need to affect the original file, only the link is removed
- If a file entry is deleted, the space for the files is deallocated, leaving the links dangling.
- Preserve the file until all references to it are deleted.

General Graph Directory

How do we guarantee no cycles?
- Allow only links to file not subdirectories.
- Garbage collection: determine when the last reference has been deleted and the disk space can be reallocated.
- Every time a new link is added use a cycle detection algorithm to determine whether it is OK.

File System Mounting

A file system must be mounted before it can be accessed.
The operating system is given the name of the device, and the location within the file structure at which to attach the file system (or mount point). Typically, a mount point is an empty directory at which the mounted file system will be attached.
The operating system verifies that the device contains a valid file system.
The operating system notes in its directory structure that a file system is mounted at the specified mount point.

File Sharing

Sharing of files on multi-user systems is desirable.
Sharing may be done through a protection scheme.
- owner and group
On distributed systems, files may be shared across a network.
Network File System (NFS) is a common distributed file-sharing method.

Protection

Safe from physical damage (reliability)
- Solution: duplicate copies of files.
Improper access (protection)
- Solutions: ability to access files
- File owner/creator should be able to control:
  - what can be done
  - by whom
Types of access
- Read
- Write
- Execute
- Append
- Delete
- List

Access Lists and Groups

Mode of access: read, write, execute
Three classes of users
- RWX a) owner access 7  1 1 1
- RWX b) group access 6  1 1 0
- RWX c) public access 1  0 0 1
owner group public
chmod 761 game
Attach a group to a file
chgrp G game

Chapter 14: File System Implementation

File System Structure
File System Implementation
Directory Implementation
Allocation Methods
Free-Space Management
Efficiency and Performance
Recovery
Log-Structured File Systems

File-System Structure

File structure
- Logical storage unit
- Collection of related information
File system resides on secondary storage (disks).
Files can be rewritten in place.
Files can be accessed directly any given block of information on the disk.
File control block – storage structure consisting of information about a file

A Typical File Control Block

file permissions
file dates (create, access, write)
file owner, group, ACL
file size
file data blocks
File control block - storage structure consisting of information about a file.

On-disk File System Structures

A boot control block: contains information needed by the system to book an operating system from that partition.
A partition control block: contains partition details, such as the number of blocks in the partition, the size of the blocks, free-block count and free block pointers, and free FCB count and FCB pointer.
A directory structure is used to organize the files
An FCB contains many of the file’s details

In-Memory File System Structures

An in-memory partition table containing information about each mounted partition.
An in-memory directory structure that holds the directory information of recently accessed directories
The system-wide open-file table contains a copy of the FCB of each open file
The per-process open-file table contains a pointer to the appropriate entry in the system-wide open-file table.

In-Memory File System Structures

Open system call:
- Resolves file name, finds file control block (inode)
- Makes entries in per-process and system-wide tables
- Returns index (called “file handle”) in open-file table
Read/write system calls:
- Use file handle to locate inode
- Perform appropriate reads or writes

Virtual File Systems

Virtual File Systems (VFS) provide an object-oriented way of implementing file systems.
VFS allows the same system call interface (the API) to be used for different types of file systems.
The API is to the VFS interface, rather than any specific type of file system.

Directory Implementation

Linear list of file names with a pointer to the data blocks.
- simple to program
- time-consuming to execute
Hash Table – linear list with hash data structure.
- decreases directory search time
- collisions – situations where two file names hash to the same location
- fixed size and the dependence of the hash function on that size

Allocation Methods

An allocation method refers to how disk blocks are allocated for files:
- Contiguous allocation
- Linked allocation
- Indexed allocation

Contiguous Allocation

Each file occupies a set of contiguous blocks on the disk.
Simple – only starting location (block #) and length (number of blocks) are required.
Both sequential and direct access can be supported by contiguous allocation.
IBM VM/CMS operating system uses contiguous allocation because of good performance.

Contiguous Allocation (Cont.)

Problems:
- Find space for a new file.
- Wasteful of space (dynamic storage-allocation problem).
- Files cannot grow.

Linked Allocation

Each file is a linked list of disk blocks: blocks may be scattered anywhere on the disk.
The directory contains a pointer to the first and last blocks of the file.
Simple – need only starting address
Free-space management system – no waste of space
Problems
- No random access
- Linked allocation is the space required for the pointers. If a pointer requires 4 bytes out of a 512-byte block, then 0.78 percent of the disk is being used for pointers, rather than for information.
- Reliability, what if a pointer were lost or damaged.
pointer block \frac{4}{512} =

Linked Allocation (Cont.)

File-allocation table (FAT) – An important variation on the linked allocation method
FAT is used by MS-DOS and OS/2.
The directory entry contains the block number of the first block of the file, and the table entry indexed by that block number then contains the block number of the next block in the file.

Indexed Allocation

Linked allocation solves the external-fragmentation and size-declaration problems of contiguous allocation.
However, linked allocation (FAT) cannot support efficient direct access
Brings all pointers together into the index block.
Pros: Can easily grow up to space allocated for index and Random access is fast
Cons: Clumsy to grow file bigger than table size, still lots of seeks: blocks may be spread over disk index table

Indexed Allocation (Cont.)

Need index table
Random access
Dynamic access without external fragmentation, but have the overhead of the index block.
Mapping from logical to physical in a file of maximum size of 256K words and a block size of 512 words. We need only 1 block for the index table.

Indexed Allocation – Mapping

Mapping from logical to physical in a file of unbounded length (block size of 512 words).
Question: how large the index block should be?
- as small as possible
- Big to hold enough pointers
Schemes
- Linked scheme – Link blocks of index table (no limit on size), the last word in the index block could be NIL (for a small file) or is a pointer to another index block (for a large file).

Indexed Allocation – Mapping (Cont.)

Multilevel index – A variant of the linked representation is to use a first-level index block to point to a set of second-level index blocks, which in turn point to the file blocks. This approach could be continued to a third or fourth level, depending on the desired maximum file size.
outer-index index table file

Indexed Allocation – Mapping (Cont.)

Combined scheme: Used in Unix 4.1 BSD, is to keep the first, say, 13 pointers of the index block in the file’s inode (header)
The first 10 of these pointers point to direct blocks. If the block size is 4KB, up to 48 KB of data may be accessed directly.
The next 3 pointers point to indirect blocks
The first is the address of a single indirect blocks, the single indirect block is an index block, containing not data, but rather the addresses of blocks that do contain data
A double indirect block pointer
A triple indirect block

Performance

Criteria
- Storage efficiency
- Data-block access times
Some systems support direct-access files by using contiguous allocation and sequential access by linked allocation.
Indexed allocation is more complex. Thus, the performance of indexed allocation depends on the index structure, on the size of the file, and on the position of the block desired.

Free-Space Management

Free space list: record all free disk blocks, those not allocated to some file or directory.
- Easy to get contiguous files
Bit vector: each block is represented by 1 bit.
… 0 1 2 n-1
bit[i] =
\begin{cases}
1 & \text{if block[i] is free} \
0 & \text{if block[i] is occupied}
\end{cases}
Simplicity and efficiency in finding the first free block or n consecutive free blocks on the disk

Free-Space Management (Cont.)

Bit vectors are inefficient unless the entire vector is kept in main memory
Linked list (free list)
- Keeping a pointer to the first free block in a special location on the disk and caching it in memory
- No waste of space
- Not efficient
- Cannot get contiguous space easily

Free-Space Management (Cont.)

Grouping
- Store the addresses of n free blocks in the first free block
- The first n-1 of these blocks are actually free
- The last block contains the addresses of another n free blocks
Counting
- Keep the address of the first free block and the number n of free contiguous blocks that follow the first block.

Efficiency and Performance

Efficiency dependent on:
- disk allocation and directory algorithms
- types of data kept in the file’s directory entry
Performance
- disk cache – separate section of main memory for frequently used blocks
- free-behind and read-ahead – techniques to optimize sequential access
- improve PC performance by dedicating section of memory as virtual disk, or RAM disk.

Various Disk-Caching Locations

The difference between a RAM disk and a disk cache is that the contents of the RAM disk are totally user-controlled, whereas those of the disk cache are under the control of the operating system.

Page Cache

A page cache caches pages rather than file-system-oriented blocks using virtual memory techniques.
Memory-mapped I/O uses a page cache.
Routine I/O through the file system uses the buffer (disk) cache.
This leads to the following figure.

Unified Buffer Cache

A unified buffer cache uses the same page cache to cache both memory-mapped pages and ordinary file system I/O.

Recovery

Consistency checking – compares data in the directory structure with data blocks on disk and tries to fix inconsistencies.
Use system programs to back up data from disk to another storage device (floppy disk, magnetic tape).
Recover lost file or disk by restoring data from backup.

Log Structured File Systems

Log structured (or journaling) file systems record each update to the file system as a transaction.
All transactions are written to a log. A transaction is considered committed once it is written to the log. However, the file system may not yet be updated.
The transactions in the log are asynchronously written to the file system. When the file system is modified, the transaction is removed from the log.
If the file system crashes, all remaining transactions in the log must still be performed.

Summary – ch. 13

File & File System
Access pattern
- Sequential, direct, or content
Directory structure
- One level, two levels, tree, acyclic graph, general graph
File mounting, sharing, protection

Summary – ch. 14

File system structure
- FCB, on disk, in-memory, virtual file structure
Directory allocation
- Contiguous, linked, index
Free space management
- Bit vector, linked list, grouping, counting
Page cache
Recovery