File Systems Notes
File Systems
File Concept
The operating system offers a uniform logical view of stored information as files.
A file is a logical storage unit.
File structure depends on:
Data:
Numeric
Character
Binary
Program
Types of Files
Text File:
Sequence of characters organized into lines or pages.
Source File:
Subroutines and functions consisting of declarations and executable statements.
Object File:
Sequence of bytes forming blocks used by the linker.
Executable File:
Series of code brought into memory by the loader for execution.
File Types - Name, Extension
File Type | Usual Extension | Description |
|---|---|---|
Executable | exe, com, bin | Ready-to-run machine-language program |
Object | obj, o | Compiled, machine language, not linked |
Source Code | c, cc, java, pas, asm, a | Source code in various languages |
Batch | bat, sh | Commands to the command interpreter |
Text | txt, doc | Textual data, documents |
Word Processor | wp, tex, rtf, doc | Various word-processor formats |
Library | lib, a, so, dll | Libraries of routines for programmers |
Print or View | ps, pdf, jpg | ASCII or binary file in a format for printing or viewing |
Archive | arc, zip, tar | Related files grouped into one file, sometimes compressed, for archiving or storage |
Multimedia | mpeg, mov, rm, mp3, avi | Binary file containing audio or A/V information |
File Attributes
Name: Human-readable information.
Identifier: Unique tag (number) identifying the file within the file system.
Type: Needed for systems that support different types.
Location: Pointer to the file location on the device.
Size: Current file size.
Protection: Controls who can read, write, execute.
Time, date, and user identification: Data for protection, security, and usage monitoring.
File Operations
Create:
Allocate free space on the disk.
Make an entry for the file in the directory.
Write:
Specify the name of the file and information to be written.
Update the write pointer.
Read:
Specify the name and location of the file.
Read pointer stores the current file position.
Repositioning (File Seek):
Search the directory for the required entry.
Set the current file position pointer to a given value.
Deleting a File:
Search for the named file in the directory.
Release its free space and its entry in the directory.
Truncating a File:
Erase only the contents of a file, not the attributes.
Other Operations:
Append
Copy
Rename
File Access Methods
Files store information accessed by user programs.
Sequential Access:
Simplest method.
Information processed in sequential order, one record after the other.
Read Operation: Reads the next record and advances the file pointer.
Write Operation: Adds records to the end of the file and changes the file pointer.
Stored on magnetic tapes or disks.
Direct Access:
Based on the disk model, allowing random access to any file block.
Also known as relative access.
Viewed as a block of records.
Any block can be read or written directly.
Read Operation: Read n means read the information in block number n. Block Numbers start from 0.
Indexed Sequential Access:
Involves constructing an index for the file.
Index is organized in sequence based on the key field and contains pointers to the various blocks.
For large files, the index files themselves can become large.
Two-level indexing can be used: a primary index file points to secondary index files, which point to the actual data.
Directory Structure
A collection of nodes containing information about all files. ( F1, F2, F3, F4, …, Fn )
Types of Directory Structure
Single-Level Directory Structure:
Simplest directory structure.
All files are in the same directory.
Two-Level Directory Structure:
MFD (Master File Directory) - Root Directory
Separate directory for each user - UFD (User File Directory)
Files are stored in UFD.
Files are defined as leaf nodes.
Path name: User file directory name along with the file name (e.g., D:\user2\sample.doc).
Tree-Structured Directory:
Allows users to create their own subdirectories and organize files within them.
A directory can consist of files, subdirectories, or both.
A path name is the path from the root through the subdirectories to the required file.
Absolute Path: Starts at the root (e.g., C:/BCA/BCA2/OS/B2).
Relative Path: Describes the path starting from the current directory (e.g., OS/B2).
File Protection
Files must be protected from:
Physical damage
Improper access
Physical damage can be caused by:
Power failures
Head crashes
Dirty media
Extreme temperatures
Bugs in the system
Protection mechanisms:
Keeping backups of all files.
Single-user systems: Physically removing the floppy disk or CD.
Multi-user systems: Providing access permissions to files.
Access Control List (ACL)
Associated with each file and directory.
Specifies user names and types of access allowed for each user.
Access Groups
Owner: User who created the file.
Group: Set of users who share the file.
Universal: All other users in the system.
In UNIX, the OS defines 3 files of 3 bits each:
rwx (r - read, w - write, x - execute)
File Allocation Methods
How the OS stores information in memory blocks for effective hard drive utilization and file access.
Types of File Allocation Methods:
Contiguous File Allocation:
Allocated blocks are adjacent.
If a file needs blocks and starts at position , the next blocks are .
Advantages:
Easy to implement.
Minimum seek time.
Faster memory access.
Supports sequential and direct access.
Disadvantages:
File size must be initialized at creation.
Size cannot increase.
Disk fragmentation (internal or external) is possible.
Linked File Allocation:
Files are stored in a scattered manner based on available space.
Pointers are used to point to the next block of the same file.
Each block stores a pointer to the next block, requiring extra space.
Directory stores the address of the first memory block and the last memory block.
Advantages:
No external fragmentation.
More flexible than contiguous allocation.
Disadvantages:
Does not support random or direct access.
Disk blocks are affected if pointers are compromised.
Requires extra space for pointers.
Indexed File Allocation:
Uses pointers, but all pointers are put together into one location called the index block.
Index block contains all the addresses of the file's blocks.
Easier retrieval compared to linked allocation.
Advantages:
Reduces the possibilities of external fragmentation.
Direct access to the block.
Disadvantages:
More pointer overhead.
If the index block is lost, the complete file cannot be accessed.
Can be heavy for small files.
A single index block may not keep all pointers for very large files.
Free-Space Management
System keeps track of free disk blocks for allocating space to new files and reusing space from deleted files.
Maintains a free space list.
Various Implementations of Free Space List
Bitmap or Bit Vector:
Each bit corresponds to a disk block.
indicates the block is allocated, and indicates a free block.
Linked List:
Free disk blocks are linked together with pointers.
The address of the first free block is stored separately and cached in memory.
Drawback: I/O required for free space list traversal.
Grouping:
Store addresses of free blocks in the first free block.
The first blocks are actually free.
The last block contains the address of another free block.
Counting:
Based on the fact that several contiguous blocks may be allocated and freed simultaneously.
Holds the address of the first free block and the number of free contiguous blocks that follow.
Each entry in the free space list consists of a disk address and a count.
Disk Structure
Hard Disk:
Secondary storage device.
Electro-mechanical device that stores and retrieves digital data using magnetic storage.
Consists of rigid, rapidly rotating platters coated with magnetic material.
Hard Disk Pack: Consists of more than one disk.
How Data is Stored in a Hard Disk:
Data is stored in tracks and sectors.
Disk surface is divided into tracks.
Tracks are further divided into sectors.
Sector is the addressable unit on the disk.
Disk Structure Components:
Track
Sector
Spindle
Cylinder
Read/write head
Platter
Arm
Arm assembly
Cylinder: Same numbered tracks on all disk surfaces.
Sector: Smallest unit of information that can be read from/written into disk (ranges from 32 bytes to 4096 bytes).
Designed for large amounts of storage (primary design consideration: cost, size, and speed).
Hardware for Disk System:
Disk drive: Device motor, read/write head, associated logic.
Disk controller:
Determines the logical interaction with the computer.
Can service more than one drive (overlapped seeks).
Performance Metrics:
Seek Time:
Time required by the read/write head to move to the requested track.
Includes startup time and track traversal time after the access arm reaches full speed.
Latency (Rotational Delay):
Time required for the requested sector to come under the read/write head.
Rotational delay is generally half the time taken for one rotation.
Data Transfer Rate:
Amount of data transferred per unit time (e.g., 30 MB/Sec).
Data Transfer Time:
Total time taken to transfer a specific amount of data from the disk.
Depends on the data transfer rate of the disk.
Average Access Time:
Types of Disk Scheduling Algorithms
First Come First Serve (FCFS)
Shortest Seek Time First (SSTF)
Elevator (SCAN)
Circular Scan (C-Scan)
Look
C-Look
1. First Come First Serve (FCFS)
Handles I/O requests sequentially.
Fair to all processes.
Approaches random scheduling in performance if there are many processes/requests.
Suffers from global zigzag effect.
Example Seek Time Calculation: For a ready queue with addresses arranged in ascending order, given 53 is the head start, the seek time is calculated as the sum of the absolute differences between consecutive head positions. For example, if the queue is (98, 183, 37, 122, 14, 124, 65, 67), seek time =
2. Shortest Seek Time First (SSTF)
Selects the request with the minimum seek time from the current head position.
Also called Shortest Seek Distance First (SSDF).
Biased in favor of the middle cylinders.
SSTF scheduling is a form of SJF scheduling; may cause starvation of some requests.
3. SCAN
The disk arm starts at one end of the disk and moves toward the other end, servicing requests until it gets to the other end, where the head movement is reversed and servicing continues.
Moves in both directions until both ends.
Tends to stay more at the ends, so more fair to the extreme cylinder requests.
4. Look
The disk arm starts at the first I/O request on the disk and moves toward the last I/O request on the other end, servicing requests until it gets to the other extreme I/O request on the disk, where the head movement is reversed and servicing continues.
Moves in both directions until both last I/O requests; more inclined to serve the middle cylinder requests.
Total Seek Time: The sum of the distances the disk head moves to satisfy the requests in the sequence determined by the algorithm.
5. C-SCAN
The head moves from one end of the disk to the other, servicing requests as it goes. When it reaches the other end, however, it immediately returns to the beginning of the disk without servicing any requests on the return trip.
Treats the cylinders as a circular list that wraps around from the last cylinder to the first one.
Provides a more uniform wait time than SCAN.
6. C-Look
Look version of C-Scan.
Arm only goes as far as the last request in each direction, then reverses direction immediately without going all the way to the end of the disk.
Circular versions are more fair but pay with a larger total seek time.
Scan versions have a larger total seek time than the corresponding Look versions.
Comparison of Scheduling Algorithms
The provided information gives a table comparing FIFO, SSTF, LOOK and C-LOOK based on the average seek length based on some given data and starting point.
FIFO results in average seek length of 55.3 tracks
SSTF results in average seek length of 27.5 tracks
LOOK results in average seek length of 27.8 tracks
C-LOOK results in average seek length of 35.8 tracks
Disk Management by the Operating System
Disk Format
Booting from disk
Bad block recovery
Disk Formatting:
Low-level format (Physical format):
Divides the disk into sectors before storing data so that the disk controller can read and write.
Each sector can contain a header with information, data, and error correction code (ECC).
Sectors typically contain 512 bytes of data, but optional disks use the operating system's own data structures to preserve files.
**Logical format (