File Systems Notes
File Systems
Requirements for File Systems
- Three general requirements:
- It must be possible to store a very large amount of information, more than fits in a process's virtual address space.
- Information must survive the termination of the process using it.
- Multiple processes must be able to access the information concurrently.
- Solution: Store information on disk and other external media in units called files.
- Two requirements for a file:
- Files can be read and written by any process needing to do so.
- Information in a file must be persistent, i.e., unaffected by the creation and termination of processes.
Files and File Naming
- File system: Part of the operating system that deals with file organization.
- Files provide a way to store information on the disk and read it back later.
- When a process creates a file, it gives the file a name.
- When the process terminates, the file continues to exist and can be accessed by other processes using its name.
- Rules for file naming vary from system to system.
- All current operating systems allow strings of one to eight letters as legal file names. Frequently, digits and special characters are also permitted.
- Many file systems support names as long as 255 characters.
- Some file systems (e.g., UNIX) distinguish between upper and lower case letters, whereas others (e.g., MS-DOS) do not.
- Many operating systems support two-part file names, separated by a period (e.g., prog.c).
- The part following the period is called the file extension and usually indicates something about the file.
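- In UNIX, a file may have two or more extensions (e.g., homepage.html.zip); extensions are only conventions and are not enforced by the operating system.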
- Windows is aware of the extensions and assigns meaning to them.
- Users (or processes) can register extensions with the operating system and specify which program “owns” that extension.
File Structure
- Files can be structured in several ways. Three common possibilities are:
- Byte sequence
- Record sequence
- Tree
Byte Sequence
- The operating system sees the file as an unstructured sequence of bytes.
- UNIX and Windows use this schema.
- User programs can put anything they want in the file and name it any way they find convenient.
Record Sequence
- A file is a sequence of fixed-length records, each with some internal structure.
- Read and write operations are performed on these fixed length records.
- Example: early mainframe operating systems (e.g., files of 80-character records matching 80-column punched cards).
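A record-oriented file can be emulated on top of a byte-sequence file, which shows the key idea: record n always starts at byte n * RECORD_SIZE, and reads and writes move whole records. A minimal sketch, assuming a POSIX system (os.pread and os.pwrite) and a hypothetical 80-byte record size; on a real record-oriented OS the kernel, not the program, enforces the record boundaries. The function and file names are hypothetical.

```python
import os
import tempfile

RECORD_SIZE = 80  # hypothetical fixed record length, e.g. one punched card

def write_record(fd: int, index: int, data: bytes) -> None:
    """Write record number `index`, padded with spaces to RECORD_SIZE bytes."""
    if len(data) > RECORD_SIZE:
        raise ValueError("record longer than RECORD_SIZE")
    # Record n always starts at byte offset n * RECORD_SIZE.
    os.pwrite(fd, data.ljust(RECORD_SIZE, b" "), index * RECORD_SIZE)

def read_record(fd: int, index: int) -> bytes:
    """Read one whole record by its record number."""
    return os.pread(fd, RECORD_SIZE, index * RECORD_SIZE)

with tempfile.TemporaryDirectory() as d:
    fd = os.open(os.path.join(d, "cards.dat"), os.O_RDWR | os.O_CREAT)
    write_record(fd, 0, b"FIRST CARD")
    write_record(fd, 1, b"SECOND CARD")
    print(read_record(fd, 1).rstrip())  # b'SECOND CARD'
    os.close(fd)
```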
Tree Structure
- A file consists of a tree of records, not necessarily all the same length.
- Each record contains a key field in a fixed position in the record.
- The tree is sorted based on the key field to allow rapid searching for a particular key.
- New records can be added to the file.
- The operating system decides where to place them, not the user.
- This type of file structure is widely used on large mainframe computers in commercial data processing.
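The key-based lookup can be sketched as follows. As a simplification, a sorted Python list searched with the bisect module stands in for the tree: it gives the same O(log n) search, but insertion costs O(n) because the list has to shift, which a balanced tree (e.g., a B-tree) avoids. The class and method names are hypothetical.

```python
import bisect
from typing import List, Optional

class KeyedFile:
    """Records kept sorted by key (a stand-in for the tree of records)."""

    def __init__(self) -> None:
        self._keys: List[str] = []
        self._records: List[bytes] = []

    def add(self, key: str, record: bytes) -> None:
        # The file, not the caller, chooses the position that keeps keys sorted.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._records[i] = record  # replace the record with the same key
        else:
            self._keys.insert(i, key)
            self._records.insert(i, record)

    def find(self, key: str) -> Optional[bytes]:
        # Binary search: O(log n) comparisons, like descending a balanced tree.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._records[i]
        return None

f = KeyedFile()
for animal in ["pig", "ant", "fox", "cat"]:
    f.add(animal, animal.upper().encode())
print(f.find("fox"))  # b'FOX'
```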
File Types
- Many operating systems support several types of files.
- UNIX has regular files, directories, character special files, and block special files.
- Windows has regular files and directories.
Regular Files
- Contain user information.
- Generally either ASCII files or binary files.
- Advantage of ASCII files: they can be displayed and printed as is and can be edited with any text editor.
- Binary files have some internal structure known only to programs that use them.
- An executable binary file has a header with a magic number identifying it as executable; the O.S. executes the file only if it has the proper format.
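The format check can be sketched by reading the first bytes of a file and comparing them with known magic numbers. The two values used here are real (ELF executables begin with 0x7F 'E' 'L' 'F', Windows executables with 'MZ'), but a real loader validates much more of the header; the function name is hypothetical.

```python
from typing import Optional

# Leading magic numbers of two real executable formats.
MAGIC_NUMBERS = {
    b"\x7fELF": "ELF (Linux and many other UNIX systems)",
    b"MZ": "MZ/PE (Windows)",
}

def executable_format(path: str) -> Optional[str]:
    """Return the executable format named by the file's header, or None."""
    with open(path, "rb") as f:
        header = f.read(4)  # the longest magic number above is 4 bytes
    for magic, name in MAGIC_NUMBERS.items():
        if header.startswith(magic):
            return name
    return None
```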
Directories
- System files for maintaining the structure of the file system.
Character Special Files
- Related to input/output and used to model serial I/O devices such as terminals, printers, and networks.
Block Special Files
- Used to model disks.
File Access
- Sequential access:
- A process can read all the bytes or records in a file in order, starting at the beginning.
- It cannot skip around or read them out of order, although the file can be rewound and read again.
- Sequential access was convenient when the storage medium was magnetic tape rather than disk.
- Random access file:
- Made possible by disks.
- Essential for many applications, such as database systems.
- Possible to read the bytes or records of a file out of order, or to access records by key rather than by position.
- Files whose bytes or records can be read in any order are called random access files.
- Two methods are used for specifying where to start reading:
- Every read operation gives the position in the file to start reading at.
- A special operation, seek, is provided to set the current position. After a seek, the file can be read sequentially from the now-current position.
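Sequential reading and the two ways of specifying where to start can be illustrated with POSIX calls exposed by Python's os module: os.read continues from the current position, os.pread takes the position as an argument on every call (and leaves the current position alone), and os.lseek sets the current position for subsequent reads. A minimal sketch, assuming a POSIX system (os.pread is not available on Windows); the file name is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "data.bin")
    with open(path, "wb") as f:
        f.write(bytes(range(100)))      # bytes 0, 1, ..., 99

    fd = os.open(path, os.O_RDONLY)

    # Sequential access: each read continues where the previous one stopped.
    first = os.read(fd, 10)             # bytes 0..9
    second = os.read(fd, 10)            # bytes 10..19

    # Method 1: the read itself names the starting position.
    middle = os.pread(fd, 5, 50)        # bytes 50..54

    # Method 2: seek sets the current position, then read sequentially.
    os.lseek(fd, 90, os.SEEK_SET)
    tail = os.read(fd, 10)              # bytes 90..99

    os.close(fd)

print(list(middle))  # [50, 51, 52, 53, 54]
```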
File Attributes
- Items of information associated with a file besides its name and data, such as the date and time the file was created and the file's size. They are also called metadata.
- The list of attributes varies considerably from system to system.
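On POSIX systems, a file's attributes can be read with the stat system call, exposed in Python as os.stat. Which fields are meaningful differs between systems, as noted above. A short sketch with a hypothetical file name:

```python
import os
import stat
import tempfile
import time

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "notes.txt")
    with open(path, "wb") as f:
        f.write(b"hello")
    st = os.stat(path)
    size = st.st_size                          # current size in bytes
    modified = time.ctime(st.st_mtime)         # time of last modification
    is_regular = stat.S_ISREG(st.st_mode)      # file type is encoded in st_mode
    mode_bits = oct(stat.S_IMODE(st.st_mode))  # protection bits
    owner_uid = st.st_uid                      # owner (meaningful on POSIX)

print(size, is_regular)  # 5 True
```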
File Operations
- Files exist to store information and allow it to be retrieved later.
- Different systems provide different operations to allow storage and retrieval.
- Common system calls relating to files:
- Create
- Delete
- Open
- Close
- Read
- Write
- Append
- Seek
- Get attribute
- Set attribute
- Rename
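On a POSIX system, most of the operations listed above map onto calls in Python's os module. In this sketch Append is expressed, as in POSIX, by opening the file with the O_APPEND flag rather than by a separate call, and Get attribute / Set attribute are stat and chmod. File names are hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    old = os.path.join(d, "draft.txt")
    new = os.path.join(d, "final.txt")

    fd = os.open(old, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)  # Create + Open
    os.write(fd, b"hello")                                          # Write
    os.close(fd)                                                    # Close

    fd = os.open(old, os.O_WRONLY | os.O_APPEND)  # every write goes to the end
    os.write(fd, b", world")                      # Append
    os.close(fd)

    fd = os.open(old, os.O_RDONLY)
    os.lseek(fd, 7, os.SEEK_SET)                  # Seek to byte 7
    word = os.read(fd, 5)                         # Read
    os.close(fd)

    size = os.stat(old).st_size                   # Get attribute
    os.chmod(old, 0o600)                          # Set attribute
    os.rename(old, new)                           # Rename
    os.unlink(new)                                # Delete
    gone = not os.path.exists(new)

print(word, size, gone)  # b'world' 12 True
```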
Memory Mapped Files
Two system calls, map and unmap, are used. Given a file name and a virtual address, map causes the operating system to map the file into the address space at that virtual address.
File mapping works best in a system that supports segmentation.
In such a system, each file can be mapped onto its own segment so that byte k in the file is also byte k in the segment.
To copy a file, a process can map the source file onto one segment, create a destination segment mapped onto an empty output file, and copy the source segment into the destination segment using an ordinary copy loop; no read or write system calls are needed.
Then it can execute the unmap system call to remove the files from the address space and exit.
Advantages: eliminates the need for explicit I/O system calls, thus making programming easier.
Disadvantages:
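- When the output file is unmapped, it is hard for the system to know its exact length: it can tell the highest page written, but if that page was only partly filled, there is no way of knowing how many bytes in that page were written.
- If a file is mapped in by one process while another process opens it for conventional reading, the system has to take great care to make sure the two processes do not see inconsistent versions of the file.
- A file may be larger than a segment, or even larger than the entire virtual address space, so it cannot always be mapped in as a whole.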
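POSIX systems provide mmap and munmap rather than map/unmap with a caller-chosen address, and they map files onto pages rather than segments. Python's mmap module wraps them, so the file copy described above can be sketched as follows. It assumes the destination's length is known in advance (it is set with truncate before mapping), which sidesteps the "exact length" problem; the function and file names are hypothetical.

```python
import mmap
import os
import tempfile

def copy_with_mmap(src_path: str, dst_path: str) -> None:
    """Copy a file by mapping both files and copying bytes in memory."""
    size = os.path.getsize(src_path)
    with open(src_path, "rb") as src, open(dst_path, "w+b") as dst:
        if size == 0:
            return  # an empty file cannot be mapped; nothing to copy
        dst.truncate(size)  # the output file must already have its final length
        with mmap.mmap(src.fileno(), size, access=mmap.ACCESS_READ) as s, \
             mmap.mmap(dst.fileno(), size, access=mmap.ACCESS_WRITE) as t:
            t[:] = s[:]  # an ordinary memory copy; no read/write calls
            t.flush()    # push the modified pages back to the file
    # Leaving the with-blocks unmaps both regions and closes the files.

with tempfile.TemporaryDirectory() as d:
    a = os.path.join(d, "abc")
    b = os.path.join(d, "ghi")
    with open(a, "wb") as f:
        f.write(b"hello, mapped world")
    copy_with_mmap(a, b)
    with open(b, "rb") as f:
        print(f.read())  # b'hello, mapped world'
```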
Directories
- To keep track of files, file systems normally have directories or folders.
Single-Level Directory Systems
- The simplest form of directory system is having one directory containing all the files.
- Sometimes, it is called the root directory.
- Advantages: simplicity and the ability to locate files quickly.
- Disadvantage: on a system with multiple users, different users may accidentally use the same names for their files, so one user's file can overwrite another's.
Two-Level Directory Systems
- To overcome the disadvantage of a single-level directory system, each user is given a private directory.
- When a user tries to open a file, the system must know which user it is in order to know which directory to search.
- As a consequence, some kind of login procedure is needed.
- In the simplest form, users can access only the files in their own directories.
- However, a slight extension is to allow users to access other users’ files by providing some indication of whose file is to be opened.
- Advantage: eliminates name conflict problem in case of multiple users.
- Disadvantage: not satisfactory for users with a large number of files, since the files in a user's directory cannot be grouped any further.
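A toy model of a two-level directory: a master directory maps each user name to that user's private directory, so the same file name can exist for different users without conflict. The user names, files, and the open_file function are hypothetical, and a real system would also check permissions before honoring the owner argument.

```python
from typing import Dict, Optional

# Hypothetical master directory: one private directory per user,
# each mapping file names to (toy) file contents.
master_directory: Dict[str, Dict[str, bytes]] = {
    "ast": {"mailbox": b"mail for ast", "prog.c": b"int main(void) { return 0; }"},
    "jim": {"mailbox": b"mail for jim"},
}

def open_file(logged_in_user: str, name: str, owner: Optional[str] = None) -> bytes:
    """Search the caller's own directory unless another owner is named."""
    directory = master_directory[owner or logged_in_user]  # login picks the default
    if name not in directory:
        raise FileNotFoundError(name)
    return directory[name]

print(open_file("jim", "mailbox"))  # b'mail for jim'
```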
Hierarchical Directory Systems
- To manage many files, a hierarchical structure, i.e., a tree of directories, is preferred.
- Each user can have as many directories as are needed.
- Users can create an arbitrary number of subdirectories.
- This schema acts as a powerful structuring tool for users to organize their work.
Path Names
- When the file system is organized as a directory tree, two different methods are commonly used for specifying file names.
- Absolute path name: each file is given an absolute path name consisting of the path from the root directory to the file (e.g., /usr/ast/mailbox).
- Absolute path names always start at the root directory.
- Relative path name:
- A user can designate one directory as the current working directory; all path names that do not begin at the root directory are then taken relative to it (e.g., with /usr/ast as the working directory, mailbox refers to /usr/ast/mailbox).
- But if a file has to be accessed regardless of the working directory, then an absolute path name must be specified.
- Most operating systems that support a hierarchical directory system have two special entries in every directory, "." and "..".
- Dot (.) refers to the current directory; dotdot (..) refers to its parent.
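Resolving a path name against the working directory, including "." and "..", can be sketched as a purely textual operation. This is a simplification: it assumes "/" as the separator and ignores symbolic links, which a real kernel handles by looking up each component in the directory itself. The function name is hypothetical.

```python
def resolve(cwd: str, path: str) -> str:
    """Turn an absolute or relative path name into a normalized absolute one."""
    # Absolute names start at the root; relative names start at the working directory.
    parts = [] if path.startswith("/") else [c for c in cwd.split("/") if c]
    for comp in path.split("/"):
        if comp in ("", "."):
            continue            # empty component, or "." (this directory)
        if comp == "..":
            if parts:
                parts.pop()     # ".." is the parent; the root is its own parent
        else:
            parts.append(comp)
    return "/" + "/".join(parts)

print(resolve("/usr/ast", "../lib/dictionary"))  # /usr/lib/dictionary
```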
Directory Operations
- System calls for managing directories vary more from system to system than those for files. Typical ones (taken from UNIX) are:
- Create
- Delete
- Opendir
- Closedir
- Readdir
- Rename
- Link
- Unlink
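On a POSIX system, these directory operations roughly correspond to Python os calls: mkdir and rmdir for Create and Delete, os.scandir used as a context manager for Opendir, Readdir and Closedir, plus link, unlink and rename. Note that os.scandir does not report the "." and ".." entries. A sketch with hypothetical names:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    docs = os.path.join(root, "docs")
    os.mkdir(docs)                                   # Create
    with open(os.path.join(docs, "a.txt"), "w") as f:
        f.write("x")

    # Link: give the same file a second name in a directory.
    os.link(os.path.join(docs, "a.txt"), os.path.join(docs, "b.txt"))

    with os.scandir(docs) as entries:                # Opendir
        names = sorted(e.name for e in entries)      # Readdir, entry by entry
    # Leaving the with-block closes the directory    (Closedir)

    os.unlink(os.path.join(docs, "a.txt"))           # Unlink: removes one name only
    still_there = os.path.exists(os.path.join(docs, "b.txt"))

    papers = os.path.join(root, "papers")
    os.rename(docs, papers)                          # Rename
    os.unlink(os.path.join(papers, "b.txt"))
    os.rmdir(papers)                                 # Delete: directory must be empty

print(names, still_there)  # ['a.txt', 'b.txt'] True
```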
File System Implementation
File System Layout
- Sector 0 of the disk is called the MBR (Master Boot Record) and is used to boot the computer. The end of the MBR contains the partition table, which gives the starting and ending addresses of each partition. One of the partitions in the table is marked as active.
- When the computer is booted, the BIOS reads in and executes the MBR. The MBR program locates the active partition, reads in its first block, called the boot block, and executes it.
- The program in the boot block loads the operating system contained in that partition.
- The layout of a disk partition varies from file system to file system.
- The file system will often contain the superblock, free space management block, i-nodes, and the root directory.
- The first one is the superblock. It contains all the key parameters about the file system, such as the file system type, the number of blocks in the file system, and other key administrative information. It is read into memory when the computer is booted or the file system is first touched.
- The free space management block holds information about the free blocks in the file system, for example in the form of a bitmap or a list of pointers.
- The i-nodes are an array of data structures, one per file, telling all about the file.
- Root directory contains the top of the file system tree.
- The remainder of the disk typically contains all the other directories and files.
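To make the partition table concrete, here is a sketch that parses the classic PC MBR layout: four 16-byte entries starting at byte 446, each with a status byte (0x80 marks the active partition), a partition type byte at offset 4, and the first sector (LBA) and sector count as little-endian 32-bit integers at offsets 8 and 12, with the signature bytes 0x55 0xAA at the end of the 512-byte sector. It ignores the older CHS address fields and GPT-partitioned disks; the function names are hypothetical.

```python
import struct

def parse_mbr(sector: bytes) -> list:
    """Return the non-empty partition entries of a classic PC MBR sector."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR (missing 0x55AA signature)")
    partitions = []
    for i in range(4):                            # table: 4 entries x 16 bytes at 446
        entry = sector[446 + 16 * i: 446 + 16 * (i + 1)]
        status, ptype = entry[0], entry[4]
        first_lba, num_sectors = struct.unpack_from("<II", entry, 8)
        if ptype == 0:
            continue                              # unused slot
        partitions.append({
            "index": i,
            "active": status == 0x80,
            "type": ptype,
            "start": first_lba,
            "end": first_lba + num_sectors - 1,   # last sector of the partition
        })
    return partitions

def active_partition(sector: bytes):
    """The partition whose boot block the MBR program would load, if any."""
    return next((p for p in parse_mbr(sector) if p["active"]), None)
```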
Implementing Files
- Various methods are used in different operating systems to keep track of which disk blocks belong to which file.
- Some of these methods are:
- Contiguous allocation
- Linked list allocation
- Linked list allocation using a table in memory
- i-node
Contiguous Allocation
- Stores each file as a contiguous run of disk blocks.
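With contiguous allocation, a file is described by just its first block and its number of blocks, and the block holding any byte offset is found by simple arithmetic. A sketch assuming a hypothetical 1 KB block size; the function name is hypothetical.

```python
BLOCK_SIZE = 1024  # hypothetical block size in bytes

def contiguous_block(start_block: int, length_blocks: int, byte_offset: int) -> int:
    """Disk block holding `byte_offset` of a contiguously allocated file."""
    n = byte_offset // BLOCK_SIZE   # which block of the file
    if not 0 <= n < length_blocks:
        raise ValueError("offset beyond end of file")
    return start_block + n          # the blocks are consecutive, so just add

print(contiguous_block(40, 5, 3000))  # 42
```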
Linked List Allocation
- Each file is kept as a linked list of disk blocks. The first word of each block is used as a pointer to the next one; the rest of the block is used for data.
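A toy model of linked list allocation, assuming a simulated disk (a Python dict from block number to bytes), a tiny hypothetical 16-byte block, and a 4-byte little-endian pointer word with a made-up end marker. It shows why random access is slow: reaching block n of a file requires reading the n blocks before it. The data area per block is also no longer a power of two. The names are hypothetical, and the file length is passed in explicitly, as a directory entry might store it.

```python
import struct

BLOCK_SIZE = 16   # tiny hypothetical block size, to keep the example readable
NIL = 0xFFFFFFFF  # hypothetical end-of-file marker stored in the pointer word

def write_chain(disk: dict, blocks: list, data: bytes) -> None:
    """Store `data` in the given blocks, linking each block to the next."""
    payload = BLOCK_SIZE - 4                     # the first 4 bytes hold the pointer
    if len(data) > len(blocks) * payload:
        raise ValueError("not enough blocks for data")
    for i, b in enumerate(blocks):
        nxt = blocks[i + 1] if i + 1 < len(blocks) else NIL
        chunk = data[i * payload:(i + 1) * payload]
        disk[b] = struct.pack("<I", nxt) + chunk.ljust(payload, b"\0")

def read_chain(disk: dict, first: int, length: int) -> bytes:
    """Follow the pointers from the first block, one disk read per block."""
    out = bytearray()
    block_no = first
    while block_no != NIL:
        block = disk[block_no]
        (block_no,) = struct.unpack_from("<I", block, 0)  # pointer to next block
        out += block[4:]                                  # the data part
    return bytes(out[:length])                            # drop padding
```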
Linked List Allocation Using a Table in Memory
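- The pointer word is taken out of each disk block and put into a table in main memory called the FAT (File Allocation Table). The table has one entry per disk block, holding the number of the next block of the same file, so a file's chain can be followed without reading its blocks from disk.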
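Following a file's chain through the table can be sketched with a dictionary standing in for the FAT. The layout (file A in blocks 4, 7, 2, 10, 12; file B in 6, 3, 11, 14) and the end marker -1 are illustrative assumptions; real FATs use reserved values. Because the chain is in memory, finding block n needs no disk reads; the price is that the whole table must stay in memory, and it grows with the size of the disk. The function name is hypothetical.

```python
EOF = -1  # hypothetical end-of-chain marker

# In-memory table indexed by block number; each entry is the file's next block.
fat = {4: 7, 7: 2, 2: 10, 10: 12, 12: EOF,   # file A
       6: 3, 3: 11, 11: 14, 14: EOF}         # file B

def file_blocks(fat: dict, start: int) -> list:
    """All of a file's blocks, found by following the in-memory table only."""
    blocks = []
    b = start
    while b != EOF:
        blocks.append(b)
        b = fat[b]
    return blocks

print(file_blocks(fat, 4))  # [4, 7, 2, 10, 12]
```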
i-node
- An i-node (index-node) is a data structure associated with each file that lists the attributes and disk addresses of the file's blocks.
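A simplified i-node sketch, assuming hypothetical sizes (1 KB blocks, 10 direct addresses, and one single indirect block holding 256 addresses) and a simulated disk. Real i-nodes also have double and triple indirect blocks, which are not modeled here. One advantage over a FAT is that only the i-nodes of open files need to be in memory. Class and function names are hypothetical.

```python
from typing import Dict, List, Optional

BLOCK_SIZE = 1024     # hypothetical block size in bytes
NUM_DIRECT = 10       # hypothetical number of direct addresses per i-node
PTRS_PER_BLOCK = 256  # hypothetical: a 1 KB block holds 256 4-byte addresses

class INode:
    """Attributes plus block addresses for one file (simplified)."""

    def __init__(self, size: int, owner: str, direct: List[int],
                 single_indirect: Optional[int] = None) -> None:
        self.size = size                        # attribute: length in bytes
        self.owner = owner                      # attribute: owner
        self.direct = direct                    # addresses of the first blocks
        self.single_indirect = single_indirect  # block holding further addresses

def block_for_offset(inode: INode, offset: int, disk: Dict[int, List[int]]) -> int:
    """Disk block containing byte `offset` of the file."""
    if not 0 <= offset < inode.size:
        raise ValueError("offset outside file")
    n = offset // BLOCK_SIZE
    if n < NUM_DIRECT:
        return inode.direct[n]                  # address is in the i-node itself
    n -= NUM_DIRECT
    if inode.single_indirect is not None and n < PTRS_PER_BLOCK:
        return disk[inode.single_indirect][n]   # costs one extra block read
    raise ValueError("double/triple indirect blocks not modeled here")
```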
Backup Strategies
- Files are commonly backed up on tape.
- Modern tapes hold tens or sometimes even hundreds of gigabytes.
- Backups are done to:
- Recover from disaster – accidents like disk crash, fire, flood, or some natural catastrophe
- Recover from stupidity – accidentally removing files.
- Some issues in backing up data are:
- Should the entire file system be backed up or only part of it?
- Should files that have not changed since they were last backed up be backed up again?
- Dumping only the files that have changed since the last backup leads to the idea of incremental dumps.
- Should the data be compressed before it is written to tape, to save space?
- How can an active file system be backed up, when files and directories may change while the dump is in progress?
- Backing up also raises nontechnical problems, such as security: the backup media contain all the data and must be protected from unauthorized access.
Strategies used for backup are:
- Physical dump
- Logical dump
Physical Dump
- A physical dump starts at block 0 of the disk, writes all the disk blocks onto the output tape in order, and stops when it has copied the last one.
Logical Dump
- A logical dump starts at one or more specified directories and recursively dumps all files and directories found there that have changed since some given base date.
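An incremental logical dump can be sketched with os.walk: starting at the given directories, copy every file modified after the base date, recreating the directories on its path so the file can be restored. Assumptions: the modification time (st_mtime) is used as the change indicator, a directory on disk stands in for the tape, and the function name is hypothetical. A real dump program also records directory attributes, links, and special files.

```python
import os
import shutil

def logical_dump(roots, dest, base_time):
    """Copy every file under `roots` modified after `base_time` into `dest`.

    Returns the dumped paths, relative to each root's parent directory.
    """
    dumped = []
    for root in roots:
        root = os.path.normpath(root)
        parent = os.path.dirname(root)
        for dirpath, _dirnames, filenames in os.walk(root):  # recursive walk
            for name in filenames:
                src = os.path.join(dirpath, name)
                if os.path.getmtime(src) > base_time:         # changed since base date
                    rel = os.path.relpath(src, parent)
                    target = os.path.join(dest, rel)
                    # Recreate the directories on the path so the file can be restored.
                    os.makedirs(os.path.dirname(target), exist_ok=True)
                    shutil.copy2(src, target)                 # data plus timestamps
                    dumped.append(rel)
    return sorted(dumped)
```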