Disk scheduling is done by the OS to schedule I/O requests arriving for the disk. It's also known as I/O scheduling, and it's really important because:
Multiple I/O requests may arrive by different processes and only one I/O request can be served at a time by the disk controller. Thus other I/O requests need to wait in the waiting queue and need to be scheduled.
Two or more requests may be far from each other, which can result in greater disk arm movement.
Hard drives are one of the slowest parts of the computer system and thus need to be accessed in an efficient manner.
There are many disk scheduling algorithms, some popular examples being:
First Come First Serve (FCFS)
Shortest Seek Time First (SSTF)
SCAN
CSCAN
LOOK
CLOOK
FCFS is the simplest disk scheduling algorithm there is. The concept behind it is very straightforward - service the requests in the order they arrive at the disk. Of course, there are advantages and disadvantages to using FCFS.
Advantages:
Every request gets a fair chance, meaning no starvation
No indefinite postponement
Disadvantages:
It doesn't try to optimize seek time at all
May not provide best possible service
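To make the idea concrete, here's a minimal sketch of FCFS head movement (the queue of cylinder numbers and the starting head position below are made up for illustration):

```python
def fcfs_head_movement(requests, head):
    """Service requests strictly in arrival order; return total cylinders moved."""
    total = 0
    for r in requests:
        total += abs(r - head)  # move the arm to the next request in the queue
        head = r
    return total

# Hypothetical request queue, head starting at cylinder 50:
print(fcfs_head_movement([82, 170, 43, 140, 24], head=50))  # 460
```

Note how the arm swings back and forth (170 down to 43, then up to 140) because arrival order ignores position entirely - exactly the seek-time problem FCFS doesn't try to optimize.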
SSTF's idea is to service the requests with the shortest seek time first. So, the seek time of every request is calculated in advance in the queue, then these requests are scheduled according to these calculations. As a result, the request nearest the disk arm goes first. This is a significant improvement over FCFS as it decreases average response time and increases throughput.
Advantages:
Average Response Time decreases
Throughput increases
Disadvantages:
Potential for Starvation
Overhead to Calculate Seek Time
High Variance of Response Time (it favors the requests closest to the disk arm)
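The greedy nature of SSTF can be sketched as follows (a hypothetical queue and head position, chosen only for illustration):

```python
def sstf_schedule(requests, head):
    """Greedy SSTF: always service the pending request nearest the current head."""
    pending, order, total = list(requests), [], 0
    while pending:
        nearest = min(pending, key=lambda r: abs(r - head))  # shortest seek next
        pending.remove(nearest)
        total += abs(nearest - head)
        head = nearest
        order.append(nearest)
    return order, total

# Hypothetical queue, head at cylinder 50:
print(sstf_schedule([82, 170, 43, 140, 24], head=50))  # ([43, 24, 82, 140, 170], 172)
```

With the same kind of workload FCFS would zigzag across the disk, while SSTF's total movement is far lower - but a request far from the arm can keep losing to nearer newcomers, which is the starvation risk listed above.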
SCAN is also known as the "elevator algorithm". Similar to elevators, it goes in one direction, servicing every request in its path, then reverses its direction only when it hits the end of the disk.
Advantages:
High throughput
Low variance of response time
Low average response time
Disadvantages:
Long wait time for requests in locations just visited by the disk arm
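A rough sketch of SCAN's total head movement, assuming an upward initial sweep and a made-up disk size of 200 cylinders:

```python
def scan_head_movement(requests, head, disk_size=200):
    """Total head movement under SCAN (elevator), moving upward first:
    sweep all the way to the top edge of the disk, then reverse for any
    requests below the starting position."""
    lower = [r for r in requests if r < head]
    total = (disk_size - 1) - head             # upward sweep to the disk edge
    if lower:
        total += (disk_size - 1) - min(lower)  # reverse down to the lowest request
    return total

# Hypothetical queue, head at cylinder 50, 200-cylinder disk:
print(scan_head_movement([82, 170, 43, 140, 24], head=50))  # 324
```

Note the arm travels to cylinder 199 even though the highest request is 170 - LOOK (and the CLOOK variant below) exists precisely to avoid that wasted traversal.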
We introduce CLOOK Scheduling for the same reason we introduced CSCAN Scheduling. Basically, CLOOK is to LOOK what CSCAN is to SCAN. In CLOOK, instead of going all the way to the end of the disk, the disk arm goes only as far as the last request to be serviced in front of the head, then jumps from there to the last request at the other end. Thus, it also avoids the extra delay caused by unnecessary traversal to the end of the disk.
Advantages:
Maximizes locality and resource utilization
Disadvantages:
Can seem a little unfair to some requests, and if new requests keep coming in, older pending ones may starve.
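C-LOOK's sweep-and-jump behavior can be sketched like this (same hypothetical style of queue as used for the other algorithms):

```python
def clook_schedule(requests, head):
    """C-LOOK: sweep upward to the highest pending request, then jump to the
    lowest pending request and continue servicing upward again."""
    upper = sorted(r for r in requests if r >= head)  # serviced on the up sweep
    lower = sorted(r for r in requests if r < head)   # serviced after the jump
    order, total, pos = upper + lower, 0, head
    for r in order:
        total += abs(r - pos)
        pos = r
    return order, total

# Hypothetical queue, head at cylinder 50:
print(clook_schedule([82, 170, 43, 140, 24], head=50))  # ([82, 140, 170, 24, 43], 285)
```

Compare with the SCAN figure: the arm stops at 170 instead of running to the edge of the disk, and the jump back lands directly on request 24.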
Compared to HDDs, SSDs come with their own tradeoffs.
Advantages:
Much faster access
More resistant to kinetic shock, intense pressure, water immersion, etc.
More compact and lighter
Lower power
Disadvantages:
More costly per byte
Limited lifetime
Erases are costly
Note: Most systems nowadays use SSDs instead of HDDs, and HDDs are quickly becoming obsolete, so clearly the pros outweigh the cons in this situation.
To extend the life of flash memory chips, write and erase operations are spread out over the entire flash memory. The idea here is to do the following two things:
Keep a count in a 12-16 byte trailer of each page of how many writes have been made to that page, and choose the free page with the lowest write count
Randomly scatter data over the entire flash memory span
Many of today's flash chips implement wear leveling in hardware.
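The lowest-write-count policy above can be sketched in a few lines (the page numbers and counts are hypothetical; on real flash the counts live in each page's trailer):

```python
def pick_free_page(write_counts, free_pages):
    """Wear leveling: allocate the free page with the fewest writes so far."""
    return min(free_pages, key=lambda p: write_counts[p])

# Hypothetical per-page write counts and free list:
write_counts = {0: 12, 1: 3, 2: 7, 3: 2}
free_pages = [0, 2, 3]
page = pick_free_page(write_counts, free_pages)  # page 3: lowest count among free
write_counts[page] += 1                          # bump its count on this write
```

The effect over time is that writes spread evenly across pages instead of wearing out a few hot ones.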
RAID stands for Redundant Array of Inexpensive Disks. Disks are cheap these days, and attaching an array of disks to a computer brings several advantages:
Faster read/write access by having multiple reads/writes in parallel
Data is striped across different disks, e.g. each bit of a byte is striped onto a different disk
Better fault tolerance/reliability - if one disk fails, a copy of the data could be stored on another disk.
RAID comes in different levels with different tradeoffs between performance and redundancy. There's RAID0, RAID1, RAID 1 + 0 (RAID10), RAID2, RAID3, RAID4, RAID5, RAID6, etc.
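The bit-level striping idea mentioned above (each bit of a byte on a different disk) can be sketched as follows; the 8-element list stands in for 8 disks:

```python
def stripe_byte(byte):
    """Bit-level striping across 8 disks: bit i of the byte goes to disk i."""
    return [(byte >> i) & 1 for i in range(8)]

def unstripe_byte(bits):
    """Reassemble the byte by reading one bit back from each disk."""
    return sum(bit << i for i, bit in enumerate(bits))

disks = stripe_byte(0b10110010)          # one bit stored per "disk"
assert unstripe_byte(disks) == 0b10110010
```

Because the 8 disks can be read in parallel, the whole byte comes back in roughly the time of a single one-bit read - the parallelism advantage from the list above.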
Organizing files within a directory is an essential task to ensure that files are easily accessible, identifiable, and manageable. There are three primary ways to organize files within a directory:
Flat Namespace
Two-Level Namespace
Hierarchical Namespace
A file system is a way the operating system organizes and stores files on a disk. When an application needs to access a file, it uses an API to make a request to the operating system. The operating system's file manager takes over, interacting with the file system driver, which communicates with the hardware to read and write data to and from the disk using a file system format. The file system manager also provides other services such as file locking and security to ensure that the data is stored and retrieved correctly and protected from unauthorized access. In summary, there are three steps:
An app makes a file call via an API
The call is translated into a system call that invokes the OS' file manager
OS then turns these calls into disk reads/writes
There are 3 options for sharing files and directories:
Symbolic Links: A symbolic link is a special type of file that serves as a pointer to another file or directory on the system.
Duplicating Directories: Duplicating directories involves creating a copy of the original directory and placing it in a shared location. This approach is less flexible than symbolic links and can be difficult to maintain consistency across multiple copies of the same directory.
Setting Permissions: Setting permissions involves controlling which users or groups have access to a file or directory. The owner of a file or directory can set read, write, and execute permissions for the owner, group, and others.
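The symbolic-link option can be demonstrated with Python's standard library (a hypothetical file in a temporary directory; note that creating symlinks may require extra privileges on Windows):

```python
import os
import tempfile

def symlink_demo():
    """Create a file, point a symlink at it, and read through the link."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "original.txt")
        link = os.path.join(d, "alias.txt")
        with open(target, "w") as f:
            f.write("hello")
        os.symlink(target, link)        # the link is a pointer to the target path
        with open(link) as f:           # reads are transparently redirected
            return f.read(), os.path.islink(link)

print(symlink_demo())  # ('hello', True)
```

The link is a separate file system object from the target, which is why deleting the target leaves a dangling link behind.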
On the disk, the entire file system is stored, including 5 main elements:
its entire directory tree structure
each file's header/control block/inode
each file's data
a boot block, typically the first block of a volume that contains info needed to boot an OS from this volume (and empty if there's no OS to boot)
a volume control block that contains volume or partition details
In main memory (RAM), the OS file manager maintains only a subset of open files and recently accessed directories. Memory is used as a cache to improve performance. All the information is available for a fast search of memory, rather than a slow search of disk, e.g. for a file's FCB. The 4 main file system components stored in RAM are:
Recently accessed parts of the directory structure
A system-wide open file table (OFT) that tracks process-independent info of open files such as the file header and an open count of the number of processes that have a file open
A per-process open file table that tracks all files that have been opened by a particular process
A mount table of devices with file systems that have been mounted as volumes
The following procedural steps are followed:
The directory structure is searched for the requested file. This is fast if it's already in memory. Otherwise, directories have to be retrieved from disk and cached for later access.
Once the file name is found, the directory entry contains a pointer to the file control block on disk. We retrieve the FCB from the disk, copy the FCB into the system's open file table which acts as a cache for future file opens, and we increment the open file counter for this file in the system OFT.
Add an entry into the per-process OFT that points to the file's FCB in the system OFT.
Return a file descriptor or handle to the process that called open().
Note: some operating systems use mandatory locks on open files so that only one process at a time can use an open file (as in Windows), while others allow optional or advisory locks, giving users control over synchronizing access to files (as in UNIX).
The following steps are followed:
Remove the entry from the per-process OFT
Decrement the open file counter for this file in the system OFT
If the counter's value is 0, then write back to disk any metadata changes to FCB such as its modification date. Note that there may be inconsistencies between FCB on disk and FCB in memory - file system designers need to be aware of this.
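The bookkeeping in the open() and close() steps above can be sketched with hypothetical in-memory tables (a real OS keeps one per-process table per process, and writes the FCB back to disk on the last close):

```python
system_oft = {}       # path -> {"fcb": {...}, "open_count": n}  (system-wide OFT)
per_process_oft = {}  # fd -> path                                (per-process OFT)
next_fd = 3           # 0-2 conventionally reserved for stdin/stdout/stderr

def my_open(path):
    global next_fd
    # Step 2: cache the FCB in the system OFT and bump the open count.
    entry = system_oft.setdefault(path, {"fcb": {"name": path}, "open_count": 0})
    entry["open_count"] += 1
    fd, next_fd = next_fd, next_fd + 1
    per_process_oft[fd] = path          # step 3: per-process entry -> system OFT
    return fd                           # step 4: hand back a descriptor

def my_close(fd):
    path = per_process_oft.pop(fd)      # remove the per-process entry
    system_oft[path]["open_count"] -= 1
    if system_oft[path]["open_count"] == 0:
        del system_oft[path]            # last close: FCB changes go back to disk
```

Keeping the open count in one shared table is what lets two processes open the same file without re-reading its FCB from disk.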
File allocation refers to how file data is stored on disk by dividing it into equally sized blocks. We have a few methods of file allocation, namely:
Contiguous File Allocation
Linked File Allocation
File Allocation Table (FAT)
Indexed Allocation
Multilevel Index Allocation
Modified Multilevel Index Allocation used in UNIX and Linux
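As one example from the list, the FAT method can be sketched with a dictionary standing in for the on-disk table (block numbers are hypothetical): each table entry holds the next block of the same file, with a sentinel marking end-of-file.

```python
EOF = -1                     # sentinel marking the last block of a file
fat = {4: 7, 7: 2, 2: EOF}   # a file stored in blocks 4 -> 7 -> 2

def file_blocks(fat, start):
    """Follow the FAT chain from the file's starting block
    (the start block is kept in the file's directory entry)."""
    blocks, b = [], start
    while b != EOF:
        blocks.append(b)
        b = fat[b]
    return blocks

print(file_blocks(fat, 4))   # [4, 7, 2]
```

Unlike plain linked allocation, the next-pointers live together in the table rather than inside each data block, so the whole chain can be followed without reading the data blocks themselves.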
Another aspect of file system management is managing the free space. The file system needs to keep track of what blocks of the disk are free, and what blocks are not. For that, we can keep a free-space "list". There are a few approaches we can take:
Bit Vector or Bit Map
Linked List
Grouping
Counting
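The bit vector approach is the simplest to sketch (the bitmap contents below are made up): one bit per block, scanned for the first free block on allocation.

```python
def first_free(bitmap):
    """Bit vector free-space list: bit i is 1 when block i is free.
    Return the index of the first free block, or None if the disk is full."""
    for i, bit in enumerate(bitmap):
        if bit:
            return i
    return None

bitmap = [0, 0, 1, 0, 1, 1]   # blocks 2, 4, 5 are free
block = first_free(bitmap)    # block 2
bitmap[block] = 0             # mark it allocated
```

A real implementation scans a word at a time rather than bit by bit, but the idea is the same: the whole free-space state fits in a compact map that is cheap to search.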
To improve file system performance, several approaches can be taken both in memory and on disk. In memory, caching FCB information and directory entries can speed up access, as well as hashing the directory tree to quickly locate entries. On disk, indexed allocation is generally faster than linked list allocation for file data, and counting, grouped, or linked list approaches can optimize the free block list for faster allocation of large numbers of files. In addition to the caching and allocation approaches, there are a few other potential optimizations for improving file system performance:
The disk controller can have its own cache to store frequently accessed file data and FCBs.
Read ahead: if the OS detects sequential access, it can read not only the requested block but also several subsequent blocks into the main memory cache in anticipation of future reads.
Asynchronous writes: Delaying writes of file data removes disk I/O wait time from the critical path of execution. This also lets the disk schedule writes efficiently, grouping nearby writes together, and may avoid a disk write entirely if the data is changed again soon. Note that in certain cases, synchronous writes may be preferred, e.g., when modifying file metadata in the FCB on an open() call.
Cache file data in memory.
Smarter layout on disk: Keep an inode/FCB near file data to reduce disk seeks, and/or file data blocks near each other.
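The read-ahead optimization from the list above can be sketched as follows (the block contents and read-ahead window are hypothetical):

```python
def read_block(disk, cache, n, read_ahead=2):
    """On a cache miss, also pull the next `read_ahead` blocks into the
    in-memory cache, anticipating sequential access."""
    if n not in cache:
        for b in range(n, min(n + read_ahead + 1, len(disk))):
            cache[b] = disk[b]   # one disk operation fills several cache slots
    return cache[n]

disk = ["b0", "b1", "b2", "b3", "b4"]
cache = {}
read_block(disk, cache, 1)   # miss: blocks 1, 2 and 3 all land in the cache
```

After the first miss, the follow-up reads of blocks 2 and 3 are served from memory with no disk access - the payoff when the OS has correctly detected a sequential pattern.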
To recover from file system failures, log-based recovery is used. The operating system maintains a log or journal on disk of each operation on the file system, which is consulted after a failure to reconstruct the file system. Each operation on the file system is written as a record to the log on disk before the operation is actually performed on data on disk. This is called write-ahead logging. The log contains a sequence of statements about what was intended in case of a crash. Some file systems only write changes to the metadata of a filesystem to the log, e.g. file headers and directory entries only (NTFS), and not any changes to file data. Linux ext3fs can be parameterized to operate in three modes:
Journal mode: both metadata and file data are logged.
Ordered mode: only metadata is logged, not file data, and it's guaranteed that file contents are written to disk before associated metadata is marked as committed in the journal.
Writeback mode: only metadata is logged, not file data, and there is no guarantee that file data is written before metadata. This is the riskiest and least reliable mode.
The 6 classic properties of security include:
Confidentiality
Authentication
Authorization
Integrity
Non-repudiation (verification that an event actually took place)
Availability
Note: we're primarily focusing on the first 3.