non-deadlock bugs
atomicity violation and order violation
atomicity violation
The desired serializability among multiple memory accesses is violated (i.e. a code region is intended to be atomic, but the atomicity is not enforced during execution).
order violation
The desired order between two (groups of) memory accesses is flipped (i.e., A should always be executed before B, but the order is not enforced during execution)
solution for atomicity violation
add locks around shared variable references
solution for order violation
enforce ordering using condition variables
deadlock
-the presence of a cycle -Thread 1 is holding a lock L1 and waiting for another one, L2 -Thread 2 holds L2 and is waiting for L1 to be released
why do deadlocks occur
In large code bases, complex dependencies arise between components Encapsulation - hidden details of implementation that make software easier to build in a modular way (does not mesh well with locking)
conditions for a deadlock
Mutual exclusion - threads claim exclusive control of resources that they require Hold and Wait - threads hold resources allocated to them No preemption - resources cannot be forcibly removed from threads that are holding them Circular Wait - circular chain of threads such that each thread holds one or more resources that are being requested by the next thread in the chain
true or false: if only one of the four deadlock conditions is met a deadlock can occur
false
how to prevent circular wait
total ordering - strict ordering of threads partial ordering (difficult to achieve) partial ordering - useful way to structure lock acquisition to avoid deadlock
how to prevent hold-and-wait
by acquiring all locks at once, atomically
what are the disadvantages of hold-and-wait prevention
encapsulation works against you - requires us to know exactly which locks must be held and to acquire them ahead of time likely to decrease concurrency - all locks must be acquired early on (at once) instead of when they are truly needed.
how to prevent no preemption
use thread libraries that provide a more flexible set of interfaces to help avoid problems of multiple lock acquisition
livelock
not a deadlock, but no progress is being made for the processes
solution for livelock
add random delay before looping back and trying the entire thing over again, thus decreasing the odds of repeated interference among competing threads. Not a great solution, encapsulation problems still arise
problems with no preemption
livelock
true or false: using powerful hardware instructions, you can build data structures in a manner that does not recquire explicit locking.
true
wait-free
can build data structures in a manner that does not require explicit locking
how to prevent mutual exclusion
use lock-free and wait-free data structure designs
true or false: livelock can occur with mutual exclusion
true
how can you avoid deadlocks
smart scheduling
what does deadlock avoidance require
Some global knowledge of which locks various threads might grab during their execution, and subsequently schedules said threads in a way as to guarantee no deadlock can occur
what are downsides of avoiding deadlocks with scheduling
length of jobs is increased, can limit concurrency
detect and recover
allow deadlock to occur than take some action, i.e. computer freezes, you reboot it
true or false: a deadlock detector is always running
false, runs periodically
true or false: in event of a deadlock the system should be restarted
true
monitor
-high level synchronization primitive -guarantees functions are mutually exclusive -provides condition variables such that a process can step outside the monitor while waiting for a condition and prevent other processes from entering the monitor
true or false: adding locks to a data structure makes it thread safe
true
what is the problem with synchronized counter
scales poorly
perfect scaling
more work done in parallel time taken to complete task is not increased
how does a sloppy counter (approximate counter) work
-a single local counter per CPU core -a single global counter -a lock for each local counter and one for the global counter ex: on a machine with four CPUs you have four local counters and one global counter
Hand-over-hand locking (lock coupling)
add a lock per node of the list instead of having a single lock for the entire list when traversing he list, the code first grabs the next node's lock and then release the current node's lock
downsides of lock coupling
increased time to acquire the lock
Michael and Scott Concurrent Queues
-there are two locks one for the head -one for the tail -enable concurrency of enqueue and dequeue
dummy node
allocated in the queue initialization code enable the separation of head and tail operations
Concurrent hash table
-hash table does not resize -built using the concurrent lists -uses a lock per hash bucket
true or false: simple concurrent hash table scales poorly
false, magnificently
persistent storage
keep data intact even if there is a power loss -hard disk drive -solid-state storage drive
premature optimization
performance problems should only be remedied once they exist -there is not value in making something faster if doing so will not improve overall performance of the application
file
a linear array of bytes a unit of storage
inode number
low level name of a file -corresponds to the location of the file's contents
directory
-like a file, has a low-level name -contains a list of (user-readable name, low-level name) pairs
metadata
data that describes other data information about a file
directory hierarchy (tree-structured)
how all files and directories are stored root directory (/), uses separator to name subsequent sub-directories until desired file or directory is named
relative pathname
concatenation of file names starting with current directory
directed acyclic directory hierarchy
organizes directories such that any directory at a given level may point to zero or more files or other directories at lower levels but also permits any file or directory to have more than one parent directory
general graph structure
using symbolic links, only allows a single parent directory for any file or directory but provides symbolic links to support file sharing
absolute pathname
The full pathname to a certain file or directory, starting from the root directory.
why are cycles bad in directory hierarchies
-a cycle can lead to an infinite loop in algorithms that search the hierarchy for a given file -file deletion becomes more difficult -a reference count is not sufficient to prevent the creation of unreachable subgraphs in the directory hierarchy, which can only be removed using a garbage collection algorithm
what is an extent
-a disk pointer plus a length (in blocks) -a file consists of one or more extents -less flexible but more compact, work well when there is enough free space on the disk and files can be laid out contiguously
reference count (link count)
allows the file system to track how many different file names have been linked to this particular inode
symbolic link (soft link)
a file itself of a different type (not like a hard link, can't edit the reference to edit the original) the original file can be accessed through the file name as well as the symbolic link name
hard link
Enables you to create one file and then establish links to that file in other folders, as though the file is in all of the folders. -somewhat limited: you can't create one to a directory, you can't hard link to files in other disk partitions
true or false: a hard link creates a copy of the file
false, just refers to the inode of the original file
why unlink instead of deleting a file
when you create a hard link, it links a human readable name to the file and puts that link into a directory, if you just remove the file, you would still have access to it, so you have to unlink it
the reference count helps with this
dangling reference
problem with symbolic links -removing the original file name causes the link to point to a pathname that no longer exists
permission bits
nine characters -what the owner of the file can do to it -what someone in a group can do to the file -what anyone can do
superuser
trusted user who can access all files regardless of privileges -root -inherent security risk
inode
index node -holds file metadata -length, permissions, location of blocks, type, size, time
open()
-system call returns file descriptor (an integer) -system wide open-file table contains copy of FCB of each file -per-process open-file table contains pointers to appropriate entries in system-wide open-file table
true or false: an open file has a current offset
true, used to determine where the next read or write will begin within the file
how to implicitly update the current offset
add n bytes of read or write to the current offset
how to explicitly update the current offset
use lseek() -changes how the offset is set -has 3 arguments: -fildes - file descriptor -offset - positions file offset to a particular location within the file -whence - determines how the seek is performed
what does fsync() do
-forces all dirty (not yet written) data to the disk -returns once all writes are complete -allows one to avoid the problem of losing data if the machine crashes
true or false: renaming a file is implemented atomically
true
contiguous allocation
every file is mapped to a contiguous sequence of disk blocks -the FCB points to the first disk block -the file length determines how many blocks are occupied by the file
advantages and disadvantages of contiguous allocation
-fast sequential access because neighboring blocks do not require seek operations -fast direct access because a target block number can be computed using file position and block length -disk fragmentation - over time the disk consists of variable length sequences of free and occupied blocks and needs search algorithms to find free areas of appropriate size -inability to expand a file if the block following the last file block is not free, the entire file must be copied to a larger area -difficulty in deciding how much space to allocate to a new file to allow for potential expansion
clustered allocation
links together sequences of contiguous blocks -the last block of any cluster points to the beginning of the logically next cluster -the last block records the number of blocks belonging to the next cluster to facilitate direct access within each cluster -the number of blocks in the first cluster is recorded in the FCB along with the pointer
advantages and disadvantages of clustered allocation
-can easily expand a file - if the block following the last file block is free, then the last cluster is extended like with contiguous allocation -if the block following the last file block is occupied, then a new cluster is started and the last block points to the first block of the new cluster -improved sequential access over the purely linked allocation since accessing with cluster does not require seek -slower sequential access than contiguous allocation since each cluster requires a disk seek -inability to perform direct access since the location of the target block cannot be computed
Linked allocation
each file is a linked list of blocks -file ends at nil pointer -no external fragmentation -free space management system called when new block needed
advantages of linked allocation
-simple -no has external fragmentation -improve efficiency by clustering blocks into groups
problems with linked allocation
-reliability -locating a block can take many I/Os and disk seeks -internal fragmentation
FAT (File Allocation Table)
-an array where each entry corresponds to a disk block -keeps track of which disk blocks belong to a file by linking the blocks in a chain of indices
advantages of FAT
-has the advantages of linked allocation, including file expansion ability -blocks can be cached in main memory -sequential access is more efficient since no seek operations are necessary to follow pointers -direct access is possible because the location of the desired block can be found efficiently by scanning the chain of indices
indexed allocation
Each file has its own index block(s) of pointers to its data blocks random access
advantages and disadvantages of indexed allocation
-Ability to efficiently expand a file by simply adding a new block number to the index table -no external fragmentation -a small table limits the maximum size of any file and a large table wastes disk space
what is the performance of contiguous allocation
good for sequential access and random access
what is the performance of linked allocation
good for sequential access, not random access
what is the performance of indexed allocation
single block - not good, could require index block reads then data block reads clustering - yes, can improve throughput and reduce CPU overhead
how do we store inodes in the file system
inode table
true or false: inodes are unique
true
bitmap
each bit indicates whether a block is free or allocated free = 0 in use = 1
superblock
contains information about the layout of blocks for particular file system -the number of inodes, begin location of inode table
true or false: when mounting a file system, OS reads superblock first
true, to initialize various information
multi-level index
A tree data structure to keep track of the disk location of each data block in a file. indirect pointer points to a block that contains more pointers -inode has fixed number of direct pointers (12) and a single indirect pointer
true or false: most files are small
true, 2K is common size
true or false: the average file size is shrinking
false, growing, 200K is average
true or false: most bytes are stored in large files
true, a few big files use most of the space
true or false: file systems don't contain many files
false, almost 100K on average
true or false: file systems are roughly half full
true, even as disks grow, file systems remain 50% full
true or false: directories are typically large
false, many have few entries, most have 20 or fewer
read()
read from a file -read in first block of the file, consulting the inode to find the block -update inode with newest access time -update in-memory open file table for file descriptor and offset
what happens when a file closes
-If the current content of the r/w buffer is modified, then save the buffer in the corresponding block on the disk. -Update the file length in the FCB. -Free the OFT entry. -Return the status of the close operation to the calling process. the file descriptor is deallocated
write()
update file with new contents -generates 5 I/Os -read data bitmap -write bitmap (to reflect its new state to disk) -two to read and write the inode -write the block itself
true or false: creating a file allocates space for the directory, which causes low I/O traffic
false, high I/O traffic
true or false: cache reduces write I/Os
false, because write traffic has to go to disk for persistence, but does avoid read I/Os with large cache
what is used for write performance benefits
buffering -by buffering the number of writes in memory, the file system can then schedule subsequent I/Os -avoiding writes by delaying them
collision
when two file names hash to the same location
advantages and disadvantages of hash table
decreases directory search time only good if entries are fixed size or use chained-overflow method
why do you have two copies of the FAT
to have a backup in case of system failure