Field
A field is a single data item.
Record
A record is a collection of items, all relating to an object, and is treated as a unit for processing.
File
A file is an organized collection of related records.
Master File
A master file holds descriptive data.
It contains the long-term data records that are to be processed, and it holds the resulting data once processing is complete. Its data either does not change or is updated periodically.
Data is held sequentially, in key field order.
Example: customer details for an electricity company.
Transaction File
A transaction file is a temporary file used to store data which will be used to update the master file.
It contains the transactions, i.e. the changes to be made to the data in the master file.
Data is held serially in temporal order, i.e. in the order in which it was collected.
Example: customer meter readings for an electricity company.
File Update Process
The transaction file is sorted into the same key field order as the master file.
The master file and transaction file go through an update process to create the new, updated master file.
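The update process above can be sketched in a few lines. This is only an illustration: the function name `update_master`, the `(key, value)` record shape, and the rule that the latest transaction for a key replaces the master value are all assumptions, not part of the original notes.

```python
def update_master(master, transactions):
    """Build a new master file from a key-ordered master and a transaction file.

    Both files are modelled as lists of (key, value) tuples (an assumption).
    """
    # Step 1: sort the transaction file into key order (stable, so the
    # temporal order of transactions for the same key is preserved).
    transactions = sorted(transactions, key=lambda t: t[0])

    new_master = []
    i = 0
    for key, value in master:
        # Skip transactions whose key has no master record (ignored here).
        while i < len(transactions) and transactions[i][0] < key:
            i += 1
        # Apply every transaction for this key; the last one wins (assumption).
        while i < len(transactions) and transactions[i][0] == key:
            value = transactions[i][1]
            i += 1
        new_master.append((key, value))
    return new_master


old_master = [(101, 5000), (102, 7200), (103, 1500)]   # customer, last reading
readings = [(103, 1650), (101, 5120)]                   # in order collected
new_master = update_master(old_master, readings)
```

The old master file is left untouched, which is what makes it possible to keep earlier generations as backups.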
Fixed Length Records
A fixed-length record file is one where all the records are the same length in terms of bytes and fields.
Advantages of fixed length records
Fixed-length records are easier to program because the storage space required can be calculated in advance.
The position of any record in the file can be calculated by multiplying the record length in bytes by the record sequence number (counting the first record as zero).
A fast binary search can be used to locate a fixed length record in a sequential file.
Fixed length records can be quickly updated in-situ without affecting other records in the file.
Fixed-length fields and records are quicker for the computer to process (read/write) because their start and end locations are known.
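As a rough illustration of the position calculation and the binary search mentioned above, here is a small sketch using Python's `struct` module. The record layout (a 4-byte key plus a 12-byte name) and the helper names `read_record` and `binary_search` are assumptions made for the example.

```python
import io
import struct

# Hypothetical layout: 4-byte unsigned key + 12-byte name = 16 bytes per record.
RECORD = struct.Struct("<I12s")


def read_record(f, n):
    """Read record number n (0-based) by seeking straight to its byte offset."""
    f.seek(n * RECORD.size)  # position = record length * sequence number
    key, name = RECORD.unpack(f.read(RECORD.size))
    return key, name.rstrip(b"\x00").decode()


def binary_search(f, count, target):
    """Look up a key in a key-ordered file holding `count` fixed-length records."""
    lo, hi = 0, count - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        key, name = read_record(f, mid)
        if key == target:
            return name
        if key < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


# An in-memory stand-in for a sequential file on disc.
f = io.BytesIO()
for key, name in [(10, "Ada"), (20, "Grace"), (30, "Alan")]:
    f.write(RECORD.pack(key, name.encode()))  # short names are padded with zero bytes
```

Note that `struct` pads short names and silently truncates names longer than 12 bytes, which mirrors the wasted space and truncation listed as disadvantages.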
Disadvantages of fixed length records
Fixed-length records waste storage space because short fields are padded with blank space.
Fixed-length records truncate fields that are too long.
What are variable length records?
A variable-length record file is one where one or more fields can be of different lengths in each record.
When the record is stored, each field can end with a field terminator, with a record terminator at the end of the whole record.
Alternatively, each field starts with a byte showing the size of the field, and the record starts with a byte showing the size of the whole record.
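The size-byte approach can be sketched as follows. The function names `encode_record` and `decode_records` are hypothetical, and using a single size byte limits each field and each record body to 255 bytes in this sketch.

```python
def encode_record(fields):
    """Encode a list of strings as [record size][field size][field bytes]..."""
    body = b""
    for field in fields:
        data = field.encode("utf-8")
        body += bytes([len(data)]) + data  # one size byte, so at most 255 bytes
    return bytes([len(body)]) + body       # one size byte for the whole record


def decode_records(blob):
    """Walk a byte string of concatenated records and return lists of fields."""
    records, pos = [], 0
    while pos < len(blob):
        end = pos + 1 + blob[pos]  # first byte gives the size of the record body
        pos += 1
        fields = []
        while pos < end:
            flen = blob[pos]       # size byte in front of each field
            fields.append(blob[pos + 1:pos + 1 + flen].decode("utf-8"))
            pos += 1 + flen
        records.append(fields)
    return records


blob = encode_record(["Ada", "London"]) + encode_record(["Grace", "NY"])
```

Because each record has a different size, the only way to reach the second record is to read the size of the first, which is why direct access is not possible.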
Advantages of variable length records
They can be used in both serial files and sequential files.
Variable-length records are preferred when the records in a file are of very different lengths, so as to avoid wasting storage
Variable-length records are suitable for situations where no searching or updating is necessary, e.g., transaction files, which will be used later to update a master file.
Disadvantages of variable length records
Variable-length records can only be found using a slower linear search.
Records cannot be accessed directly.
If a variable-length record is updated, the size of the record will change. The file will need to be rebuilt and the updated record inserted at the correct point in the sequence.
What are the types of files?
• Serial
• Sequential
• Random (Direct)
• Indexed Sequential
Serial File
A serial file is one where records are not stored in any particular order.
They are simply stored in the order in which they occur.
This is the quickest way to organise a file, because records are written as they arrive. It is used when the order should be chronological, for example a record of all outgoing phone calls at a mobile carrier for a single working day.
How to add a new record to a serial file
To add to a serial file, the new record is appended (added) to the end of the file.
Sequential File
A sequential file is one where records are stored and accessed in key sequence order (e.g. sorted by primary key).
It is easier and quicker to search a sequential file for a record than a serial file.
How to add a new record to a sequential file
• Make a new copy of the records until the correct place to add a new record is reached.
• Add the new record to the new copy.
• Continue until the end of the file is reached.
• If multiple records are to be added, these should be sorted into order first to prevent multiple updates to the file.
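The steps above can be sketched as a single copying pass. The function name `insert_records` and the `(key, data)` record shape are assumptions for illustration only.

```python
def insert_records(old_file, new_records):
    """Return a new key-ordered file with new_records merged into old_file.

    Files are modelled as lists of (key, data) tuples (an assumption).
    """
    # Sort the additions first so a single pass over the file is enough.
    pending = sorted(new_records, key=lambda r: r[0])
    new_file = []
    i = 0
    for record in old_file:
        # Add any new records that belong before the current old record.
        while i < len(pending) and pending[i][0] < record[0]:
            new_file.append(pending[i])
            i += 1
        new_file.append(record)   # copy the old record across
    new_file.extend(pending[i:])  # new records that belong after the last old one
    return new_file


old_file = [(1, "a"), (3, "c"), (5, "e")]
new_file = insert_records(old_file, [(4, "d"), (2, "b"), (6, "f")])
```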
How to delete a record from a sequential file
• Make a new copy of the records until the record to be deleted is reached.
• Do not copy the record to be deleted.
• Continue until the end of the file is reached.
• If multiple records are to be deleted, these should be sorted in advance.
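Deletion follows the same copy-and-skip pattern. In this sketch the function name `delete_records` and the `(key, data)` record shape are assumptions; keys that are not present in the file are simply ignored.

```python
def delete_records(old_file, keys_to_delete):
    """Return a new key-ordered file without the records whose keys are listed."""
    pending = sorted(keys_to_delete)  # sort in advance so one pass is enough
    new_file = []
    i = 0
    for record in old_file:
        # Move past keys that are smaller than the current record's key.
        while i < len(pending) and pending[i] < record[0]:
            i += 1
        if i < len(pending) and pending[i] == record[0]:
            i += 1
            continue                  # do not copy the record to be deleted
        new_file.append(record)
    return new_file


remaining = delete_records([(1, "a"), (2, "b"), (3, "c"), (4, "d")], [4, 2, 9])
```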
Random (Direct) Files
A random file is where records are stored at an address calculated by the hashing algorithm based on the key field, which allows individual records to be accessed quickly without the need to reload the entire file.
Random (Direct) Files Features
• A data collision occurs when two data items are hashed to the same location.
• In this case there need to be overflow areas where the latest data is stored.
• When there are many items in the overflow area, access may become slow.
• In that case a new hashing algorithm is required and a larger file may be needed.
Indexed Sequential Files
An indexed sequential file is a file structure where records are stored in key sequence order and where an index is used to allow data to be accessed directly.
Indexed sequential files are a combination of sequential and random file access.
Indexed sequential files are important in situations where data needs to be accessed both randomly and sequentially.
In an indexed sequential file, blocks are normally only partially filled when the file is first created, to allow for more entries later.
An indexed sequential file can only be stored on random-access media.
The index in an indexed sequential file
The index is stored at the start of the file; it gives the highest key stored in each 'block' of the file.
When a certain key is searched for, the index will indicate in which block of the file it is stored.
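A minimal sketch of this lookup, assuming the index is a sorted list of the highest key in each block and that each block is a short list of `(key, data)` records. The sample data and the name `lookup` are hypothetical.

```python
from bisect import bisect_left

# index[i] is the highest key stored in blocks[i] (sample data, assumed).
index = [30, 60, 90]
blocks = [
    [(10, "a"), (30, "b")],
    [(40, "c"), (60, "d")],
    [(75, "e"), (90, "f")],
]


def lookup(key):
    """Use the index to pick the block, then search only that block."""
    b = bisect_left(index, key)   # first block whose highest key is >= key
    if b == len(index):
        return None               # key is larger than anything in the file
    for k, data in blocks[b]:     # short sequential scan inside one block
        if k == key:
            return data
    return None
```

Only one block is read for any key, instead of every record that comes before it in the file.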
Multi-Level Indexes
A multi-level index is one where the index is too large and so is split into a number of separate indexes.
The index at the start of the file would point to other, smaller indexes located at intervals in the file.
A multi-level index arises where the main index contains ranges of addresses together with the location (block) of the next-level index. This process may extend to several levels, with the last index containing the physical address of the record.
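A two-level version of this idea might look like the sketch below. The sample keys, block numbers and the name `find_block` are all assumptions; a real file could have more levels and would hold disc addresses rather than list positions.

```python
from bisect import bisect_left

# Main index: highest key covered by each second-level index (sample data).
main_index = [60, 120]
# Second-level indexes: (highest key in block, block number) pairs.
sub_indexes = [
    [(30, 0), (60, 1)],
    [(90, 2), (120, 3)],
]


def find_block(key):
    """Follow the main index to a second-level index, then to a block number."""
    top = bisect_left(main_index, key)
    if top == len(main_index):
        return None                      # key is beyond the end of the file
    sub = sub_indexes[top]
    pos = bisect_left([high for high, _ in sub], key)
    return sub[pos][1]                   # block that would hold this key
```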
Adding Records to an Indexed Sequential File
• Place the record in the appropriate block if possible.
• If a block becomes full, an overflow area is used.
• Access may become slow as more records are placed in the overflow area, so re-organisation may become necessary.
Deleting Records from an Indexed Sequential File
• The record is normally marked as deleted in the index but not physically removed.
What is a hashing algorithm?
A hashing algorithm is a process that uses a mathematical hashing function to convert a key field of arbitrary length into a compressed numerical hash value of fixed length, which determines the disc address.
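One very simple hashing function, shown only as an illustration, adds up the character codes of the key and takes the remainder after dividing by the number of available addresses. The name `hash_address` and the choice of 11 slots are assumptions.

```python
def hash_address(key, num_slots=11):
    """Sum the character codes of the key, then take the remainder mod num_slots."""
    return sum(ord(ch) for ch in key) % num_slots


address = hash_address("AB")   # (65 + 66) % 11
```

Keys of any length map to an address between 0 and `num_slots - 1`. Different keys can produce the same address ("AB" and "BA" do here), which is the collision case.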
What is an overflow area?
An overflow area is necessary in case the calculated address is already occupied by data. When this happens, the hashing algorithm points to a separate overflow area, where the data is normally stored and searched in linear order.
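The behaviour described above can be modelled with a small class. This is a sketch under stated assumptions: integer keys, a `key % number_of_slots` hashing function, unique keys, and an overflow area held as a simple list. The class and method names are hypothetical.

```python
class RandomFile:
    """Toy model of a random (direct) file with a separate overflow area."""

    def __init__(self, num_slots=7):
        self.slots = [None] * num_slots  # main storage area, one record per address
        self.overflow = []               # records whose address was already taken

    def _address(self, key):
        return key % len(self.slots)     # simple hashing function (assumption)

    def store(self, key, data):
        addr = self._address(key)
        if self.slots[addr] is None:
            self.slots[addr] = (key, data)
        else:
            self.overflow.append((key, data))  # collision: use the overflow area

    def fetch(self, key):
        slot = self.slots[self._address(key)]
        if slot is not None and slot[0] == key:
            return slot[1]               # found directly at the hashed address
        for k, data in self.overflow:    # otherwise search the overflow linearly
            if k == key:
                return data
        return None


rf = RandomFile()
rf.store(3, "first")
rf.store(10, "second")   # 10 % 7 == 3, so this collides with key 3
```

The more records end up in `overflow`, the longer the linear search in `fetch` takes, which is why a larger file or a new hashing algorithm eventually becomes necessary.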
Archiving
Archiving is the process of storing data/files which are no longer in current or frequent use.
The data is held for security, legal or historical reasons, or even just as a backup.
Archiving frees up resources on the main computer system which could mean faster access of the 'in-use' data.
Backing-up Files
Backing up files is very important to protect against data loss (either accidental or deliberate).
A three-generation file backup system involves storing the three most recent versions of the master file (and the transaction file if appropriate).
This is useful if one version is corrupted: the previous versions are still available.
GFS: Grandfather, Father, Son are the three generations of backups.
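The rotation can be sketched as a list that keeps only the three newest versions. The function name `rotate` and the newest-first ordering are assumptions made for the example.

```python
def rotate(generations, new_master):
    """Add the newest master file and keep only three generations.

    `generations` is ordered newest first: [son, father, grandfather].
    """
    return [new_master] + generations[:2]  # the old grandfather drops off the end


generations = []
for version in ["master_v1", "master_v2", "master_v3", "master_v4"]:
    generations = rotate(generations, version)
```

After four updates the list holds `master_v4` (son), `master_v3` (father) and `master_v2` (grandfather); `master_v1` is no longer kept.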
Passwords
Passwords protect data from unauthorized access by requiring a user ID and a personal password.
Transaction Logs
Transaction logs can help track down the person responsible if data is damaged or deleted.
All file access requests are logged with the time, date and username, along with the names of the files that were accessed.
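A log entry of this kind could be built as shown below. The field order, the `action` field and the function name `log_entry` are assumptions; real systems each use their own log format.

```python
from datetime import datetime


def log_entry(username, filename, action, when=None):
    """Format one transaction-log line: date, time, user, action, file name."""
    when = when or datetime.now()  # default to the current date and time
    return f"{when:%Y-%m-%d %H:%M:%S} {username} {action} {filename}"


line = log_entry("asmith", "master.dat", "READ", datetime(2024, 1, 2, 3, 4, 5))
```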
Encryption
Encryption is the encoding of data to safeguard it during transfer or storage by making a file impossible to read without the encryption key, algorithm or code.
File Management Utilities
Archivers
• Archive job folders for future reference.
• Output a single file when provided with a directory or a set of files for long-term storage.
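As an illustration of producing a single output file from a set of files, the sketch below uses Python's standard `zipfile` module and builds the archive in memory. The function name `archive` and the sample file names are hypothetical.

```python
import io
import zipfile


def archive(files):
    """Pack a dict of {file name: text content} into one compressed ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, content in files.items():
            zf.writestr(name, content)   # add each file to the archive
    return buffer.getvalue()             # bytes of the single archive file


archive_bytes = archive({"job1/notes.txt": "done", "job1/invoice.txt": "paid"})

# Reading the archive back lists the original files.
with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
    archived_names = sorted(zf.namelist())
```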
Data conversion utilities
• Transform data from a source file to some other format, such as from a text file to a PDF document for distribution to customers.
Data recovery
• Used to rescue good data from corrupted files.
Revision / version control utilities
• Maintain a coherent version of a file that multiple users modify simultaneously, for example to help several translators work on a common source document.
File managers
• Provide a method of performing routine data management tasks; assist in deleting, renaming, moving, copying, merging, setting write protection status, setting file access permissions, and generating and modifying folders and data sets.