2.4 Organisation and structure of data

33 Terms

1
New cards

Field

A field is a single data item.

2
New cards

Record

A record is a collection of items, all relating to an object, and is treated as a unit for processing.

3
New cards

File

A file is an organized collection of related records.

4
New cards

Master File

A master file holds the main, long-term data that is to be processed, and holds the resulting data once processing is complete.
It contains descriptive data that does not change, or that is updated only periodically.
Data is held sequentially, in key field order.
Example: Customer details for an electricity company

5
New cards

Transaction File

A transaction file is a temporary file used to store data which will be used to update the master file.

Contains the transactions, i.e. changes that are supposed to be made to the data in the master file

Data is held serially in temporal order, i.e. in the order it was collected

Example: Customer meter readings for an electricity company

6
New cards

File Update Process

The transaction file is sorted.
The master file and transaction file go through an update process to create the new, updated master file.
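The update process can be sketched as below. This is a minimal illustration, not a definitive implementation: it assumes both files are held in memory as lists of (key, value) tuples, that the master is already in key order, and that a transaction simply replaces the value for its key. The function name `update_master` and the sample meter readings are made up for the example.

```python
def update_master(master, transactions):
    """Return a new master list with the transactions applied.

    master: list of (key, value) tuples, already sorted by key.
    transactions: list of (key, value) tuples in the order collected.
    """
    transactions = sorted(transactions, key=lambda t: t[0])  # sort first
    new_master = []
    i = 0
    for key, value in master:
        # Transactions with no matching master record are ignored here.
        while i < len(transactions) and transactions[i][0] < key:
            i += 1
        # Apply every transaction for this key; the latest one wins.
        while i < len(transactions) and transactions[i][0] == key:
            value = transactions[i][1]
            i += 1
        new_master.append((key, value))
    return new_master


old_master = [(101, 5000), (102, 7200), (103, 3100)]
readings = [(103, 3350), (101, 5120)]
print(update_master(old_master, readings))
# [(101, 5120), (102, 7200), (103, 3350)]
```

Because both lists are in key order, each is read only once from start to end, which is why the transaction file is sorted before the update.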

7
New cards

Fixed Length Records

A fixed-length record file is one where all the records are the same length, in terms of both bytes and fields.

8
New cards

Advantages of fixed length records

Fixed-length records are easier to program, as the storage space required can be calculated in advance.

The position of any record in the file can be calculated by multiplying the record length in bytes by the record sequence number (counting the first record as 0).

A fast binary search can be used to locate a fixed-length record in a sequential file.

Fixed-length records can be quickly updated in situ without affecting other records in the file.

Fixed-length fields and records are quicker for the computer to process (read/write), as the start and end locations are known.
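The position calculation can be tried out with Python's `struct` module. This is only a sketch: the record layout (a 4-byte ID followed by a 12-byte name) and the function names are assumptions for illustration, and an in-memory `BytesIO` stands in for a file on disc.

```python
import io
import struct

# Assumed layout: 4-byte unsigned ID, then a 12-byte name field.
RECORD = struct.Struct("<I12s")


def write_records(f, rows):
    for rec_id, name in rows:
        # struct pads short names with null bytes and cuts off long ones.
        f.write(RECORD.pack(rec_id, name.encode("ascii")))


def read_record(f, n):
    """Read record number n (counting from 0) with a single seek."""
    f.seek(n * RECORD.size)  # position = record length x sequence number
    rec_id, raw = RECORD.unpack(f.read(RECORD.size))
    return rec_id, raw.rstrip(b"\x00").decode("ascii")


f = io.BytesIO()
write_records(f, [(1, "Ada"), (2, "Grace"), (3, "Bartholomew-Smith")])
print(RECORD.size)        # 16
print(read_record(f, 1))  # (2, 'Grace')
print(read_record(f, 2))  # (3, 'Bartholomew-')
```

The last line shows the cost of this layout: a name longer than the field is cut off, and shorter names leave padding in the field.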

9
New cards

Disadvantages of fixed length records

Fixed-length records waste storage space, as short values leave blank space in their fields.

Fixed-length records truncate values that are too long for their field.

10
New cards

What are variable length records?

Variable-length records are used in files where one or more fields can be a different length in each record.

One way to store them is with terminators: each field ends with a field terminator, and the whole record ends with a record terminator.

Alternatively, each field starts with a byte showing the size of the field, and the record starts with a byte showing the size of the whole record.
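The size-byte layout can be sketched as follows. The function names are made up for this example, and using a single byte for each size is an assumption that limits every field and record body to 255 bytes.

```python
def encode_record(fields):
    """Store each field after its size byte, and the record after its size byte."""
    body = b""
    for field in fields:
        data = field.encode("utf-8")
        body += bytes([len(data)]) + data  # size byte, then the field itself
    return bytes([len(body)]) + body       # record size byte, then all fields


def decode_record(blob):
    """Reverse encode_record by walking from one size byte to the next."""
    size, body = blob[0], blob[1:]
    if size != len(body):
        raise ValueError("record size byte does not match record length")
    fields, pos = [], 0
    while pos < size:
        n = body[pos]  # size of the next field
        fields.append(body[pos + 1:pos + 1 + n].decode("utf-8"))
        pos += 1 + n
    return fields


short = encode_record(["Ann", "Leeds"])
longer = encode_record(["Christopher", "Middlesbrough"])
print(len(short), len(longer))  # 11 27
print(decode_record(longer))    # ['Christopher', 'Middlesbrough']
```

The two records take 11 and 27 bytes, so no space is wasted on padding, but the start of a later record can only be found by reading the size bytes of the records before it.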

11
New cards

Advantages of variable length records

They are used in both serial files and sequential files

Variable-length records are preferred when the records in a file are of very different lengths, so as to avoid wasting storage

Variable-length records are suitable for situations where no searching or updating is necessary, e.g., transaction files, which will be used later to update a master file.

12
New cards

Disadvantages of variable length records

Variable-length records can only be found using a slower linear search.

Records cannot be accessed directly.

If a variable-length record is updated, the size of the record will change. The file will need to be rebuilt and the updated record inserted at the correct point in the sequence.

13
New cards

What are the types of files?

• Serial
• Sequential
• Random (Direct)
• Indexed Sequential

14
New cards

Serial File

A serial file is one where records are not stored in any particular order.
They are simply stored in the order in which they occur.
This is the quickest file organisation to add to, and is used when the order should be chronological, for example a record of all outgoing phone calls at a mobile carrier (for a single working day).

15
New cards

How to add a new record to a serial file

To add to a serial file, the new record is appended (added) to the end of the file.
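In Python, appending is what the `"a"` file mode does. The sketch below assumes one text record per line; the file name, the phone numbers and the `append_record` helper are invented for the example, and a temporary folder is used so nothing is left behind.

```python
import os
import tempfile


def append_record(path, record):
    # Mode "a" always writes at the end, leaving existing records untouched.
    with open(path, "a", encoding="utf-8") as f:
        f.write(record + "\n")


path = os.path.join(tempfile.mkdtemp(), "calls.txt")
append_record(path, "09:01,07700900001")
append_record(path, "09:03,07700900002")

with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()
print(lines)  # ['09:01,07700900001', '09:03,07700900002']
```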

16
New cards

Sequential File

A sequential file is one where records are stored and accessed in key sequence order (e.g. sorted by primary key).

It is easier and quicker to search a sequential file for a record than a serial file.

17
New cards

How to add a new record to a sequential file

• Make a new copy of the records until the correct place to add a new record is reached.
• Add the new record to the new copy.
• Continue until the end of the file is reached.
• If multiple records are to be added, these should be sorted into order first to prevent multiple updates to the file.
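The steps above can be sketched as a single copying pass. This is an illustration only: the files are modelled as in-memory lists of (key, data) tuples, and `insert_records` is a made-up name.

```python
def insert_records(old_file, new_records):
    """Return a new copy of old_file (sorted by key) with new_records merged in."""
    pending = sorted(new_records, key=lambda r: r[0])  # sort the additions first
    new_file = []
    i = 0
    for record in old_file:
        # Write any new records that belong before the current old record.
        while i < len(pending) and pending[i][0] < record[0]:
            new_file.append(pending[i])
            i += 1
        new_file.append(record)    # copy the old record across
    new_file.extend(pending[i:])   # new records that go after the last old one
    return new_file


old_file = [(10, "a"), (20, "b"), (40, "d")]
print(insert_records(old_file, [(30, "c"), (5, "z")]))
# [(5, 'z'), (10, 'a'), (20, 'b'), (30, 'c'), (40, 'd')]
```

Sorting the additions first is what lets all of them go in during one pass over the old file.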

18
New cards

How to delete a record from a sequential file

• Make a new copy of the records until the record to be deleted is reached.
• Do not copy the record to be deleted.
• Continue until the end of the file is reached.
• If multiple records are to be deleted, these should be sorted in advance.
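Deletion follows the same copying pattern. In this sketch the file is again an in-memory list of (key, data) tuples, and `delete_records` is a hypothetical helper name.

```python
def delete_records(old_file, keys_to_delete):
    """Return a new copy of old_file (sorted by key) without the given keys."""
    doomed = sorted(keys_to_delete)  # sort the deletions in advance
    new_file = []
    i = 0
    for record in old_file:
        # Move past deletion keys that are not present in the file.
        while i < len(doomed) and doomed[i] < record[0]:
            i += 1
        if i < len(doomed) and doomed[i] == record[0]:
            i += 1
            continue                 # do not copy the record to be deleted
        new_file.append(record)
    return new_file


old_file = [(10, "a"), (20, "b"), (30, "c"), (40, "d")]
print(delete_records(old_file, [30, 10]))
# [(20, 'b'), (40, 'd')]
```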

19
New cards

Random (Direct) Files

A random file is one where records are stored at an address calculated by a hashing algorithm from the key field, which allows individual records to be accessed quickly without reading through the rest of the file.

20
New cards

Random (Direct) Files Features

• A data collision occurs when two data items are hashed to the same location.
• In this case an overflow area is needed, where the later data item is stored.
• When there are many items in the overflow area, access may become slow.
• In that case a new hashing algorithm is required, and a larger file may be needed.

21
New cards

Indexed Sequential Files

An indexed sequential file is a file structure where records are stored in key sequence order and where an index is used to allow data to be accessed directly.

Indexed sequential files are a combination of sequential and random file access.

Indexed sequential files are important in situations where data needs to be accessed both randomly and sequentially.

In an indexed sequential file, blocks are normally only partially filled when the file is first created, to allow for more entries later.
An indexed sequential file can only be stored on random access media.

22
New cards

The index in an indexed sequential file

The index is stored at the start of the file. It gives the highest key stored in each 'block' of the file.

When a certain key is searched for, the index will indicate in which block of the file it is stored.
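A lookup through such an index might look like the sketch below. The three blocks, their contents and the `find` function are invented for illustration; the standard `bisect` module is used to pick the first block whose highest key is not smaller than the key being searched for.

```python
from bisect import bisect_left

# Hypothetical file: three blocks, each holding records in key order.
blocks = [
    [(3, "c"), (8, "h")],
    [(12, "l"), (19, "s")],
    [(25, "y"), (31, "ee")],
]
index = [block[-1][0] for block in blocks]  # highest key per block: [8, 19, 31]


def find(key):
    b = bisect_left(index, key)  # first block whose highest key >= key
    if b == len(blocks):
        return None              # key is larger than anything in the file
    for k, data in blocks[b]:    # sequential search inside that one block
        if k == key:
            return data
    return None


print(find(12))  # l
print(find(8))   # h
print(find(9))   # None
```

Only one block is read for each search, rather than every record before the one wanted.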

23
New cards

Multi-Level Indexes

A multi-level index is used where a single index would be too large, so it is split into a number of separate indexes.

The index at the start of the file points to other, smaller indexes located at intervals in the file.

The index at the start acts as a main index: each entry contains a range of keys and the location/block of the next-level index. This process may extend to several levels, with the last index containing the physical address of the record.

24
New cards

Adding Records to an Indexed Sequential File

• Place the record in the correct block if possible.
• If a block becomes full, an overflow area is used.
• Access may become slow as more records are placed in the overflow area, so re-organisation may become necessary.

25
New cards

Deleting Records from an Indexed Sequential File

• The record is normally marked as deleted in the index but not physically removed.

26
New cards

What is a hashing algorithm?

A hashing algorithm is a process that uses a mathematical hashing function to convert a key field of any length into a compressed numerical hash value of fixed length, which determines the disc address.
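A very simple hashing function is sketched below as an illustration. Adding up character codes and taking the remainder is just one easy-to-follow choice, not the method any particular system uses, and `hash_address` and `num_slots` are made-up names.

```python
def hash_address(key, num_slots):
    """Map a key string of any length to an address in range(num_slots)."""
    total = sum(ord(ch) for ch in key)  # add up the character codes
    return total % num_slots            # remainder always fits the file size


print(hash_address("AB", 10))  # 1  (65 + 66 = 131, remainder 1)
print(hash_address("BA", 10))  # 1  (same address for a different key)
```

"AB" and "BA" hash to the same address, which is an example of the collisions that an overflow area has to deal with.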

27
New cards

What is an overflow area?

An overflow area is necessary in case the calculated address is already occupied by data. When this happens, the hashing algorithm points to a separate overflow area, where the data is normally stored and searched in linear order.
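The idea can be sketched with a small class. This is an assumption-laden illustration: the class name `RandomFile`, the integer keys, the `key % size` hash and the one-record-per-address main area are all chosen for simplicity, and keys are assumed to be unique.

```python
class RandomFile:
    def __init__(self, num_slots):
        self.slots = [None] * num_slots  # main area, one record per address
        self.overflow = []               # separate overflow area

    def _address(self, key):
        return key % len(self.slots)     # simple illustrative hash

    def store(self, key, data):
        addr = self._address(key)
        if self.slots[addr] is None:
            self.slots[addr] = (key, data)
        else:
            self.overflow.append((key, data))  # address occupied: collision

    def find(self, key):
        record = self.slots[self._address(key)]
        if record is not None and record[0] == key:
            return record[1]             # found directly at its address
        for k, data in self.overflow:    # otherwise search overflow in order
            if k == key:
                return data
        return None


f = RandomFile(10)
f.store(12, "first")
f.store(22, "second")  # 22 also hashes to address 2
print(f.find(22))      # second
print(f.overflow)      # [(22, 'second')]
```

The more records end up in `overflow`, the longer the linear search in `find` takes.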

28
New cards

Archiving

Archiving is the process of storing data/files which are no longer in current or frequent use.

It is held for security, legal or historical reasons, or even just as a backup.

Archiving frees up resources on the main computer system which could mean faster access of the 'in-use' data.

29
New cards

Backing-up Files

Backing up files is very important to protect against data loss (either accidental or deliberate)

A three-generation file backup system stores the three most recent versions of the master file (and of the transaction file, if appropriate).

This is useful if one version is corrupted: the previous version(s) are still available.

GFS (Grandfather, Father, Son) names the three generations of backups.
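The rotation of three generations can be modelled with a fixed-size queue. This is only a toy sketch: version names stand in for real backup copies, and `backup` is a hypothetical helper.

```python
from collections import deque

# Holds at most three versions; the oldest is dropped automatically.
generations = deque(maxlen=3)


def backup(master_version):
    generations.appendleft(master_version)  # the newest version becomes the son


for version in ["v1", "v2", "v3", "v4"]:
    backup(version)

son, father, grandfather = generations
print(son, father, grandfather)  # v4 v3 v2
```

After the fourth backup, v1 has been discarded and v2 has become the grandfather.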

30
New cards

Passwords

Passwords protect data from unauthorized people: each user must supply a user ID and a personal password to gain access.

31
New cards

Transaction Logs

Transaction logs can help track down the person responsible if data is damaged or deleted.
Any file access requests are logged with the time, date and username, along with the names of the files that were accessed.
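A log of this kind could be sketched as a list of entries. The field names, usernames and the `log_access` and `who_touched` helpers are all invented for this example.

```python
from datetime import datetime

log = []


def log_access(username, filename, action):
    """Record who accessed which file, and when."""
    log.append({
        "time": datetime.now().isoformat(timespec="seconds"),  # date and time
        "user": username,
        "file": filename,
        "action": action,
    })


def who_touched(filename):
    """List the users who accessed a file, in the order they did so."""
    return [entry["user"] for entry in log if entry["file"] == filename]


log_access("amy", "master.dat", "read")
log_access("bob", "master.dat", "delete")
log_access("amy", "notes.txt", "read")
print(who_touched("master.dat"))  # ['amy', 'bob']
```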

32
New cards

Encryption

Encryption is the encoding of data to safeguard it during transfer or storage, by making a file impossible to read without the encryption key, algorithm or code.

33
New cards

File Management Utilities

Archivers

• Archive job folders for future reference.

• Output a single file, when provided with a directory or a set of files, for long-term storage.

Data conversion utilities

• Transform data from a source file to some other format, such as from a text file to a PDF document for distribution to customers.

Data recovery

• Used to rescue good data from corrupted files.

Revision / version control utilities

• Keep a coherent version of a file that multiple users modify at the same time, e.g. to help several translators work on a common source document.

File managers

• Provide a method of performing routine data management tasks; assist in deleting, renaming, moving, copying, merging, setting write protection status, setting file access permissions, and generating and modifying folders and data sets.