2.4 Organisation and structure of data

0.0(0)
Studied by 0 people
call kaiCall Kai
learnLearn
examPractice Test
spaced repetitionSpaced Repetition
heart puzzleMatch
flashcardsFlashcards
GameKnowt Play
Card Sorting

1/45

encourage image

There's no tags or description

Looks like no tags are added yet.

Last updated 5:29 PM on 3/1/26
Name
Mastery
Learn
Test
Matching
Spaced
Call with Kai

No analytics yet

Send a link to your students to track their progress

46 Terms

1
New cards

Field

A field is a single data item. (With a defined data type)

2
New cards

Record

A record is a collection of fields, all relating to an object (such as a person), that can store a single set of values. A record can be fixed length or variable length

3
New cards

File

A file is an organized collection of related records.

<p>A file is an organized collection of related records.</p>
4
New cards

What is the purpose fo files in data processing?

  • Data storage - Allows for preservation of information beyond the duration of the programs execution

  • Data persistence - this is improtant as applications may need to remeber information between sessions so data will stay the same

  • Data retrieval - Programs can read from files to access stored information, facilitating the processing and manipulation of data

5
New cards

How do programs interact with files?

  • Create files

  • Organise files (using defined structures and access methods)

  • Updating files (adding, deleting, or modifying records)

  • Processing files (reading records, performaing calculations, and producing outputs)

6
New cards

Master File

A master file stores long-term descriptive information. It stores all data required to perform processing operations and results from processing operations (Long-term records that do not change or is only periodically updated). They are generally updated periodically from a transaction file and held in a sequential order

e.g. Montly hours worked

7
New cards

Transaction File

A tranaction file is a temporary file that stored data collected over a short period of time. These files act as temporary storage, which means the system can run efficiently whilst storing required data. At the end of the short period the tranaction files are copied and processed to update the master file and the transaction file is cleared after.

Tranactions are the changes that are supposed to be made to data in the master file

Data is held serially (order it was collected)

(Could be used for daily time sheets)

8
New cards

File Update Process

The transaction file is sorted.
The master file and transaction file go through an update process to create the new, updated master file.

<p>The transaction file is sorted.<br>The master file and transaction file go through an update process to create the new, updated master file.</p>
9
New cards

Fixed Length Records

Fixed-length record is a type of file where all the records are of the same size in terms of bytes and fields. Which means the size of fields must be defined before inserting data.

Each field has a defined size, which means all records are the same length.

10
New cards

Advanatges of fixed length records

  • Fixed length record is easier to program as it can be calculated to know how much storage space will be required (as all records are the same length)

  • Fixed length allows for faster access and simpler processing (can directly access data with a calculation)

11
New cards

Disadvantages of fixed length records

  • If data is too long it will have to be truncated (Inflexible)

  • Space may be wasted if there is blank space in the fields

12
New cards

What are variable length records?

A variable length record will vary in size depending on the data it contains.

When the record is stored, each field has a field terminator, and there will be a record terminator at the end of a whole record.

Generally used where efficient searching is less crucial like in a transaction files that is processed sequentially

13
New cards

Advanatges of variable length records

  • Variable-length records are suitable for situations where no searching or updating is necessary, e.g., transaction files, which will be used later to update a master file

  • Less space is wasted as only storage required is used (more efficient use of space)

  • Flexible

14
New cards

Disadvantages of variable length records

  • Variable length records are more difficult to manage because it is harder to predict storage requirements

  • Searchibng is less efficient

  • The position of a specific record cannot be caluclated directly using the record size

  • If a record is updated and the size changes the file will need to be rebuilt and the record inserted at the correct point in the sequence

15
New cards

How would I estimate file size?D

You would calcualte the size of a single record. Then you would multiply that by the number of total records. Then you would add 10% for meta data

<p>You would calcualte the size of a single record. Then you would multiply that by the number of total records. Then you would add 10% for meta data</p>
16
New cards

What is a data dictionary?

A file containting information on the structure of a record

<p>A file containting information on the structure of a record</p>
17
New cards

What are file access methods?

It describes how records are located and read from a file. They determine the order in which data is accessed and how quickly a specific record can be found. (They will change based on needs of the user)

18
New cards

Why are file access methods important?

  • Efficiecny - How quickly data can be retrieved which is affected by this

  • Perfromacne - LArge files have lots of data so to be more efficient there may be skipping of files

  • Suitability - some may require differenta ccess

  • Scalability - As files grow some methods degrade in performance

19
New cards

What are the types of files and access methods?

• Serial
• Sequential
• Random (Direct)
• Indexed Sequential

20
New cards

Serial File

Records are stored in the order they are added with no logical order (not sorted). Records are appended onto the end of a file (like a shopping list). Serial file access goes from the start and reads one at a time untill it has read them all).

This is used when records dont need to be sorted, data is processed once (like transaction files), or data is stored in the order it arrives. Generally good when files need to be chronological order like phone logs

21
New cards

How to add a new record to a serial file

To add to a serial file, the new record is appended (added) to the end of the file.

22
New cards

Sequential File

Records are stored in a defined logical order based on key fields (like an ID or surname). Records are accessed from the start of the file one after another reading each record untill it finds the correct one (faster than serial but slower than direct).

Gnerally used when records need to be kept in order, searching is required but cannot direct access, or processing data in order (like student grades based on surname)

23
New cards

How to add a new record to a sequential file

• Make a new copy of the records until the correct place to add a new record is reached.
• Add the new record to the new copy
• Continue until end of file is reached.
• If multiple records are to be added, these should be sorted into order first to prevent multiple updates to the file.

24
New cards

How to delete a record from a sequential file

• Make a new copy of the records until the record to be deleted is reached.
• Do not copy the record to be deleted.
• Continue until the end of the file is reached.
• If multiple records are to be deleted, these should be sorted in advance.

25
New cards

Pros and Cons of serial

  • Simple to implement

  • Efficient for procesing data once

  • Suitable for large volumes of data (transaction files)

Cons:

  • Not organised

  • Inefficient if records need to be re-used/searched for

26
New cards

Pros and Cons of sequential?

Pros:

  • Data is organised and sorted

  • Suitable for batch processing of structured data

Cons:

  • Slower than direct access

  • Records must be read in order

  • Need to copy file when inserting data

27
New cards

Pseudcode for inserting into a serial and sequential file

knowt flashcard image
28
New cards

Random (Direct) Files

A random file is where records are stored at an address calculated by the hashing algorithm based on the key field, which allows individual records to be accessed quickly without the need to reload the entire file.

29
New cards

Random (Direct) Files Features

• A data collision occurs when two data items are hashed to the same location
• In this case there needs to be overflow areas where the latest data is stored
• When there are many items in the overflow area, access may become slow
• In which case a new hashing algorithm is required and a larger file may be needed.

30
New cards

What is indexing?

An index stores key values and the locations of records within a file (instead of searching a whole file the index is searched then it points to where the required record is (pointer position) and the program jumps to the correct part and reads the record)

31
New cards

Indexed Sequential Files

An index file is used to speed up searching by having pointers to key groups within the file (like A group and B group). Once the correct group is found using the index the records are read sequentially. (This reduces the number of data needed to be searched by using a combination of sequential and random access).

Inserting and deleting records is more expensive as records and index entries need to be updated.

In an indexed sequential file, blocks are normally partially filled (when the file is first created) to allow for more entries later.

<p>An index file is used to speed up searching by having pointers to key groups within the file (like A group and B group). Once the correct group is found using the index the records are read sequentially. (This reduces the number of data needed to be searched by using a combination of sequential and random access).</p><p>Inserting and deleting records is more expensive as records and index entries need to be updated.</p><p>In an indexed sequential file, blocks are normally partially filled (when the file is first created) to allow for more entries later.</p>
32
New cards

The index in an indexed sequential file

The Index is stored at the start of the file - it gives the highest key stored in each 'block' of the file.

When a certain key is searched for, the index will indicate in which block of the file it is stored.

33
New cards
<p>How would you search this?</p>

How would you search this?

You would find get your key (256) and then search the index for the next highest value (320) then you would start from the previous group (A) and sequentially search untill you reach the end iof the group (320)

<p>You would find get your key (256) and then search the index for the next highest value (320) then you would start from the previous group (A) and sequentially search untill you reach the end iof the group (320)</p>
34
New cards

Multi-Level Indexes

A multi-level index is one where the index is too large and so is split into a number of separate indexes.

The index at the start of the file would point to other, smaller indexes located at intervals in the file.

A multilevel index arises where this index is a main index which itself contains a range of addresses and the location/block of the next level index. This process may extend to several levels, with the last index containing the physical address of the record. This reduces overhead where indexes will point to other indexes before reaching data (which speeds up searching but makes updates even more expensive)

35
New cards
<p>Example of a multi level index</p>

Example of a multi level index

The search will start from the top moving down each index level pointing to indexes and then reaching data

<p>The search will start from the top moving down each index level pointing to indexes and then reaching data</p>
36
New cards

Adding Records to an Indexed Sequential File

• Place in a block if possible
• If a block becomes full, an overflow area is used.
• Access may become slow as more records in the overflow area, so re-organisation may become necessary.

37
New cards

Deleting Records from an Indexed Sequential File

• Record is normally marked as deleted in the index but not physically removed.

38
New cards

What is a hashing algorithm?

A hasing algorithm is a process that includes a mathematical hasing function that converts the key field of random length into a compressed numeriacal hash value of fixed length to determine the disc address.

39
New cards

What is an overflow area?

An overflow area is necessary in case the calculated address is already occupied by data. When this happens, the hashing algorithm points to a separate overflow area, where the data is normally stored and searched in linear order.

40
New cards

Archiving

Archiving is the process of storing data/files which are no longer in current or frequent use outside of the main system in long term storage.

It is held for security / legal / historical reason, or even just as a backup.

Archiving frees up resources on the main computer system which could mean faster access of the 'in-use' data (improving performacnce, saving storage space)


Can still be acessed but not used alot

41
New cards

Backing-up Files

Backing up files is very important to protect against data loss (either accidental or deliberate)

A three generation file backup system involves storage of three of the most recent versions of master file (& transaction file if appropriate) (More may require more storage)

This is useful if one version is corrupted: the previous version(s) is still available.

GFS – Grandfather, Father, Son are three generations of backup

Files ar ecopies from the main system to a sperate location (usually off-site) so data can be restored if lost/corrupted/deleted.

42
New cards

Passwords

Passwords protect data from unauthorized people using user ID and personal password.

43
New cards

Transaction Logs

Transaction logs can help track down the person responsible if data is damaged or deleted.
Any file access requests are logged with the time, data and username along with the names of the files that were accessed.

44
New cards

Encryption

Encryption is the encoding of data to safeguard it during transfer or storage by making a file impossible to read without the encryption key, algorithm, / code.

45
New cards

File Management Utilities

Archivers

• To archive job folders for future reference.

• Output a single file when provided with a directory or a set of files for long term storage.

Data conversion utilities

• Transform data from a source file to some other format, such as from a text file to a PDF document for customers’ distribution

Data recovery

• Used to rescue good data from corrupted files.

Revision / version control utilities

• Recreate a coherent structure where multiple users simultaneously modify the same file to help several translators work on a common source document.

File managers

• Provide a method of performing routine data management tasks; assist in deleting, renaming, moving, copying, merging, setting write protection status, setting file access permissions, and generating and modifying folders and data sets.

46
New cards

Datatypes

knowt flashcard image

Explore top notes

note
Unit 8: Ecology
Updated 83d ago
0.0(0)
note
Chapter 20: Water Pollution
Updated 983d ago
0.0(0)
note
APES 5.10 Impacts of Urbanization
Updated 1127d ago
0.0(0)
note
1984 - Introduction Notes
Updated 1733d ago
0.0(0)
note
Grade 10 Biology: Lesson 9
Updated 1164d ago
0.0(0)
note
32 elements General
Updated 347d ago
0.0(0)
note
Chapter 2: Data
Updated 1079d ago
0.0(0)
note
Unit 8: Ecology
Updated 83d ago
0.0(0)
note
Chapter 20: Water Pollution
Updated 983d ago
0.0(0)
note
APES 5.10 Impacts of Urbanization
Updated 1127d ago
0.0(0)
note
1984 - Introduction Notes
Updated 1733d ago
0.0(0)
note
Grade 10 Biology: Lesson 9
Updated 1164d ago
0.0(0)
note
32 elements General
Updated 347d ago
0.0(0)
note
Chapter 2: Data
Updated 1079d ago
0.0(0)