Know the different methods that can be used to organise data in a file and select an appropriate method for a given problem.
Know the different methods that can be used to access data in a file.
Know the purpose of hashing algorithms and how these can be used when reading data from and writing data to a file.
Records are physically stored one after the other in chronological order only.
Retrieval- each record must be read through until the specific one is found.
Simple and effective for small files/order isn’t important e.g. meter readings.
New records are appended to the end (added at the end in the next available space).
Records are physically stored one after the other based on a unique key field (like Primary Key in database).
Retrieval - Sequential or Direct access (see below).
New records are inserted into correct position.
Faster searches performed.
Adding/deleting data requires file to be updated to maintain order - time consuming.
Writing to:
Generate record key - (similar to hashing in random - see below) unique key is used.
Apply Hash function - hash function makes an index
Insert into Index - record stored in sequential file and index is updated with hash value pointing to actual storage location.
Reading from:
Generate record key - key for desired record is given
Apply Hash function - same hash function is applied to find the index.
Access Index - index points to storage location in sequential file.
Retrieve record - record directly accessed from sequential file using location given by the index.
Records are stored in any available position.
Retrieval - Hashing Algorithm/ Direct Access (see below).
New records can be easily added, records easily deleted
Facilitate immediate access to records by finding exact location of records.
Intakes key and carries out calculation outputting a “hash value” (points to location)
Every time the same key is entered it generates the same hash value
Problem - hash value can be duplicated with different record keys (e.g. below “Mas” ASCII value is also 289)
If the record is to be stored it can either:
Closed Hash - search file linearly to find next available space.
Open Hash - search overflow area linearly to find next available space.
If the record is to be found it can either:
Closed Hash - search linearly from where you are until matching record key is found.
Open Hash - search overflow area linearly until matching record key is found.
e.g. “Sam” is key - ASCII value for the key is 289 which is then passed into the hashing algorithm.
This generates a hash value using Address = (value) MOD 10 (28.9 MOD is 9) indicating storage location 9 where the data is read from or written to.
Writing to
Generate record key - Unique key e.g. customer ID or employee number
Apply Hash function - Hashing algorithm applied to the record key to make “hash value” which should respond to storage address/index in file.
Store Record - record stored in location given by has value
Reading from
Generate record key - same key used for writing is provided
Apply Hash function - same hashing algorithm is applied to determine hash value
Retrieve record - record is directly accessed from hash value.
Serial file organisation - checks every record in chronological order until data is found, can be time consuming depending on file size and record location. Suitable for tasks where order of data entry is important e.g. logging events.
Sequential file organisation - checks every record/key field until key field is greater than record being searched e.g Checking key field Surname to find Smith but will stop searching when it reaches T. Suitable for tasks where data needs to processed in a specific order e.g. sorted reports.
Access records in any file without checking records
Sequential file organisation - If index is used, the index uses pointers for the records allowing direct access to the specific location. Suitable where quick access is necessary (often faster than Sequential Access)
Random file organisation - hashing algorithm determines storage location based on record key, if exact location is known it doesn’t need to check other records. Suitable for applications needing fast retrieval and random access patterns e.g. database management systems and real-time systems.