1/18
Looks like no tags are added yet.
Name | Mastery | Learn | Test | Matching | Spaced |
|---|
No study sessions yet.
hash table
a data structure that implements an associative array (a dictionary). In an associative array, data is stored as a collection of key-value pairs. The position of the data within the array is determined by applying a hashing algorithm to the key - a process called hashing. The hashing algorithm is called a hash function.
implementation of a hash table
To implement a hash table, you must use an array because you have to be able to access each position of the array directly. positions within an array are sometimes called buckets; each bucket is used to store data. This can be a single piece of data or can be a more complex object such as a record. The key must be stored as part of, or alongside, the data. The size of the array must be planned carefully. It must be big enough to store all of the data but not so large that you waste space. However, an effective hash table must always have some free space.
The load factor of a hash table can be defined as:
loadfactor=kn
where k is the number of buckets (positions) in the array and n is the number of occupied buckets. Keeping the load factor at around 0.75 is optimal.
hash function
algorithm that converts a hash key to a hash value
requirements of a hash function
always produce the same hash value for the same key.
provide a uniform distribution of hash values. This means that every value has an equal probability of being generated.
minimise clustering; this will arise when many different keys produce the same hash value. Where two or more different keys produce the same hash value, we say a ‘collision’ has occurred.
be very fast to compute.
simple hash function example
this hash table has keys made up of 5 digits and 97 buckets. the modulo division ensures that no hash value is greater than the number of indices in the array. this means that the value associated with the key 12345 is stored in the position 26 of the array.

hashing character data
Not all key values are simple integers so the hash function will need to do more processing. if the key contains characters, then they need to be converted into their ASCII values and added together, for example, A5RD would have ASCII values 65 + 53 + 82 + 68 = 268.

inserting data into a hash table
To insert data into a hash table, you must use the hash function to generate the index of the position in the array that will be used to store the data. Remember that the key must be stored as part of, or alongside, the data. in the example A5RD=268 268MOD97=74 so the data and key is stored in the position 74

retrieving values from a hash table
To retrieve a value from a hash table, the hash function is applied to the key in order to generate the index for the position within the array. The key R2YJ generates the hash value 4; this is where the data associated with the key R2YJ will be located. The data can be retrieved in a single operation. You do not need to use a linear or binary search, you can go directly to the correct position because an array allows direct access by index.

searching for an item that does not exits

collisions
A collision occurs when the hash function produces the same hash value for two or more keys. this is a problem because you cannot store two sets of data at the same location. to deal with this you can use either of these methods: linear probing and chaining.
linear probing
if a key returns a value that already has data stored in that position, then the next free bucket is chosen to store the data represented to the key, this is done by iterating through the array until the next free bucket is found. in the example the key B7M4 has a hash value of 74, but that space is occupied so it iterates by one to position 75, which is also occupied so it iterates again to 76 which is empty so the key and data is stored at 76. this can make things complicated as if a key points to 76, it will now have to be stored in 77.

retrieving data using linear probing
hash the key to generate the index value
examine the indexed position to see if the key (which is stored together with the data) matches the key of the data you are looking for
if there is no match, check each location that follows (at the appropriate interval) until the matching record is found
if you reach an empty bucket without finding a matching key, the data is not in the hash table.
chaining (also known as addressing)
this method uses linked lists, instead of storing the actual data in the hash table, you store a pointer to a location where the data is stored. Each data item is stored as a node with three attributes:
the key value
the data
a pointer to the next node
so each element in the array, would store the pointer that points to the first node in the linked list

retrieving data using chaining
hash the key to generate the index value
examine the indexed position to get the pointer to the node at the head of the list
examine the key of the node to see if it matches the key of the data you are looking for
if it is not the node you are looking for, follow the linked list until you find the node you are looking for
if you reach the end of the list without finding a matching key (or the head pointer was null), the data is not in the hash table
rehashing
If the hash table starts to fill up, or a large number of items are in the wrong place, the performance of the hash table will degrade. Rehashing can be used to address this problem. When you rehash, a new hash table is created (larger if needed). The key for each item in the existing table will be rehashed and the item will be inserted into the new hash table. If the new table is larger the hashing algorithm will need to be modified to use the new number of buckets
using hash tables for databases
in this example the primary key of the table is the student ID, and this can be hashed to provide a physical location to store the data within a hash table. Database systems will usually allow you to index other fields (in addition to the primary key). Hash tables will facilitate the high speed retrieval of data based on any field that is indexed.

password hashing
A password hash is a "one-way function". This is an operation that is easy to perform, but very difficult - hopefully almost impossible - to reverse. It acts as a form of encryption. When a password is entered for the first time, it is hashed and saved to a file. Whenever the user attempts to log on using the password, the system performs the same hash on the password that has been entered and compares it to the stored hash value. If it is the same, access is granted.
If someone hacks the file, they can only get hold of the hashed value. This is of no use as the original unhashed value is needed to log in.
problems with password hashing
Most hashing algorithms are well known and hackers can use rainbow tables that store pre hashed values of a whole dictionary of possible passwords to look for a match. When a match is found, the associated password is revealed.
a salt in password hashing
a salt is used to improve password hashing. A salt is a piece of random data that is added to the password before hashing it. The salt value is stored in plain text together with the user’s user id and hashed password. When a user attempts to log in, their record is retrieved and the salt value is added to the entered password. The combined data is then hashed and compared to the stored hash value. Now, even if someone uses the same password for multiple systems, different hash values will be stored.