CS2040S Lecture: Hashing
Hash Tables
- A hash table is a data structure that implements an associative array, a structure that can map keys to values.
- It uses a hash function to compute an index into an array of buckets or slots.
Symbol Table
- Definition: An abstract data type that supports insertion, deletion, and search operations.
- Operations:
void insert(Key k, Value v): Insert a key-value pair into the table.Value search(Key k): Retrieve the value associated with a key.void delete(Key k): Remove a key-value pair from the table.boolean contains(Key k): Check if a key exists.int size(): Return the size of the table.
Examples of Symbol Tables
- Dictionary
- Key: word; Value: definition.
- Phone Book
- Key: name; Value: phone number.
- Internet DNS
- Key: website URL; Value: IP address.
- Java Compiler
- Key: variable name; Value: type and value.
Implementing Symbol Tables
- AVL Tree Based: Different costs for insertions (CI) and searches (CS) can be achieved depending on the algorithm.
- Different combinations: CI = O(1), O(log n), O(n) can affect CS.
- Fast implementations aim for CI = O(1) and CS = O(1).
Issues with Symbol Tables
- Sorting
- Cannot efficiently sort data with a symbol table.
- For AVL trees, sorting runs in O(n log n), while symbol tables could take O(n²).
- Efficiency
- Direct access tables are manageable only with integer keys. Non-integer keys can lead to inefficient data storage.
Collision Management in Hash Tables
- Collision: Occurs when two distinct keys hash to the same index.
- If
h(k1) = h(k2), a collision occurs for keys k1 and k2.
- Coping Strategies:
- Chaining: Store colliding elements in a linked list within the same bucket.
- Open Addressing: Find another bucket for the new item.
Hash Functions
- Definition: A method to convert a key into an index (bucket).
- Assumptions:
- Hash functions should uniformly distribute keys across the buckets to minimize collisions.
- Each key is equally likely to map to any bucket.
- Expected Search Time: O(1 + n/m) where n is the number of items and m is the number of buckets.
- With a good hash function and enough buckets, the expected maximum cost for searching becomes logarithmic, O(log n).
Applications of Hash Tables
- Fast data retrieval applications such as database indexing, caching, and set memberships.
- Used extensively in programming languages and system operations due to their efficiency in handling searches and insertions.