CS2040S Lecture: Hashing

Hash Tables

A hash table is a data structure that implements an associative array, a structure that can map keys to values.
It uses a hash function to compute an index into an array of buckets or slots.

Symbol Table

Definition: An abstract data type that supports insertion, deletion, and search operations.
Operations:
- void insert(Key k, Value v): Insert a key-value pair into the table.
- Value search(Key k): Retrieve the value associated with a key.
- void delete(Key k): Remove a key-value pair from the table.
- boolean contains(Key k): Check if a key exists.
- int size(): Return the size of the table.

Examples of Symbol Tables

Dictionary
- Key: word; Value: definition.
Phone Book
- Key: name; Value: phone number.
Internet DNS
- Key: website URL; Value: IP address.
Java Compiler
- Key: variable name; Value: type and value.

Implementing Symbol Tables

AVL Tree Based: Different costs for insertions (CI) and searches (CS) can be achieved depending on the algorithm.
- Different combinations: CI = O(1), O(log n), O(n) can affect CS.
Fast implementations aim for CI = O(1) and CS = O(1).

Issues with Symbol Tables

Sorting
- Cannot efficiently sort data with a symbol table.
- For AVL trees, sorting runs in O(n log n), while symbol tables could take O(n²).
Efficiency
- Direct access tables are manageable only with integer keys. Non-integer keys can lead to inefficient data storage.

Collision Management in Hash Tables

Collision: Occurs when two distinct keys hash to the same index.
- If h(k1) = h(k2), a collision occurs for keys k1 and k2.
Coping Strategies:
- Chaining: Store colliding elements in a linked list within the same bucket.
- Open Addressing: Find another bucket for the new item.

Hash Functions

Definition: A method to convert a key into an index (bucket).
Assumptions:
- Hash functions should uniformly distribute keys across the buckets to minimize collisions.
- Each key is equally likely to map to any bucket.

Performance Analysis of Hash Tables

Expected Search Time: O(1 + n/m) where n is the number of items and m is the number of buckets.
With a good hash function and enough buckets, the expected maximum cost for searching becomes logarithmic, O(log n).

Applications of Hash Tables

Fast data retrieval applications such as database indexing, caching, and set memberships.
Used extensively in programming languages and system operations due to their efficiency in handling searches and insertions.