Detailed Notes on Hashing and Hash Tables

Previous discussion focused on binary search trees as an efficient way to implement sets/maps.
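Hashing provides another efficient approach, which can outperform search trees for adding, removing, and finding elements when the elements do not need to be kept in sorted order.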

When implementing collections, order can be determined by:
- Addition/Removal Order: Used in stacks, queues, and lists.
- Value Comparison: Used in ordered lists and search trees.
Hashing determines the location of an item via a hash function, storing elements in a hash table.
Cells/Buckets: The locations in the hash table where elements are stored.
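Operations using hashing are expected to be O(1) on average, because an element's location is computed directly rather than found by searching. Collisions can make individual operations slower.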

A simple example uses the first letter of each name to map it to an index: names starting with 'A' go to cell 0, names starting with 'B' go to cell 1, and so on.
Key Concepts:
- Hash Table: Storage structure that uses a hashing function to determine the position of entries.
- Collision: Occurs when two keys map to the same location in a hash table.
- Perfect Hashing Function: A function that maps every key to a unique location in the table.
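
The first-letter example and the idea of a collision can be sketched in a few lines of Java. This is only an illustrative sketch: the class name FirstLetterHash, the method name index, and the 26-cell table size are assumptions made here, not part of the notes.

```java
class FirstLetterHash {
    // Assumption: one cell per letter A-Z, so the table has 26 cells.
    static final int TABLE_SIZE = 26;

    // Maps a name to an index 0..25 using only its first letter.
    // Assumes the name is non-empty and starts with an ASCII letter.
    static int index(String name) {
        char first = Character.toUpperCase(name.charAt(0));
        if (first < 'A' || first > 'Z') {
            throw new IllegalArgumentException("name must start with A-Z: " + name);
        }
        return first - 'A';
    }

    public static void main(String[] args) {
        System.out.println(index("Ann")); // 0
        System.out.println(index("Bob")); // 1
        System.out.println(index("Amy")); // 0, the same cell as "Ann": a collision
    }
}
```

Because "Ann" and "Amy" share a first letter, this function is not a perfect hashing function for that pair of keys.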

Table size decisions depend on data set size:
- Use the same size as the data set if perfect hashing is possible.
- If not perfect but data size is known, a common approach is to set the table to 150% of the data size.
- If data size is unknown, use dynamic resizing: when the table becomes too full, create a larger table and rehash the existing elements into it.
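
The sizing rules above might look roughly like the following sketch. Everything here is an assumption chosen for illustration: the class name ResizingIntSet, storing int keys in per-cell lists, doubling the table on growth, and growing once there would be more elements than cells. The notes themselves only specify the 150% starting size and the rehashing step.

```java
import java.util.ArrayList;
import java.util.List;

class ResizingIntSet {
    private List<List<Integer>> cells;
    private int size = 0;

    ResizingIntSet(int expectedItems) {
        // Start at 150% of the expected data size (at least one cell).
        cells = newTable(Math.max(1, expectedItems + expectedItems / 2));
    }

    private static List<List<Integer>> newTable(int capacity) {
        List<List<Integer>> table = new ArrayList<>(capacity);
        for (int i = 0; i < capacity; i++) {
            table.add(new ArrayList<>());
        }
        return table;
    }

    private static int indexFor(int key, int capacity) {
        return Math.floorMod(key, capacity); // always in 0..capacity-1
    }

    int capacity() { return cells.size(); }

    int size() { return size; }

    boolean contains(int key) {
        return cells.get(indexFor(key, cells.size())).contains(key);
    }

    boolean add(int key) {
        if (contains(key)) return false;
        // Assumed growth rule: double the table when it would hold
        // more elements than it has cells.
        if (size + 1 > cells.size()) resize(cells.size() * 2);
        cells.get(indexFor(key, cells.size())).add(key);
        size++;
        return true;
    }

    // Creates a larger table and rehashes every existing element into it.
    // Indexes must be recomputed because they depend on the table size.
    private void resize(int newCapacity) {
        List<List<Integer>> old = cells;
        cells = newTable(newCapacity);
        for (List<Integer> cell : old) {
            for (int key : cell) {
                cells.get(indexFor(key, newCapacity)).add(key);
            }
        }
    }
}
```

Rehashing is needed because an element's index is computed from the table size, so an element's old index is generally wrong in the larger table.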

A hashing function does not need to be perfect; a good function that spreads keys evenly across the table still gives expected O(1) access.
Some common methods to create a hashing function:
- Extraction: Uses part of a value (e.g., the first letter of a string).
- Division Method: Uses the remainder of the key divided by a number p to get an index.
  - Defined as: $Hashcode(key) = Math.abs(key) \bmod p$
  - Using prime numbers for p generally leads to better key distribution.
- Folding Method: Combines parts of the key to form an index.
- Mid-Square Method: Squares the key and extracts middle digits for the index.
- Radix Transformation Method: Converts key to another numeric base and applies the division method.
- Digit Analysis Method: Builds the index from selected digits of the key, optionally manipulating them first (for example, reversing their order).
- Length-Dependent Method: Combines the key's length with part of its value to form the index.
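
Three of the methods above can be sketched as follows. This is a minimal sketch, not a definitive implementation: the class and method names are hypothetical, the folding variant shown adds groups of decimal digits taken from the right (one of several ways to fold), and the mid-square variant picks the middle digits from the decimal string of the square. The division method widens the key to long before taking the absolute value, because in Java Math.abs(Integer.MIN_VALUE) is still negative.

```java
class HashMethods {
    // Division method: Math.abs(key) mod p, computed in long so that
    // Integer.MIN_VALUE is handled correctly. Assumes p > 0.
    static int division(int key, int p) {
        return (int) (Math.abs((long) key) % p);
    }

    // Folding method (assumed variant): split the key into groups of
    // groupDigits decimal digits, add the groups, then reduce the sum
    // with the division method. Assumes key >= 0 and tableSize > 0.
    static int folding(long key, int groupDigits, int tableSize) {
        if (key < 0 || groupDigits < 1 || groupDigits > 9) {
            throw new IllegalArgumentException("unsupported arguments");
        }
        long divisor = 1;
        for (int i = 0; i < groupDigits; i++) {
            divisor *= 10;
        }
        long sum = 0;
        for (long rest = key; rest > 0; rest /= divisor) {
            sum += rest % divisor; // lowest remaining group of digits
        }
        return (int) (sum % tableSize);
    }

    // Mid-square method (assumed variant): square the key and return
    // the middle `digits` decimal digits of the square.
    static int midSquare(int key, int digits) {
        if (digits < 1 || digits > 9) {
            throw new IllegalArgumentException("digits must be 1..9");
        }
        long square = (long) key * key; // widened, so it cannot overflow
        String s = Long.toString(square);
        if (s.length() <= digits) return (int) square;
        int start = (s.length() - digits) / 2;
        return Integer.parseInt(s.substring(start, start + digits));
    }

    public static void main(String[] args) {
        System.out.println(division(17, 5));               // 2
        System.out.println(folding(123456789L, 3, 1000));  // 123+456+789 = 1368, 1368 mod 1000 = 368
        System.out.println(midSquare(4567, 4));            // 4567^2 = 20857489, middle digits 8574
    }
}
```

The result of midSquare would normally still be reduced to the table size, for example with the division method.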

Deletion handling depends on the collision resolution method:
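- Chaining: Each table cell holds a list of the elements that hash to it, so an element is removed directly from the list at its cell.
- Open Addressing: Colliding elements are stored in other cells of the table itself, so an element is marked as deleted rather than physically removed. Otherwise, a search for another element that was placed past that cell would stop early and fail.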
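
The "mark as deleted" idea for open addressing can be sketched as below. The notes do not say which probing scheme is used, so linear probing is assumed here; the class name ProbingIntSet, the three cell states, and the fixed capacity (no resizing) are likewise assumptions made for this example.

```java
class ProbingIntSet {
    private static final int EMPTY = 0;    // never used
    private static final int OCCUPIED = 1; // holds a live key
    private static final int DELETED = 2;  // held a key that was removed

    private final int[] keys;
    private final int[] state;

    ProbingIntSet(int capacity) {
        keys = new int[capacity];
        state = new int[capacity]; // all cells start as EMPTY
    }

    private int home(int key) {
        return Math.floorMod(key, keys.length);
    }

    // Returns the cell holding key, or -1 if the key is absent.
    private int find(int key) {
        int i = home(key);
        for (int probes = 0; probes < keys.length; probes++) {
            if (state[i] == EMPTY) return -1; // a never-used cell ends the search
            if (state[i] == OCCUPIED && keys[i] == key) return i;
            i = (i + 1) % keys.length;        // DELETED cells are stepped over
        }
        return -1;
    }

    boolean contains(int key) {
        return find(key) >= 0;
    }

    boolean add(int key) {
        if (contains(key)) return false;
        int i = home(key);
        for (int probes = 0; probes < keys.length; probes++) {
            if (state[i] != OCCUPIED) { // EMPTY and DELETED cells can be reused
                keys[i] = key;
                state[i] = OCCUPIED;
                return true;
            }
            i = (i + 1) % keys.length;
        }
        throw new IllegalStateException("table is full");
    }

    // Marks the cell as DELETED instead of resetting it to EMPTY, so that
    // searches for keys stored further along the probe path still succeed.
    boolean remove(int key) {
        int i = find(key);
        if (i < 0) return false;
        state[i] = DELETED;
        return true;
    }
}
```

For example, in a table of 7 cells the keys 1, 8, and 15 all hash to cell 1 and end up in cells 1, 2, and 3. If removing 8 reset cell 2 to EMPTY, a later search for 15 would stop at cell 2 and wrongly report that 15 is missing.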

Hashing stores elements in a hash table at locations determined by a hashing function.
Collisions occur when keys map to the same location.
Effective hashing functions distribute keys evenly across the table, and a collision resolution method handles the collisions that still occur.
Java provides several implementations of hash tables in its Collections API, which are important for practical applications.
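
As a brief illustration of those implementations, the sketch below uses java.util.HashMap and java.util.HashSet, two of the hash-based classes in the Collections API. The class name CollectionsHashDemo and its two helper methods are hypothetical and exist only for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class CollectionsHashDemo {
    // Counts how often each word occurs, using a hash-based map.
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum); // insert 1, or add 1 to the existing count
        }
        return counts;
    }

    // Collects the distinct words, using a hash-based set.
    static Set<String> distinct(String[] words) {
        return new HashSet<>(Arrays.asList(words));
    }

    public static void main(String[] args) {
        String[] words = {"ann", "bob", "ann"};
        System.out.println(countWords(words).get("ann")); // 2
        System.out.println(distinct(words).size());       // 2
    }
}
```

Both classes locate entries through each key's hashCode() and equals() methods, so keys used with them need consistent implementations of both.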