Detailed Notes on Hashing and Hash Tables
Chapter Objectives
- Define hashing
- Examine various hashing functions
- Examine the problem of collisions in hash tables
- Explore the Java Collections API implementations of hashing
Introduction to Hashing
- Previous discussion focused on binary search trees as an efficient way to implement sets/maps.
- Hashing provides another efficient approach which can outperform search trees.
20.1 Hashing
- When implementing collections, order can be determined by:
- Addition/Removal Order: Used in stacks, queues, and lists.
- Value Comparison: Used in ordered lists and search trees.
- Hashing determines the location of an item via a hash function, storing elements in a hash table.
- Cells/Buckets: The locations in the hash table where elements are stored.
- Operations using hashing are expected to be O(1) due to direct computation of locations.
Hash Function Basics
- A simple example uses the first letter of names to map them to an index.
- Key Concepts:
- Hash Table: Storage structure that uses a hashing function to determine the position of entries.
- Collision: Occurs when two keys map to the same location in a hash table.
- Perfect Hashing Function: A function that maps every key to a unique location in the table.
Hash Table Size Considerations
- Table size decisions depend on data set size:
- Use the same size as the data set if perfect hashing is possible.
- If not perfect but data size is known, a common approach is to set the table to 150% of the data size.
- If data size is unknown, dynamic resizing is utilized by creating a larger table and rehashing existing elements.
20.2 Hashing Functions
- Not all hashing functions need to be perfect; a good function results in O(1) access.
- Some common methods to create a hashing function:
- Extraction: Uses part of a value (e.g., the first letter of a string).
- Division Method: Uses the remainder of the key divided by a number p to get an index.
- Defined as: Hashcode(key)=Math.abs(key)modp
- Using prime numbers for p generally leads to better key distribution.
- Folding Method: Combines parts of the key to form an index.
- Mid-Square Method: Squares the key and extracts middle digits for the index.
- Radix Transformation Method: Converts key to another numeric base and applies the division method.
- Digit Analysis Method: Uses specific digits from a key to form the index, allowing manipulation techniques (like reversal).
- Length-Dependent Method: Combines key length and value for indexing.
20.3 Resolving Collisions
- When perfect mapping isn't achievable, collision resolution methods become necessary:
- Chaining: Each cell in the hash table points to a collection (like a list). It can be executed through a larger table with overflow storage or linked lists.
- Open Addressing: Seeks another open position in the table on collision.
- Linear Probing: Looks for the next available slot sequentially.
- Quadratic Probing: Uses a formula to determine the next slot based on a quadratic sequence.
- Double Hashing: Uses a secondary hash function for determining new slot positions during a collision.
20.4 Deleting Elements from A Hash Table
- Deletion handling depends on the collision resolution method:
- Chaining: Direct removal from lists at each table cell.
- Open Addressing: Marking elements as deleted rather than physically removing them to preserve search capability.
20.5 Hash Tables in the Java Collections API
- Java Collections API has several implementations of hashing:
- Hashtable: Oldest implementation, synchronized, uses chaining.
- HashMap: Not synchronized but more commonly used, also uses chaining.
- HashSet: Implements Set interface; uses chaining, doesn’t guarantee order.
- IdentityHashMap: Compares using reference-equality.
- WeakHashMap: Automatically removes entries when keys are no longer used; supports null keys/values.
- LinkedHashSet and LinkedHashMap: Maintain insertion order through a doubly linked list strategy.
- Load Factor: The threshold for resizing tables, default is 0.75.
Summary of Key Concepts
- Hashing stores elements in a hash table determined by a hashing function.
- Collisions occur when keys map to the same location.
- Effective hashing functions ensure efficient distribution and recovery from collisions.
- Java provides several implementations of hash tables in its Collections API, which are important for practical applications.
Self-Review Questions
- What distinguishes a hash table from other collections?
- Define collision in the context of hash tables.
- Describe a perfect hashing function.
- What is the goal of a hashing function?
- What are the implications of poor hashing?
- Explain the extraction method in hashing.
- Define the division method.
- How does shift folding work?
- What distinguishes boundary folding?
- Explain the mid-square method.
- Describe the radix transformation method.
- What does the digit analysis method entail?
- Explain the length-dependent method.
- What is chaining in the context of collision resolution?
- Describe open addressing.
- Differentiate between linear, quadratic probing, and double hashing.
- Why is deletion problematic in open addressing?
- Define load factor and its impact on table resizing.