Detailed Notes on Hashing and Hash Tables

Chapter Objectives

  • Define hashing
  • Examine various hashing functions
  • Examine the problem of collisions in hash tables
  • Explore the Java Collections API implementations of hashing

Introduction to Hashing

  • Previous discussion focused on binary search trees as an efficient way to implement sets/maps.
  • Hashing provides another efficient approach which can outperform search trees.

20.1 Hashing

  • When implementing collections, order can be determined by:
    • Addition/Removal Order: Used in stacks, queues, and lists.
    • Value Comparison: Used in ordered lists and search trees.
  • Hashing determines the location of an item via a hash function, storing elements in a hash table.
  • Cells/Buckets: The locations in the hash table where elements are stored.
  • Operations using hashing are expected to be O(1) due to direct computation of locations.

Hash Function Basics

  • A simple example uses the first letter of names to map them to an index.
  • Key Concepts:
    • Hash Table: Storage structure that uses a hashing function to determine the position of entries.
    • Collision: Occurs when two keys map to the same location in a hash table.
    • Perfect Hashing Function: A function that maps every key to a unique location in the table.

Hash Table Size Considerations

  • Table size decisions depend on data set size:
    • Use the same size as the data set if perfect hashing is possible.
    • If not perfect but data size is known, a common approach is to set the table to 150% of the data size.
    • If data size is unknown, dynamic resizing is utilized by creating a larger table and rehashing existing elements.

20.2 Hashing Functions

  • Not all hashing functions need to be perfect; a good function results in O(1) access.
  • Some common methods to create a hashing function:
    • Extraction: Uses part of a value (e.g., the first letter of a string).
    • Division Method: Uses the remainder of the key divided by a number p to get an index.
    • Defined as: Hashcode(key)=Math.abs(key)modpHashcode(key) = Math.abs(key) \bmod p
    • Using prime numbers for p generally leads to better key distribution.
    • Folding Method: Combines parts of the key to form an index.
    • Mid-Square Method: Squares the key and extracts middle digits for the index.
    • Radix Transformation Method: Converts key to another numeric base and applies the division method.
    • Digit Analysis Method: Uses specific digits from a key to form the index, allowing manipulation techniques (like reversal).
    • Length-Dependent Method: Combines key length and value for indexing.

20.3 Resolving Collisions

  • When perfect mapping isn't achievable, collision resolution methods become necessary:
    • Chaining: Each cell in the hash table points to a collection (like a list). It can be executed through a larger table with overflow storage or linked lists.
    • Open Addressing: Seeks another open position in the table on collision.
    • Linear Probing: Looks for the next available slot sequentially.
    • Quadratic Probing: Uses a formula to determine the next slot based on a quadratic sequence.
    • Double Hashing: Uses a secondary hash function for determining new slot positions during a collision.

20.4 Deleting Elements from A Hash Table

  • Deletion handling depends on the collision resolution method:
    • Chaining: Direct removal from lists at each table cell.
    • Open Addressing: Marking elements as deleted rather than physically removing them to preserve search capability.

20.5 Hash Tables in the Java Collections API

  • Java Collections API has several implementations of hashing:
    • Hashtable: Oldest implementation, synchronized, uses chaining.
    • HashMap: Not synchronized but more commonly used, also uses chaining.
    • HashSet: Implements Set interface; uses chaining, doesn’t guarantee order.
    • IdentityHashMap: Compares using reference-equality.
    • WeakHashMap: Automatically removes entries when keys are no longer used; supports null keys/values.
    • LinkedHashSet and LinkedHashMap: Maintain insertion order through a doubly linked list strategy.
  • Load Factor: The threshold for resizing tables, default is 0.75.

Summary of Key Concepts

  • Hashing stores elements in a hash table determined by a hashing function.
  • Collisions occur when keys map to the same location.
  • Effective hashing functions ensure efficient distribution and recovery from collisions.
  • Java provides several implementations of hash tables in its Collections API, which are important for practical applications.

Self-Review Questions

  1. What distinguishes a hash table from other collections?
  2. Define collision in the context of hash tables.
  3. Describe a perfect hashing function.
  4. What is the goal of a hashing function?
  5. What are the implications of poor hashing?
  6. Explain the extraction method in hashing.
  7. Define the division method.
  8. How does shift folding work?
  9. What distinguishes boundary folding?
  10. Explain the mid-square method.
  11. Describe the radix transformation method.
  12. What does the digit analysis method entail?
  13. Explain the length-dependent method.
  14. What is chaining in the context of collision resolution?
  15. Describe open addressing.
  16. Differentiate between linear, quadratic probing, and double hashing.
  17. Why is deletion problematic in open addressing?
  18. Define load factor and its impact on table resizing.