CS2040S Lecture: Hashing

Hash Tables
  • A hash table is a data structure that implements an associative array, a structure that can map keys to values.
  • It uses a hash function to compute an index into an array of buckets or slots.
Symbol Table
  • Definition: An abstract data type that supports insertion, deletion, and search operations.
  • Operations:
    • void insert(Key k, Value v): Insert a key-value pair into the table.
    • Value search(Key k): Retrieve the value associated with a key.
    • void delete(Key k): Remove a key-value pair from the table.
    • boolean contains(Key k): Check if a key exists.
    • int size(): Return the size of the table.
Examples of Symbol Tables
  • Dictionary
    • Key: word; Value: definition.
  • Phone Book
    • Key: name; Value: phone number.
  • Internet DNS
    • Key: website URL; Value: IP address.
  • Java Compiler
    • Key: variable name; Value: type and value.
Implementing Symbol Tables
  • AVL Tree Based: Different costs for insertions (CI) and searches (CS) can be achieved depending on the algorithm.
    • Different combinations: CI = O(1), O(log n), O(n) can affect CS.
  • Fast implementations aim for CI = O(1) and CS = O(1).
Issues with Symbol Tables
  • Sorting
    • Cannot efficiently sort data with a symbol table.
    • For AVL trees, sorting runs in O(n log n), while symbol tables could take O(n²).
  • Efficiency
    • Direct access tables are manageable only with integer keys. Non-integer keys can lead to inefficient data storage.
Collision Management in Hash Tables
  • Collision: Occurs when two distinct keys hash to the same index.
    • If h(k1) = h(k2), a collision occurs for keys k1 and k2.
  • Coping Strategies:
    • Chaining: Store colliding elements in a linked list within the same bucket.
    • Open Addressing: Find another bucket for the new item.
Hash Functions
  • Definition: A method to convert a key into an index (bucket).
  • Assumptions:
    • Hash functions should uniformly distribute keys across the buckets to minimize collisions.
    • Each key is equally likely to map to any bucket.
Performance Analysis of Hash Tables
  • Expected Search Time: O(1 + n/m) where n is the number of items and m is the number of buckets.
  • With a good hash function and enough buckets, the expected maximum cost for searching becomes logarithmic, O(log n).
Applications of Hash Tables
  • Fast data retrieval applications such as database indexing, caching, and set memberships.
  • Used extensively in programming languages and system operations due to their efficiency in handling searches and insertions.