Hashing

0.0(0)
studied byStudied by 0 people
learnLearn
examPractice Test
spaced repetitionSpaced Repetition
heart puzzleMatch
flashcardsFlashcards
Card Sorting

1/22

encourage image

There's no tags or description

Looks like no tags are added yet.

Study Analytics
Name
Mastery
Learn
Test
Matching
Spaced

No study sessions yet.

23 Terms

1
New cards

Hashing Function

Converts a key (often a string) to an integer

Integer is used for indexing a storage array

  • The same deterministic function is used for storing and retrieving data

Index integer is not just restricted to 0 to 25

  • Hence, data can be spread across a much longer array

2
New cards

How do we Deal with Collisions?

chaining

3
New cards

Hash Table

data structure that is designed to be fast to work with

4
New cards

Hash Function

takes the key of an element to generate a hash code

maps keys to array indices

5
New cards

Hash Code

says what bucket the element belongs to, so we can now find it assuming no collisions

6
New cards

Buckets

array indices

where the elements are stored

7
New cards

What is the Unique Part of an Element?

the key

8
New cards

How does Hashing Work? (diagram)

knowt flashcard image
9
New cards

Increasing Range of Indexes

  • For more objects (more keys), need longer array to minimise collisions

  • Hashing function needs to produce a large range of integers

  • Use more information from each key

    • e.g. all letters, and perhaps length

  • But, just adding these together, a 15-letter key will rarely give an index, say, over 300

10
New cards

Designing a Good Hashing Function

  • minimise collisions

  • get a wide range of index values

11
New cards

Minimising Collisions

  • a wide range of index values and

  • unform hashing (each key is equally likely to map to an array location)

12
New cards

Getting a Wide Range of Index Values

use multiplication as well as addition

13
New cards

Dealing with Large Indices

If we want to use an array of length size, we can use: 

  • index = rawIndex%size: 

where % is the integer remainder operation (modulo)

  • e.g. to store 50 or so words, choose size = 100: 

    • key = "pace", rawIndex = 71671 

      • index = 71671%100 = 71 

    • key = "cart", rawIndex = 345438 

      • index = 345438%100 = 38

14
New cards

The Hashing Pitfall

Note that hash retrieval has two stages: 

  1. Use hashing function to compute index

  2. Search chain in array element to see if there is a match 

With a good hashing function and a large array, Stage 2 can be close to O(1), but, Stage 1 should also be fast

15
New cards

Real-World Hashing Functions…

  • avoid slow multiplication and division operations 

  • use fast logical shift and ‘OR’ operations

16
New cards

Multiplication vs Bit Shifting

  • Multiplication is a slow operation

  • Where the multiplier is a power of 2, we can left shift the bits in the multiplicand (X) by the power of 2 instead, and get exactly the same answer

<ul><li><p>Multiplication is a slow operation</p></li><li><p>Where the multiplier is a power of 2, we can left shift the bits in the multiplicand (X) by the power of 2 instead, and get exactly the same answer</p></li></ul><p></p>
17
New cards

Why do we use Key-Value Pairs?

to distinguish between key-value pairs with the same hash

18
New cards

Key

unique input to hash function that is translated by hash function

19
New cards

Value

what we put in the bucket associated with the key

20
New cards

Hash Table ADT

  • Stores key-value pairs

  • Keys are unique and a handle to the associated value

  • Operations

    • put(t, k) stores k in the table t

    • get(t, k) returns if k is in t else nil

    • delete(t, k) removes k from t

21
New cards

Hash Table ADT Pseudocode

//chains are lists; A is hash table array

put(t, <k,v>) {
	chain = t.A[hash(k)]
	//C doesn't have to be sorted
	add(chain, <k,v>)
}

get(t, k) {
	chain = t.A[hash(k)]
	return search(chain,k)
}

delete(t, k) {
	chain = t.A[hash(k)]
	remove(chain,k)
}

22
New cards

What Happens if Chains Grow too Long

  • Hash tables become less efficient

23
New cards

Avoiding Chains Growing too Long

hash table allocates double the size of current array

then puts each in current array into larger array

expensive operation

invisible to the user of the ADT