Looks like no one added any tags here yet for you.
Searching Problem/Index System
Set of entries: {𝑒1, … , 𝑒𝑛}
Each entry 𝑒𝑖 is a pair (𝑘𝑖, 𝑣𝑖), where:
𝑘𝑖 is a key value from some totally ordered domain, say integers or strings,
𝑣𝑖 is an associated data value
We assume keys are unique
We do not use data values to solve the searching problem, but they will be important to the applications that use our data structure
Searching problem: Given an arbitrary search key 𝑘𝑘, determine if there exists an entry matching this key value and if so, what the associated data value is
Example of Index System
DNS lookup
Insert domain name with specified IP address to obtain the set of (domain name, IP address) entries
Given domain name, find the corresponding IP address
Dictionary (Map) ADT
An ADT with operations insert, delete and find:
void insert(Key 𝒌, Value 𝒗): Stores an entry with the key-value pair (𝑘, 𝑣). (If 𝑘 is already present in the dictionary, error)
void delete(Key k): Deletes the entry associated with key 𝑘 from the dictionary. (If it does not exist, error)
Value find(Key k): Determine whether there is an entry in the dictionary associated with key 𝑘. If there is, return the associated value. Otherwise, return null reference
You can also support iteration over the entries, range queries (enumerate or count all entries between some minimum and maximum value), returning 𝒊th smallest value as well as union and intersection (of dictionaries)
Common implementations of dictionary ADT use:
Sorted arrays
Search trees
Hashing
Sequential Allocation for Dictionary ADT
Most naïve implementation: Store the entries in a linear array without sorting
To find key value, linear search -> 𝑂(𝑛) time
Insertion only takes 𝑂(1) time, if we still have enough space in the array. But checking for duplicates -> 𝑂(𝑛) time
Otherwise, use a sorted (by key value) array
To find key value, binary search -> 𝑂(log 𝑛) time
For 𝑛 = 10^9 entries, log2 𝑛 ≤ 30
But updates (insertion and deletion) take 𝑶(𝒏) time as elements in array must be moved around to make space
Binary Search Tree
Setting Up Binary Search Tree ADT
implementing Find
Time Complexity of Find
Implementing Insertion
Need to maintain sortedness:
If x is null, insert new node
If x has key, give error,
Else if key < x’s key, recurse left
Else recurse right
Insertion Order Matters
Time Complexity of Insertion
Time Complexity:
For a BST of height ℎ, worst-case time complexity = 𝑂(ℎ)
Depends on the insertion order
How can we avoid inefficient search?
If insertion order is (key) random, then (expected) BST height is 𝑂𝑂(log 𝑛𝑛),
Average-case time complexity: 𝑶(𝐥og 𝒏)
Otherwise, implement more complex insertion operations to “balance” BST height at 𝑂(log 𝑛) for all possible insertion orders
Self-Balancing BSTs
Implementing Deletion
Code for delete()
Beyond Simple Binary Search Trees
All operations on a BST of height ℎ have a worst-case time complexity of 𝑂(ℎ)
But that will also hold for non-binary search trees
Where you have 𝑖 ≤ 𝑚ax keys per tree node and 𝑖 + 1 children
The first two subtrees contain, respectively, nodes whose keys are smaller and greater than the first key
The second and third subtree …. whose keys are smaller and greater than the second key
To make these operations more efficient, use self-balancing search trees
Self-Balancing Search Trees
There are many options, with diverse trade-offs
2-3 trees: tertiary self-balancing search trees
AVL trees: self-balancing BST
Red-black trees are a (BST) generalization of 2-3 trees
B-trees: generalization of 2-3 trees to more keys per node
Some are self-balancing but not height-balanced:
Splay Trees
Treaps
For example, you may want your search tree to self-balance for common letter (key) patterns rather than for any letter (key) sequence
2-3 Trees
2-nodes: 1 key, 2 children
3-nodes: 2 keys, 3 children
(Natural variant of) Inorder traversal gives keys in ascending order
Perfectly balanced tree: Every path from the root to a leaf has the same length
How do we maintain this property despite insertions and deletions?
Find (2-3 Trees)
Works (almost) just as in BSTs,
Example: find(8)
Insertion (2-3 Trees)
Cons of 2-3 Trees
However, 2-3 trees are quite inconvenient to implement
Different types of nodes (2-nodes and 3-nodes)
We need to do multiple compares at each node,
We need to move back up the tree to split 4-nodes 4. Large number of cases for that splitting
Red-black trees are BSTs that build upon 2-3 trees, representing 3-node as a binary tree node with a red left link