Binary Search Trees and Efficient Data Structures
Binary Trees Recap
Binary trees have nodes with at most two children: left and right.
The order of children matters (left vs. right).
Haskell declaration:
data BinaryTree a = Null | Node (BinaryTree a) a (BinaryTree a) deriving Show
Null: Represents an empty tree.Node: Corresponds to "cons" but has two recursive occurrences (left and right subtrees).Tree traversal: Visiting every node in a tree.
Preorder: Root node first.
Inorder: Root node in the middle.
Postorder: Root node last.
Determining Membership in a Tree
A simple approach involves converting the tree to a list and then using a list membership function.
However, this is inefficient as it doesn't exploit the tree's structure.
For ordered lists, we can optimize the
isElemfunction.If the current element
yis greater than the element we're looking forx, we can stop searching.
Ordered Lists: Optimizing Search
If a list is ordered, searching can be optimized.
Example: Searching for 10 in an ordered list.
If we encounter a number greater than 10 (e.g., 11), we can stop, knowing 10 isn't in the list.
Implementation: Stop recursion if
y(current element) is not strictly smaller thanx(target element).
The Problem with Lists
Lists can be arbitrarily long, making search inefficient. On average, you have to search half the list.
Trees offer logarithmic performance for lookup.
Example:
Searching a list of 100,000 elements requires, on average, examining 50,000 elements.
Searching a tree of 100,000 elements requires, on average, examining log_2(100000) \approx 15 elements.
Exponential Improvement with Trees
Tree-based search provides exponential speedup (in the mathematical sense).
The key is to maintain order within the tree.
Binary Search Trees (BST)
A binary tree where elements are sorted.
Definition: For each node, all values in its left subtree are less than or equal to the node's value, and all values in its right subtree are greater than or equal to the node's value, recursively.
In-order traversal of a BST yields a sorted list.
BST Examples
An example when BST properties is mantained:
4
/ \
2 6
/ \ / \
1 3 5 7
BST Invariants:
The tree must maintain sorted order.
Impairments of BST properties results in errors.
Searching in a Binary Search Tree
Function:
isElemBSTExploit tree properties for efficient search.
If the target element
xis equal to the current node's valuey, we're done.If
xis less thany, search the left subtree.If
xis greater thany, search the right subtree.
Haskell Implementation
isElmBST :: Ord a => a -> BinarySearchTree a -> Bool
isElmBST x Null = False
isElmBST x (Node lt y rt) =
if x == y then True
else if x < y then isElmBST x lt
else isElmBST x rt
Exponential Speedup Explained
Each comparison throws away half of the problem.
Path length from root to leaf is log(n), where n is the number of elements.
Example:
A tree with 1000 elements requires only ~10 comparisons.
Comparing this with the list needing on average 500 elements to look through.
Sets and Trees
A BST can be viewed as an implementation of a mathematical set.
isElemBSTis analogous to a set membership test.
Sets Operations:
Membership test (is element of).
Construction (inserting elements).
Constructing sets incrementally
Lists:
We can use "cons" to construct list but want to implement ordered insert function.
Trees:
Need functions to implement operations.
insert 1 into empty = Node Null 1 Null
insert 2 into above = Node Null 1 Node Null 2 Null
Ordered Insertion in Lists
Function:
orderedInsertMaintains the order of elements in the list.
If the element is a duplicate, it's dropped (set behavior).
Haskell Implementation of Ordered Insert
orderedInsert :: Ord a => a -> [a] -> [a]
orderedInsert x [] = [x]
orderedInsert x (y:ys) =
case compare x y of
EQ -> y:ys -- x is already in the list
LT -> x : y:ys -- x should be inserted before y
GT -> y : orderedInsert x ys -- x should be inserted after y
Example: inserting 1 into [2, 3, 4] should result in [1, 2, 3, 4].
Inserting 3 into [2, 5, 6] should result in [2, 3, 5, 6].
Reasoning about Haskell Code
Haskell offers equational reasoning.
You can substitute equals for equals anywhere in the code.
Example: OI(22,[10]) = 10 : OI(22, []) = 10 : [22] = [10, 22]
Limitations of Ordered Lists
Ordered insertion on average requires looking at half of the list. If you're inserting a list of million then on average the lookup requires you to look at five hundred thousand elements.
Solution - use ordered trees.
Inserting into Binary Search Trees
Function:
insertBSTMaintaining the BST ordering property during insertion is crucial.
"Dumb Insert": A Naive Approach
Simply create a new node with the new element and the existing tree as one of its children.
dumbInsert :: a -> BinarySearchTree a -> BinarySearchTree a
dumbInsert a tree = Node Null a tree
This is terrible because it does not respect the order, and it creates an unbalanced tree that looks like a list, destroying performance.
Looks up still ends up you looking at 20,000,000 records, not 24.
With one child, you are working with list like structure and exponential power is wasted.