Binary Search Trees and Efficient Data Structures

Binary Trees Recap

Binary trees have nodes with at most two children: left and right.
The order of children matters (left vs. right).
Haskell declaration:

  data BinaryTree a = Null | Node (BinaryTree a) a (BinaryTree a) deriving Show

Null: Represents an empty tree.
Node: Corresponds to "cons" but has two recursive occurrences (left and right subtrees).
- Tree traversal: Visiting every node in a tree.
Preorder: Root node first.
Inorder: Root node in the middle.
Postorder: Root node last.

Determining Membership in a Tree

A simple approach involves converting the tree to a list and then using a list membership function.
However, this is inefficient as it doesn't exploit the tree's structure.
For ordered lists, we can optimize the isElem function.
If the current element y is greater than the element we're looking for x, we can stop searching.

Ordered Lists: Optimizing Search

If a list is ordered, searching can be optimized.
Example: Searching for 10 in an ordered list.
If we encounter a number greater than 10 (e.g., 11), we can stop, knowing 10 isn't in the list.
Implementation: Stop recursion if y (current element) is not strictly smaller than x (target element).

The Problem with Lists

Lists can be arbitrarily long, making search inefficient. On average, you have to search half the list.
Trees offer logarithmic performance for lookup.
Example:
- Searching a list of 100,000 elements requires, on average, examining 50,000 elements.
- Searching a tree of 100,000 elements requires, on average, examining $log_2(100000) \approx 15$ elements.

Exponential Improvement with Trees

Tree-based search provides exponential speedup (in the mathematical sense).
The key is to maintain order within the tree.

Binary Search Trees (BST)

A binary tree where elements are sorted.
Definition: For each node, all values in its left subtree are less than or equal to the node's value, and all values in its right subtree are greater than or equal to the node's value, recursively.
In-order traversal of a BST yields a sorted list.

BST Examples

An example when BST properties is mantained:

        4
       / \
      2   6
     / \ / \
    1   3 5   7

BST Invariants:

The tree must maintain sorted order.
Impairments of BST properties results in errors.

Searching in a Binary Search Tree

Function: isElemBST
Exploit tree properties for efficient search.
If the target element x is equal to the current node's value y, we're done.
If x is less than y, search the left subtree.
If x is greater than y, search the right subtree.

Haskell Implementation

isElmBST :: Ord a => a -> BinarySearchTree a -> Bool
isElmBST x Null = False
isElmBST x (Node lt y rt) =
  if x == y then True
  else if x < y then isElmBST x lt
  else isElmBST x rt

Exponential Speedup Explained

Each comparison throws away half of the problem.
Path length from root to leaf is $log(n)$ , where $n$ is the number of elements.
Example:
- A tree with 1000 elements requires only ~10 comparisons.
- Comparing this with the list needing on average 500 elements to look through.

Sets and Trees

A BST can be viewed as an implementation of a mathematical set.
isElemBST is analogous to a set membership test.

Sets Operations:

Membership test (is element of).
Construction (inserting elements).

Constructing sets incrementally

Lists:
- We can use "cons" to construct list but want to implement ordered insert function.
Trees:
- Need functions to implement operations.

  insert 1 into empty =  Node Null 1 Null
  insert 2 into above = Node Null 1 Node Null 2 Null

Ordered Insertion in Lists

Function: orderedInsert
Maintains the order of elements in the list.
If the element is a duplicate, it's dropped (set behavior).

Haskell Implementation of Ordered Insert

orderedInsert :: Ord a => a -> [a] -> [a]
orderedInsert x [] = [x]
orderedInsert x (y:ys) =
  case compare x y of
    EQ -> y:ys  -- x is already in the list
    LT -> x : y:ys -- x should be inserted before y
    GT -> y : orderedInsert x ys -- x should be inserted after y

Example: inserting 1 into [2, 3, 4] should result in [1, 2, 3, 4].

Inserting 3 into [2, 5, 6] should result in [2, 3, 5, 6].

Reasoning about Haskell Code

Haskell offers equational reasoning.
You can substitute equals for equals anywhere in the code.
Example: $OI(22,[10]) = 10 : OI(22, []) = 10 : [22] = [10, 22]$

Limitations of Ordered Lists

Ordered insertion on average requires looking at half of the list. If you're inserting a list of million then on average the lookup requires you to look at five hundred thousand elements.
Solution - use ordered trees.

Inserting into Binary Search Trees

Function: insertBST
Maintaining the BST ordering property during insertion is crucial.

"Dumb Insert": A Naive Approach

Simply create a new node with the new element and the existing tree as one of its children.

dumbInsert :: a -> BinarySearchTree a -> BinarySearchTree a
dumbInsert a tree = Node Null a tree

This is terrible because it does not respect the order, and it creates an unbalanced tree that looks like a list, destroying performance.
- Looks up still ends up you looking at 20,000,000 records, not 24.
With one child, you are working with list like structure and exponential power is wasted.