Binary Search Trees and Efficient Data Structures

Binary Trees Recap

  • Binary trees have nodes with at most two children: left and right.

  • The order of children matters (left vs. right).

  • Haskell declaration:

  data BinaryTree a = Null | Node (BinaryTree a) a (BinaryTree a) deriving Show
  • Null: Represents an empty tree.

  • Node: Corresponds to "cons" but has two recursive occurrences (left and right subtrees).

    • Tree traversal: Visiting every node in a tree.

  • Preorder: Root node first.

  • Inorder: Root node in the middle.

  • Postorder: Root node last.

Determining Membership in a Tree

  • A simple approach involves converting the tree to a list and then using a list membership function.

  • However, this is inefficient as it doesn't exploit the tree's structure.

  • For ordered lists, we can optimize the isElem function.

  • If the current element y is greater than the element we're looking for x, we can stop searching.

Ordered Lists: Optimizing Search

  • If a list is ordered, searching can be optimized.

  • Example: Searching for 10 in an ordered list.

  • If we encounter a number greater than 10 (e.g., 11), we can stop, knowing 10 isn't in the list.

  • Implementation: Stop recursion if y (current element) is not strictly smaller than x (target element).

The Problem with Lists

  • Lists can be arbitrarily long, making search inefficient. On average, you have to search half the list.

  • Trees offer logarithmic performance for lookup.

  • Example:

    • Searching a list of 100,000 elements requires, on average, examining 50,000 elements.

    • Searching a tree of 100,000 elements requires, on average, examining log_2(100000) \approx 15 elements.

Exponential Improvement with Trees

  • Tree-based search provides exponential speedup (in the mathematical sense).

  • The key is to maintain order within the tree.

Binary Search Trees (BST)

  • A binary tree where elements are sorted.

  • Definition: For each node, all values in its left subtree are less than or equal to the node's value, and all values in its right subtree are greater than or equal to the node's value, recursively.

  • In-order traversal of a BST yields a sorted list.

BST Examples

  • An example when BST properties is mantained:

        4
       / \
      2   6
     / \ / \
    1   3 5   7

BST Invariants:

  • The tree must maintain sorted order.

  • Impairments of BST properties results in errors.

Searching in a Binary Search Tree

  • Function: isElemBST

  • Exploit tree properties for efficient search.

  • If the target element x is equal to the current node's value y, we're done.

  • If x is less than y, search the left subtree.

  • If x is greater than y, search the right subtree.

Haskell Implementation

isElmBST :: Ord a => a -> BinarySearchTree a -> Bool
isElmBST x Null = False
isElmBST x (Node lt y rt) =
  if x == y then True
  else if x < y then isElmBST x lt
  else isElmBST x rt

Exponential Speedup Explained

  • Each comparison throws away half of the problem.

  • Path length from root to leaf is log(n), where n is the number of elements.

  • Example:

    • A tree with 1000 elements requires only ~10 comparisons.

    • Comparing this with the list needing on average 500 elements to look through.

Sets and Trees

  • A BST can be viewed as an implementation of a mathematical set.

  • isElemBST is analogous to a set membership test.

Sets Operations:

  • Membership test (is element of).

  • Construction (inserting elements).

Constructing sets incrementally

  • Lists:

    • We can use "cons" to construct list but want to implement ordered insert function.

  • Trees:

    • Need functions to implement operations.

  insert 1 into empty =  Node Null 1 Null
  insert 2 into above = Node Null 1 Node Null 2 Null

Ordered Insertion in Lists

  • Function: orderedInsert

  • Maintains the order of elements in the list.

  • If the element is a duplicate, it's dropped (set behavior).

Haskell Implementation of Ordered Insert

orderedInsert :: Ord a => a -> [a] -> [a]
orderedInsert x [] = [x]
orderedInsert x (y:ys) =
  case compare x y of
    EQ -> y:ys  -- x is already in the list
    LT -> x : y:ys -- x should be inserted before y
    GT -> y : orderedInsert x ys -- x should be inserted after y

Example: inserting 1 into [2, 3, 4] should result in [1, 2, 3, 4].

  • Inserting 3 into [2, 5, 6] should result in [2, 3, 5, 6].

Reasoning about Haskell Code

  • Haskell offers equational reasoning.

  • You can substitute equals for equals anywhere in the code.

  • Example: OI(22,[10]) = 10 : OI(22, []) = 10 : [22] = [10, 22]

Limitations of Ordered Lists

  • Ordered insertion on average requires looking at half of the list. If you're inserting a list of million then on average the lookup requires you to look at five hundred thousand elements.

  • Solution - use ordered trees.

Inserting into Binary Search Trees

  • Function: insertBST

  • Maintaining the BST ordering property during insertion is crucial.

"Dumb Insert": A Naive Approach

  • Simply create a new node with the new element and the existing tree as one of its children.

dumbInsert :: a -> BinarySearchTree a -> BinarySearchTree a
dumbInsert a tree = Node Null a tree
  • This is terrible because it does not respect the order, and it creates an unbalanced tree that looks like a list, destroying performance.

    • Looks up still ends up you looking at 20,000,000 records, not 24.

  • With one child, you are working with list like structure and exponential power is wasted.