Lecture 03 - Searching and Sorting Algorithms (Data Structures)
Built-in ADTs in Languages
Modern languages provide efficient, well-tested implementations of common data structures in standard libraries.
As professional programmers, you rarely implement these from scratch, but understanding how they work "under the hood" helps:
Choose the right data structure for your needs
Recognize and avoid performance pitfalls
Build your own when built-ins fall short
Analogy: learning how an engine works makes you a better driver, mechanic, and problem solver, even if you don’t hand-build an engine.
Core Library ADT Examples by Language
Python: Core Library includes list, set, dict
C++: STL (Standard Template Library) includes vector, list, stack, queue, deque, set, map
Java: Java Collections Framework (JCF) includes Collection, Set, List, Map, Queue, Deque
Searching and Sorting: Overview
Searching algorithm: a method to locate a specific item/value within a data collection or defined space.
Basic use cases: search for values (e.g., a number in an array, a key in a dictionary) using techniques like linear search, binary search, or hash lookups.
Beyond the list: searching can mean exploring a solution space (e.g., maze paths, puzzles, function optimization) where each position is a candidate solution.
Algorithmic goal: efficiently reduce the number of possibilities by exploiting structure (ordering, heuristics, probability) to find a valid or optimal solution within acceptable time/space bounds.
Linear Search
Definition: sequential scan through all elements.
Pseudocode (conceptual):
LinearSearch(values, key):
    i = 0
    for each value in values:
        if value == key: return i
        else: i = i + 1
    return -1
Complexity:
Best case: O(1) (the key is the first element)
Average case: O(N)
Worst case: O(N) (the key is the last element or is absent)
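A minimal Python sketch of the linear search pseudocode, assuming `values` is any Python sequence and `==` is the equality test. The function name `linear_search` is illustrative, not from the slides; `enumerate` takes the place of the manual counter `i`.

```python
def linear_search(values, key):
    """Return the index of the first element equal to key, or -1 if absent."""
    for i, value in enumerate(values):  # enumerate supplies the index i
        if value == key:
            return i  # found: stop scanning immediately
    return -1  # scanned every element without a match


print(linear_search([7, 3, 9, 3], 3))  # 1
print(linear_search([7, 3, 9, 3], 5))  # -1
```

Because it makes no assumption about order, linear search is the fallback when the data is unsorted or is scanned only once.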
Binary Search
Prerequisite: data must be sorted.
Principle: each step halves the search space.
Pseudocode (iterative):
BinarySearch(values, key):
    low = 0
    high = size(values) - 1
    while high >= low:
        mid = (high - low) / 2 + low   // integer division
        if values[mid] < key: low = mid + 1
        else if values[mid] > key: high = mid - 1
        else: return mid
    return -1
Complexity:
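Best case: O(1) (the key sits at the first midpoint)
Average case: O(log N)
Worst case: O(log N)
Note: the O(log N) bound holds because each comparison halves the remaining range, so at most floor(log2 N) + 1 iterations are needed.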
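A Python sketch of the iterative pseudocode, assuming `values` is a list sorted in ascending order. The name `binary_search` is illustrative. Python's `//` performs the integer division the pseudocode implies, and the `(high - low) // 2 + low` form is kept because it avoids overflow in languages with fixed-width integers.

```python
def binary_search(values, key):
    """Iterative binary search on an ascending list.
    Returns an index of key, or -1 if key is absent."""
    low, high = 0, len(values) - 1
    while high >= low:
        mid = (high - low) // 2 + low  # integer midpoint of [low, high]
        if values[mid] < key:
            low = mid + 1    # key can only be in the upper half
        elif values[mid] > key:
            high = mid - 1   # key can only be in the lower half
        else:
            return mid
    return -1  # range became empty: key not present


data = [2, 5, 8, 12, 16, 23, 38]
print(binary_search(data, 23))  # 5
print(binary_search(data, 4))   # -1
```

For production Python code, the standard library's `bisect.bisect_left` performs the same halving on sorted lists.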
Binary Search (Recursive)
Recursive version:
BinarySearch(values, key, low, high):
    if high < low: return -1
    mid = (high - low) / 2 + low   // integer division
    if values[mid] < key: return BinarySearch(values, key, mid + 1, high)
    else if values[mid] > key: return BinarySearch(values, key, low, mid - 1)
    else: return mid
Call: BinarySearch(values, key, 0, size(values) - 1)
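The recursive pseudocode translates almost line for line into Python. This sketch assumes an ascending list and inclusive `low`/`high` bounds; the name `binary_search_recursive` is illustrative. Recursion depth is O(log N), so Python's default recursion limit is not a concern here.

```python
def binary_search_recursive(values, key, low, high):
    """Recursive binary search over values[low..high] (inclusive bounds)."""
    if high < low:
        return -1  # empty range: key not present
    mid = (high - low) // 2 + low
    if values[mid] < key:
        return binary_search_recursive(values, key, mid + 1, high)
    elif values[mid] > key:
        return binary_search_recursive(values, key, low, mid - 1)
    return mid  # values[mid] == key


data = [2, 5, 8, 12, 16, 23, 38]
print(binary_search_recursive(data, 8, 0, len(data) - 1))  # 2
```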
Other Search Techniques (as listed in the slides)
Interpolation Search: on sorted data, estimates where the key should be from the value distribution (interpolating between the values at the ends of the current range) instead of always probing the middle.
Jump Search: on sorted data, jumps ahead in fixed-size blocks until reaching a block whose last value is at least the key, then linearly searches within that block.
Exponential Search: on sorted data, probes positions 1, 2, 4, 8, ... until passing the key, then binary searches within the last range.
Note: the slides name these strategies but give little or no detail on their complexity.
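A minimal Python sketch of jump search and exponential search on an ascending list. Because the slides give no details, the specifics are assumptions: jump search uses a block size of about √N (a common choice), and exponential search doubles an index bound and then finishes with binary search via the standard library's `bisect.bisect_left`. Both function names are illustrative.

```python
import math
from bisect import bisect_left


def jump_search(values, key):
    """Jump search: skip ahead in blocks of ~sqrt(n), then scan one block linearly."""
    n = len(values)
    if n == 0:
        return -1
    step = max(1, math.isqrt(n))  # assumed block size ~ sqrt(n)
    prev = 0
    # Jump while the last element of the current block is still smaller than key.
    while prev < n and values[min(prev + step, n) - 1] < key:
        prev += step
    # Linear scan inside the candidate block [prev, prev + step).
    for i in range(prev, min(prev + step, n)):
        if values[i] == key:
            return i
    return -1


def exponential_search(values, key):
    """Exponential search: double a bound until it passes key, then binary search."""
    n = len(values)
    if n == 0:
        return -1
    if values[0] == key:
        return 0
    bound = 1
    while bound < n and values[bound] < key:
        bound *= 2  # probe positions 1, 2, 4, 8, ...
    # If present, key lies in values[bound // 2 .. min(bound, n - 1)].
    low, high = bound // 2, min(bound, n - 1)
    i = bisect_left(values, key, low, high + 1)  # binary search in that range
    return i if i <= high and values[i] == key else -1


data = [2, 5, 8, 12, 16, 23, 38]
print(jump_search(data, 23), exponential_search(data, 23))  # 5 5
```

Exponential search is useful when the key is likely near the front or the list length is unknown in advance, since it only explores a range proportional to the key's position.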
Sorting: Divide and Conquer and Recursion Concepts
Theme: many sorting algorithms use divide and conquer, often via recursion, then combine sub-solutions.
Visual metaphor on the slides highlights building up from smaller pieces (e.g., cutting, merging, or partitioning treasure maps).
Core steps (Divide and Conquer):
Divide the problem into smaller subproblems of the same type (ideally roughly equal in size).
Conquer the pieces recursively, down to a simple base case.
Combine the sub-solutions to form the overall solution.
Efficiency comes from reducing problem size at each divide step.
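To make the three steps concrete, here is a small hypothetical example (not from the slides): finding the maximum of a non-empty list by divide and conquer. It divides the range in half, conquers each half recursively down to a one-element base case, and combines the two answers in O(1). Its recurrence is T(n) = 2T(n/2) + Θ(1), which solves to Θ(n). The name `dc_max` is illustrative.

```python
def dc_max(values, low, high):
    """Divide-and-conquer maximum of values[low..high] (inclusive, non-empty)."""
    if low == high:                        # base case: a single element
        return values[low]
    mid = (low + high) // 2                # divide into two roughly equal halves
    left = dc_max(values, low, mid)        # conquer the left half
    right = dc_max(values, mid + 1, high)  # conquer the right half
    return left if left >= right else right  # combine in constant time


print(dc_max([4, 17, 9, 2, 11], 0, 4))  # 17
```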
Insertion Sort
Builds a sorted list one item at a time.
Per the slides, it is the only quadratic-time sort presented as a practical, non-educational choice for production code; it performs well on small or nearly sorted inputs.
Pseudocode concept:
for i = 1 to array.size - 1:
    j = i
    while j > 0 and array[j] < array[j - 1]:
        swap array[j] with array[j - 1]
        j = j - 1
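A direct Python translation of the insertion sort pseudocode, sorting a list in place; the name `insertion_sort` is illustrative, and returning the list is only a convenience for the example. The strict `<` comparison stops the swaps at an equal element, which keeps the sort stable.

```python
def insertion_sort(array):
    """Sort array in place (ascending) by growing a sorted prefix one item at a time."""
    for i in range(1, len(array)):
        j = i
        # Swap the new element leftward until its left neighbor is not larger.
        while j > 0 and array[j] < array[j - 1]:
            array[j], array[j - 1] = array[j - 1], array[j]
            j -= 1
    return array


print(insertion_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```

On already sorted input the inner loop exits immediately for every i, which is where the O(N) best case comes from.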
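Complexity:
Best case: O(N) (input already sorted: one comparison per element)
Average case: O(N^2)
Worst case: O(N^2) (reverse-sorted input)
Merge Sort
Divides the array into two halves, recursively sorts each half, then merges the sorted halves.
Recurrence: T(n) = 2T(n/2) + O(n)
Complexity: O(n log n) in all cases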
Key operation: Merge two sorted halves in linear time.
Merge procedure (conceptual):
Merge(left, right): merge two sorted subarrays into a single sorted array.
Example structure: while leftPos <= leftEnd and rightPos <= rightEnd: compare and copy the smaller element; then copy any remaining elements from either side.
Finally copy merged elements back into the original array.
Master Theorem application (Merge Sort): a = 2, b = 2, f(n) = Θ(n) ⇒ Case 2 ⇒ T(n) = Θ(n log n).
Recursion tree visualization: each level doubles the number of subproblems but halves their size, so the total work per level stays Θ(n); with about log2 n levels, the total is Θ(n log n).
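A Python sketch of merge sort that follows the merge procedure described above: two position indices walk the sorted halves, the smaller element is copied first, leftovers are copied, and the merged run is written back into the original array. The names `merge` and `merge_sort` and the inclusive-bound parameters are assumptions for illustration; the temporary list `merged` is the O(n) extra space merge sort needs.

```python
def merge(array, left_first, left_last, right_last):
    """Merge sorted runs array[left_first..left_last] and
    array[left_last+1..right_last] (inclusive bounds) back into array."""
    merged = []
    left_pos, right_pos = left_first, left_last + 1
    while left_pos <= left_last and right_pos <= right_last:
        if array[left_pos] <= array[right_pos]:  # <= keeps equal keys in order (stable)
            merged.append(array[left_pos])
            left_pos += 1
        else:
            merged.append(array[right_pos])
            right_pos += 1
    # Copy whatever remains on either side (at most one side is non-empty).
    merged.extend(array[left_pos:left_last + 1])
    merged.extend(array[right_pos:right_last + 1])
    # Copy the merged elements back into the original array.
    array[left_first:right_last + 1] = merged


def merge_sort(array, first=0, last=None):
    """Sort array[first..last] in place (ascending); defaults sort the whole list."""
    if last is None:
        last = len(array) - 1
    if first < last:
        mid = (first + last) // 2
        merge_sort(array, first, mid)       # sort left half
        merge_sort(array, mid + 1, last)    # sort right half
        merge(array, first, mid, last)      # combine in linear time
    return array


print(merge_sort([38, 27, 43, 3, 9, 82, 10]))  # [3, 9, 10, 27, 38, 43, 82]
```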
Quicksort
Core idea: choose a pivot, then partition the array so that values less than the pivot go left and values greater than it go right (values equal to the pivot may land on either side).
Recursively sort the left and right partitions.
No separate merge step; ordering emerges from partitioning.
Pseudocode (partition):
Partition(array, low, high):
    mid = low + (high - low) / 2
    pivot = array[mid]
    done = false
    while not done:
        while array[low] < pivot: low = low + 1
        while pivot < array[high]: high = high - 1
        if low >= high: done = true
        else:
            swap array[low] with array[high]
            low = low + 1
            high = high - 1
    return high   // last index of low partition
Quicksort function:
Quicksort(array, low, high):
    if low >= high: return
    endLow = Partition(array, low, high)
    Quicksort(array, low, endLow)
    Quicksort(array, endLow + 1, high)
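A Python translation of the Partition and Quicksort pseudocode, assuming a Python list and the same Hoare-style partition around the middle element; the names `partition` and `quicksort` are illustrative. Because `partition` returns the last index of the low partition rather than the pivot's final position, the recursive calls cover `low..end_low` and `end_low + 1..high`.

```python
def partition(array, low, high):
    """Partition array[low..high] around its middle element (Hoare-style).
    Returns the last index of the low partition."""
    mid = low + (high - low) // 2
    pivot = array[mid]
    while True:
        while array[low] < pivot:    # advance past values that belong on the left
            low += 1
        while pivot < array[high]:   # retreat past values that belong on the right
            high -= 1
        if low >= high:
            return high
        array[low], array[high] = array[high], array[low]
        low += 1
        high -= 1


def quicksort(array, low=0, high=None):
    """Sort array[low..high] in place; with defaults, sorts the whole list."""
    if high is None:
        high = len(array) - 1
    if low >= high:
        return array  # zero or one element: already sorted
    end_low = partition(array, low, high)
    quicksort(array, low, end_low)
    quicksort(array, end_low + 1, high)
    return array


print(quicksort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```

Choosing the middle element as the pivot avoids the O(n^2) behavior that a first-element pivot shows on already sorted input, though some inputs can still trigger the worst case.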
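Complexity (average/best vs worst):
Best/Average case: O(n log n). Pivots that split each partition roughly in half give T(n) = 2T(n/2) + O(n) = O(n log n).
Worst case: O(n^2). Pivots that are always the smallest or largest remaining value give T(n) = T(n - 1) + O(n) = O(n^2).
Extra space: O(log n) on average for the recursion stack.
Sorting algorithm summary:
Insertion Sort: best O(n), average O(n^2), worst O(n^2); Stable: Yes; In-place: Yes.
Shell Sort: best O(n log n)*, average about O(n^{1.3}), worst O(n^2) (*depends on the gap sequence); Stable: No; In-place: Yes.
Merge Sort: O(n log n) in all cases; extra space O(n); Stable: Yes; In-place: No.
Quicksort: best/average O(n log n), worst O(n^2); extra space O(log n); Stable: No; In-place: Yes.
Radix Sort: O(n log_10(k)) with base-10 digits, where k is the largest key; extra space O(n); Stable: Yes; In-place: No.
Heap Sort: O(n log n); Stable: No; In-place: Yes. Notes: based on a heap; good performance but not stable.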
Notes on Master Theorem and Recurrence Relations
Recurrence form for divide-and-conquer algorithms:
T(n) = a · T(n/b) + f(n)
a: number of subproblems
b: factor by which the problem size is reduced
f(n): work done outside the recursive calls (dividing the problem and combining the sub-solutions)
Master Theorem applicability:
Used to solve recurrences of the form T(n) = a T(n/b) + f(n).
Compute n^{log_b a} and compare with f(n) to determine the case.
The three classical cases (informal):
Case 1: f(n) = O(n^{log_b a - ε}) for some ε > 0
Then T(n) = Θ(n^{log_b a})
Case 2: f(n) = Θ(n^{log_b a} log^k n) for some k ≥ 0
Then T(n) = Θ(n^{log_b a} log^{k+1} n)
Case 3: f(n) = Ω(n^{log_b a + ε}) for some ε > 0, and regularity condition holds (often written as a f(n/b) ≤ c f(n) for some c < 1 and sufficiently large n)
Then T(n) = Θ(f(n))
How to apply Master Theorem (workflow):
Identify a, b, f(n) in the recurrence.
Compute n^{log_b a}.
Compare f(n) with n^{log_b a} (and any log factors) to select the correct case.
Conclude T(n) from the case.
Classic examples:
Merge Sort: T(n) = 2 T(n/2) + Θ(n) → a = 2, b = 2, f(n) = Θ(n) → Case 2 → T(n) = Θ(n log n).
Binary Search: T(n) = T(n/2) + Θ(1) → a = 1, b = 2, f(n) = Θ(1) → Case 2 → T(n) = Θ(log n).
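As a sanity check on the two classic examples, the recurrences can be unrolled numerically. This sketch assumes a base case T(1) = 1, unit constants, and n a power of two (all assumptions, not from the slides); under those assumptions Merge Sort's recurrence gives exactly n log2 n + n and Binary Search's gives log2 n + 1. The function names are illustrative.

```python
def t_merge_sort(n):
    """T(n) = 2T(n/2) + n with T(1) = 1 (assumed unit constants), n a power of two."""
    return 1 if n == 1 else 2 * t_merge_sort(n // 2) + n


def t_binary_search(n):
    """T(n) = T(n/2) + 1 with T(1) = 1 (assumed unit constants), n a power of two."""
    return 1 if n == 1 else t_binary_search(n // 2) + 1


for k in range(11):
    n = 2 ** k
    assert t_merge_sort(n) == n * k + n   # n log2 n + n, which is Θ(n log n)
    assert t_binary_search(n) == k + 1    # log2 n + 1, which is Θ(log n)
    print(n, t_merge_sort(n), t_binary_search(n))
```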
Recurrence Examples and Master Theorem Flow
Master Theorem Flowchart (conceptual):
If f(n) = O(n^{log_b a - ε}) for some ε > 0 → Case 1 → T(n) = Θ(n^{log_b a})
Else if f(n) = Θ(n^{log_b a} log^k n) for some k ≥ 0 → Case 2 → T(n) = Θ(n^{log_b a} log^{k+1} n)
Else if f(n) = Ω(n^{log_b a + ε}) for some ε > 0 and the regularity condition holds → Case 3 → T(n) = Θ(f(n))
Else Master Theorem does not apply
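The flowchart can be sketched in Python for the common special case f(n) = Θ(n^d log^k n). Restricting f(n) to this shape is an assumption of the sketch; it also guarantees the Case 3 regularity condition, so the code does not check it separately. The function `master_theorem` and its parameters `d` and `k` are illustrative; other shapes of f(n) need the general comparison.

```python
import math


def master_theorem(a, b, d, k=0):
    """Apply the Master Theorem to T(n) = a*T(n/b) + Θ(n^d * log^k n).

    Returns (case_number, bound), or (None, reason) when the basic theorem
    does not apply."""
    if a < 1 or b <= 1:
        raise ValueError("requires a >= 1 and b > 1")
    crit = math.log(a, b)  # critical exponent log_b a
    if math.isclose(d, crit, abs_tol=1e-12):
        if k >= 0:  # Case 2: f matches n^crit up to a log^k n factor
            return 2, f"Θ(n^{crit:g} log^{k + 1} n)"
        return None, "basic Master Theorem does not cover k < 0"
    if d < crit:  # Case 1: f grows polynomially slower than n^crit
        return 1, f"Θ(n^{crit:g})"
    # Case 3: f grows polynomially faster, so T(n) = Θ(f(n))
    if k:
        return 3, f"Θ(n^{d:g} log^{k} n)"
    return 3, f"Θ(n^{d:g})"


print(master_theorem(2, 2, 1))  # Merge Sort
print(master_theorem(1, 2, 0))  # Binary Search
```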
Master Theorem: Worked Case Examples (conceptual)
Case 1 example outline: T(n) = 2 T(n/2) + Θ(1)
a = 2, b = 2, f(n) = Θ(1) → n^{log_b a} = n^{log_2 2} = n^1 = n
Since f(n) = Θ(1) = O(n^{1 - ε}) with ε = 1, Case 1 applies, so T(n) = Θ(n^{log_b a}) = Θ(n). The key step is comparing f(n) with n^{log_b a}.
Case 2 example outline: T(n) = 4 T(n/2) + Θ(n^2)
a = 4, b = 2, f(n) = Θ(n^2) → n^{log_b a} = n^{log_2 4} = n^2
f(n) = Θ(n^{log_b a}) → Case 2 with k = 0 → T(n) = Θ(n^{log_b a} log n) = Θ(n^2 log n). (The slides use a variant of this example; the essential point is that when f(n) matches n^{log_b a} up to a log^k n factor, Case 2 applies and contributes one extra log factor.)
Case 3 example outline: T(n) = 2 T(n/2) + Θ(n^2)
a = 2, b = 2, f(n) = Θ(n^2) → n^{log_b a} = n
Since f(n) = n^2 grows polynomially faster than n^{log_b a} = n (with ε = 1), and the regularity condition holds (a f(n/b) = 2(n/2)^2 = n^2/2 ≤ c n^2 with c = 1/2), Case 3 applies, yielding T(n) = Θ(f(n)) = Θ(n^2).
Final notes: The Master Theorem provides a structured way to deduce asymptotic growth for divide-and-conquer recurrences, with Merge Sort and Binary Search as canonical examples.
Practical Takeaways
When choosing a sorting algorithm, consider:
Input size and characteristics (random, nearly sorted, linked lists, in-memory arrays).
Stability requirements.
Memory constraints (in-place vs extra space).
Worst-case vs average-case performance.
Radix Sort offers a non-comparison-based alternative with a predictable number of linear passes, particularly for fixed-length keys, but it requires extra space and special handling for negative numbers. Its running time scales with the number of digits d of the largest key: O(n · d) = O(n log_b(k)), where b is the base (radix) and k is the largest key.
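A minimal least-significant-digit (LSD) radix sort in Python, assuming non-negative integer keys; negative numbers would need the sign handling mentioned above, which this sketch omits. Each pass distributes keys into `base` buckets by one digit and concatenates the buckets; because each pass is stable, the list is fully sorted after d passes. The name `radix_sort` and the `base` parameter are illustrative.

```python
def radix_sort(values, base=10):
    """LSD radix sort for non-negative integers; returns a new sorted list.
    One bucket pass per base-`base` digit of the largest key: O(n * d) work."""
    if not values:
        return []
    result = list(values)
    largest = max(result)
    place = 1  # current digit position: 1, base, base^2, ...
    while largest // place > 0:
        buckets = [[] for _ in range(base)]
        for v in result:
            buckets[(v // place) % base].append(v)  # appending keeps each pass stable
        result = [v for bucket in buckets for v in bucket]
        place *= base
    return result


print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))  # [2, 24, 45, 66, 75, 90, 170, 802]
```

A larger base means fewer passes (smaller d) but more buckets per pass, which is the space/time trade-off behind the log_b(k) term.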
Notation: log denotes log base 2 unless another base is written (e.g., log_b, log_10).
Recurrences: T(n) = a T(n/b) + f(n)
Master Theorem cases summarized above.