Notes on Recursive Algorithms and Recurrence Analysis (Section 4.2)

Recursion and Time Complexity (Section 4.2)

  • Overview

    • The instructor contrasts his substitution-based approach with the textbook’s four methods for deriving running times of recursive algorithms.
    • Focus is on a loop-free recursive algorithm (no explicit for/while loops) but with multiple recursive calls that mimic looping behavior.
    • When explicit loops exist, first analyze the loop's running time and then fold it into the recurrence; in a loop-free recursion, all the cost comes from the recursive calls plus the combining (merge) step.
    • Example context: analyzing the running time of a merge-sort-like algorithm that operates on a subarray A[p..r] with a split at q (left A[p..q], right A[q+1..r]).
  • Core idea: represent time with a recurrence and solve it either by substitution or by a recursion-tree argument.
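
The loop-free recursive structure described above can be sketched in Python on a subarray A[p..r] with split point q. This is an illustrative sketch, not the instructor's exact code; the recursion itself contains no loop, and the only iteration lives in the merge step:

```python
def merge(A, p, q, r):
    """Merge the sorted subarrays A[p..q] and A[q+1..r]; cost is O(r - p + 1)."""
    left = A[p:q + 1]          # copy of the left half (size q - p + 1)
    right = A[q + 1:r + 1]     # copy of the right half (size r - q)
    i = j = 0
    for k in range(p, r + 1):  # write the smaller front element back into A
        if j >= len(right) or (i < len(left) and left[i] <= right[j]):
            A[k] = left[i]
            i += 1
        else:
            A[k] = right[j]
            j += 1

def merge_sort(A, p, r):
    """Sort A[p..r] in place; two recursive calls mimic the looping behavior."""
    if p < r:                  # base case: subarrays of size 0 or 1 are sorted
        q = (p + r) // 2       # split: left is A[p..q], right is A[q+1..r]
        merge_sort(A, p, q)
        merge_sort(A, q + 1, r)
        merge(A, p, q, r)      # combining step, Theta(n) work for this subarray
```

Calling `merge_sort(A, 0, len(A) - 1)` sorts the whole array; note that n in the analysis is the size of the subarray `r - p + 1`, not the size of A.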

  • Important terminology and setup

    • Subproblem indices: p, q, r with a subarray size n = r - p + 1.
    • Left subproblem size: L = q - p + 1.
    • Right subproblem size: R = r - q.
    • Merge cost: merging the two sorted subarrays takes time proportional to their total elements, i.e. O(L + R) = O(n).
    • A key takeaway: the n here denotes the size of the subarray being processed, not the size of the whole input array A.
    • The merge step and the partitioning lead to a recurrence for the overall running time.
  • Recurrence for merge-sort style algorithm (standard form)

    • The typical recurrence is T(n) = 2T(n/2) + O(n).
    • Base cases: T(0) = T(1) = Θ(1) (constant time for the smallest subproblems).
    • This recurrence captures two recursive calls on halves plus a linear-time merge step.
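
Before proving anything, the recurrence can be evaluated directly for powers of two as a sanity check. The base value T(1) = 1 and the unit cost for the merge term are illustrative assumptions:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    """Evaluate T(n) = 2*T(n/2) + n with T(1) = 1, for n a power of two."""
    if n <= 1:
        return 1
    return 2 * T(n // 2) + n

# For n = 2^k the exact solution is T(n) = n*log2(n) + n, i.e. n*k + n,
# which already has the n*log(n) shape the analysis below derives.
for k in range(1, 12):
    n = 2 ** k
    assert T(n) == n * k + n
```
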
  • The substitution method (high-level outline)

    • Step 1: Write the recurrence, e.g. T(n) = 2T(n/2) + c·n for some constant c.
    • Step 2: Guess a bound, e.g. assume T(n) ≤ c1·n·log₂ n for sufficiently large n.
    • Step 3: Substitute the guess into the recurrence and simplify:
    • T(n) ≤ 2(c1·(n/2)·log₂(n/2)) + c·n = c1·n·(log₂ n − 1) + c·n = c1·n·log₂ n − (c1 − c)·n.
    • Step 4: Choose constants: any c1 ≥ c makes the last expression at most c1·n·log₂ n for all n ≥ n0 = 2. Conclude T(n) = O(n log n).
    • Practical note: constants often omitted in the notation; the key is the growth rate, not the exact constants.
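
The substitution argument can also be checked numerically for concrete constants. Here c = 1 in the recurrence and a guessed c1 = 2; both are illustrative choices, not values from the lecture:

```python
import math

def T(n, c=1):
    """T(n) = 2*T(n/2) + c*n with T(1) = 1, for n a power of two."""
    if n <= 1:
        return 1
    return 2 * T(n // 2, c) + c * n

def bound_holds(n, c1=2, c=1):
    """Check the guessed bound T(n) <= c1 * n * log2(n)."""
    return T(n, c) <= c1 * n * math.log2(n)
```

With c = 1 and c1 = 2 the bound holds for every power of two n ≥ 2, matching the requirement c1 ≥ c from the algebra above.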
  • The recursion-tree method (alternate constructive approach)

    • Expand the recurrence into a tree of subproblems where each node splits into two children of half the size.
    • At each level of the tree, the total work done by the non-recursive part (the merge cost) is O(n) for that level.
    • Height of the recursion tree is h = log₂ n (since the subproblem size halves each level).
    • Per-level cost: O(n) (summing the work of all nodes at that level).
    • Total work: sum over levels: O(n) + O(n) + ⋯ + O(n) with h levels ⇒ O(n log n).
    • Formalizing with a parameter k (depth):
    • After k levels of expansion: T(n) = 2^k·T(n/2^k) + c·n·k.
    • At the bottom, when n/2^k ≤ 1, we have 2^k ≈ n and T(1) = Θ(1).
    • Plugging in k = log₂ n yields T(n) = Θ(n) + c·n·log₂ n = Θ(n log n).
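
The level-by-level accounting above can be spelled out in a few lines, assuming n is a power of two so the halving is exact (the leaf cost of 1 per base case is an illustrative choice):

```python
def recursion_tree_total(n, c=1):
    """Sum c*n work per level over log2(n) levels, plus the leaf level."""
    assert n >= 1 and n & (n - 1) == 0  # require a power of two for exact halving
    total, nodes, size = 0, 1, n
    while size > 1:
        total += nodes * c * size       # 2^k nodes of size n/2^k => c*n per level
        nodes *= 2
        size //= 2
    total += n                          # n base cases, unit cost each
    return total
```

The result agrees with the closed form c·n·log₂ n + n obtained by unrolling the recurrence.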
  • Important consequences for the specific merge step (illustrative notes)

    • The merge operation itself runs in time O(nL + nR), where nL = q − p + 1 and nR = r − q.
    • Since n = nL + nR = r − p + 1, the merge cost is O(n) per merge.
    • The total cost across all levels is a linear cost per level times the number of levels (log n): O(n log n).
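
The claim that a merge does exactly one write per element, i.e. nL + nR work, can be demonstrated with a small counter (a sketch; the function and variable names are my own):

```python
def merge_writes(left, right):
    """Merge two sorted lists, returning (merged, writes); every element of
    both inputs is written exactly once, so writes == len(left) + len(right)."""
    merged, i, j, writes = [], 0, 0, 0
    while i < len(left) or j < len(right):
        if j >= len(right) or (i < len(left) and left[i] <= right[j]):
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
        writes += 1
    return merged, writes
```
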
  • Notation and concept clarifications

    • Big-O, Big-Theta, and Big-Omega definitions (recap):
    • f(n) ∈ O(g(n)) ⟺ there exist c > 0 and n0 such that f(n) ≤ c·g(n) for all n ≥ n0.
    • f(n) ∈ Ω(g(n)) ⟺ there exist c > 0 and n0 such that f(n) ≥ c·g(n) for all n ≥ n0.
    • f(n) ∈ Θ(g(n)) ⟺ there exist c1, c2 > 0 and n0 such that c1·g(n) ≤ f(n) ≤ c2·g(n) for all n ≥ n0.
    • Relationship:
    • If f(n) ∈ O(g(n)) and f(n) ∈ Ω(g(n)), then f(n) ∈ Θ(g(n)).
    • Important caution in CS practice: big-O is an upper bound; big-Theta is a tight bound; some texts use equality symbols or omit certain details for simplicity, but precision matters in proofs.
    • When a problem has multiple parameters describing size (e.g., a tree with multiple dimensions), the notations extend to those parameters; base intuition remains about growth rates with respect to the dominating parameter(s).
  • The role of log base in CS notations

    • In computer science, log base 2 is common, but the base is often omitted because logarithms in different bases differ only by a constant factor: log_b n = log₂ n / log₂ b.
    • So you'll often see simply "log n"; the base only changes constants, not the growth class.
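
The change-of-base identity quoted above is easy to confirm numerically (base 10 is chosen arbitrarily for the comparison):

```python
import math

def base_ratio(n, b=10):
    """Ratio log2(n) / log_b(n); by change of base this equals log2(b)
    for every n, a constant independent of n."""
    return math.log2(n) / math.log(n, b)

# The ratio never depends on n, so switching bases only rescales constants.
for n in (10, 1000, 10**6, 10**9):
    assert abs(base_ratio(n) - math.log2(10)) < 1e-9
```
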
  • Space (memory) considerations for merge sort (summary)

    • The merge step typically requires an auxiliary array of the size of the subarray being merged (the two subarrays together of size n).
    • This extra space can be reused at every level of recursion, so the total extra space is O(n) (for the auxiliary array) plus the recursion stack, which is O(log n).
    • Therefore, total space complexity is O(n) (dominant term from extra array) plus O(log n) for the stack, i.e. overall O(n).
    • In contrast, in-place variants exist but are more complex and may affect constants or run-time performance; the standard textbook analysis often assumes an auxiliary array of size n.
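
The "one reusable auxiliary array plus a logarithmic stack" picture can be illustrated by allocating the buffer once and tracking recursion depth. This is an illustrative variant; textbook pseudocode usually allocates inside the merge instead:

```python
import math

def merge_sort_buffered(A):
    """Merge sort with a single auxiliary buffer of size n, reused at every
    level; returns the maximum recursion depth reached (O(log n))."""
    buf = [None] * len(A)     # O(n) extra space, allocated once
    max_depth = 0

    def sort(p, r, depth):
        nonlocal max_depth
        max_depth = max(max_depth, depth)
        if p >= r:
            return
        q = (p + r) // 2
        sort(p, q, depth + 1)
        sort(q + 1, r, depth + 1)
        buf[p:r + 1] = A[p:r + 1]   # merge A[p..q] and A[q+1..r] via the buffer
        i, j = p, q + 1
        for k in range(p, r + 1):
            if j > r or (i <= q and buf[i] <= buf[j]):
                A[k] = buf[i]
                i += 1
            else:
                A[k] = buf[j]
                j += 1

    sort(0, len(A) - 1, 0)
    return max_depth
```

The dominant O(n) term comes from `buf`; the stack contributes only the returned depth, roughly log₂ n.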
  • Practical insights and real-world relevance

    • Insertion sort vs. merge sort:
    • Worst-case time: insertion sort ⇒ O(n^2), merge sort ⇒ O(n log n).
    • For small n or favorable constants, insertion sort can be faster in practice due to lower overhead and better locality; for large n, merge sort dominates with better asymptotic scaling.
    • IO costs and hardware factors: IO can dominate time in practice, so empirical performance may diverge from pure asymptotic analysis.
    • When teaching, both time and space costs matter; in exams, time complexity (asymptotic behavior) is often emphasized; space complexity is important but sometimes left as supplementary.
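
The asymptotic gap can be made concrete by counting comparisons on a reverse-sorted input, the worst case for insertion sort. This is a rough illustration only, since real timings also depend on constants, locality, and IO as noted above:

```python
def insertion_sort_comparisons(A):
    """Insertion sort that counts key comparisons; a reverse-sorted input
    forces the maximum, n*(n-1)/2."""
    A = list(A)               # work on a copy
    comps = 0
    for i in range(1, len(A)):
        key = A[i]
        j = i - 1
        while j >= 0:
            comps += 1
            if A[j] > key:
                A[j + 1] = A[j]   # shift the larger element right
                j -= 1
            else:
                break
        A[j + 1] = key
    return comps
```

For n = 1024 this gives 1024·1023/2 = 523776 comparisons, far above the roughly n·log₂ n = 10240 a merge sort needs on the same input.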
  • Recap: how to approach runtime analysis (two main methods)

    • Substitution method (guaranteed bounds by guessing and proving via induction): suitable for deriving explicit upper bounds and sometimes lower bounds.
    • Recursion-tree method (visual/structural): interpret the recurrence as a tree of subproblems; sum costs level by level; gives intuition and a straightforward derivation of tight bounds.
  • Practical study tips and exam guidance

    • Expect questions similar in level to homework problems: compute time bounds, explain steps, and justify base cases.
    • If you’re unsure, write a plausible recurrence, sketch the substitution or the recursion tree, and verify by a second pass.
    • For larger programs, rely on systematic methods and proofs of correctness; intuition helps, but formal induction and bounding are essential.
    • Be mindful of exact definitions when using notation (avoid abusive uses of symbols). Distinguish between upper bounds (O), lower bounds (Ω), and tight bounds (Θ).
  • Quick glossary (CS-specific terms used here)

    • Recurrence relation: a formula defining T(n) in terms of T of smaller arguments, plus a combining cost.
    • Merge step cost: time to merge two sorted subarrays, typically O(nL + nR).
    • Height of recursion tree: number of times you can split the problem until reaching base cases, e.g. h = log₂ n for binary splitting.
    • Depth vs. breadth: depth corresponds to recursion levels; breadth corresponds to the number of subproblems at a given level.
    • Base case: the smallest subproblems where the recursion stops; often constant time.
    • Auxiliary space: additional memory used besides the input; in merge sort, often the memory for the temporary array plus stack space.
  • Notes on proofs and rigor (calculus analogy discussed in lecture)

    • The epsilon-delta discussion for limits (e.g., limit of 1/n as n → ∞ is 0) was used to illustrate precise notions of convergence beyond intuition.
    • In CS, limits are mirrored by asymptotic bounds: show that for large n, f(n) stays within a constant factor of g(n).
    • The dialogue emphasized exactness and the perils of “abusing” symbols when teaching beginners; rigor is important, even if sometimes textbooks simplify for intuition.
  • Final reminder

    • The exam and course focus on time complexity (growth rates) and, to a lesser extent, space complexity.
    • Always relate your bound to the dominant term: for merge sort, T(n) = Θ(n log n); for merge operations within the algorithm, each merge is Θ(n); the total cost across levels is Θ(n log n).
    • When presenting solutions, show the recurrence, outline the method (substitution or tree), and conclude with the tight bound while noting base cases and assumptions.