Algorithm Analysis Comprehensive Notes
by Steve Kreutzer, PhD
Learning Objectives
Describe empirical and mathematical frameworks for characterizing performance.
Analyze a code fragment or algorithm to determine its worst-case running time using Big-O notation.
Reasons to Analyze Algorithms
Predict performance:
Estimate how long a program will take to run.
Estimate how much extra memory a program will require.
Compare algorithms:
Provide performance guarantees.
Challenge: Measure Algorithm Running Time
Characterize the running time as a function of the input size, denoted by n.
n can represent the number of items to be processed or a single value used in computation.
Running time generally increases with problem size.
For many algorithms, running time is relatively insensitive to the characteristics of the input data.
Approaches for Evaluating Running Time
Experimental Studies:
Run the program with inputs of varying size.
Record the time needed for each input size.
Theoretical Analysis:
Analyze a high-level description of the algorithm.
Experimental Approach: Doubling Test
Execute the algorithm repeatedly, doubling the input size for each execution.
Record the time taken for each execution.
Calculate the doubling ratio (the ratio of running times for consecutive doubling input sizes).
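The procedure above can be sketched in Java. Here `runTrial` is a hypothetical quadratic workload standing in for the algorithm under test, and the starting size and cutoff are arbitrary choices:

```java
public class DoublingTest {
    // Hypothetical workload: a quadratic-time pair count standing in for
    // the algorithm being measured.
    static long runTrial(int n) {
        long count = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                count++;
        return count;
    }

    public static void main(String[] args) {
        double prev = 0;
        // Double the input size on each execution and record the time taken.
        for (int n = 1000; n <= 64000; n *= 2) {
            long start = System.nanoTime();
            runTrial(n);
            double elapsed = (System.nanoTime() - start) / 1e9;
            // Doubling ratio: time for input size n divided by time for n/2.
            System.out.printf("%7d %9.3f %6.2f%n",
                    n, elapsed, prev > 0 ? elapsed / prev : 0.0);
            prev = elapsed;
        }
    }
}
```

For a quadratic workload like this one, the printed ratio should settle near 4.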
Element Distinctness Problem
Design an algorithm to determine whether all elements of a list are distinct.
Brute Force Approach
Check every pair for equality.
Examine each value in the list and compare it to every value after it.
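A minimal sketch of the brute-force check (the class and method names are illustrative):

```java
public class DistinctBrute {
    // Compare each value to every value after it; O(n^2) comparisons.
    static boolean allDistinct(int[] a) {
        for (int i = 0; i < a.length; i++)
            for (int j = i + 1; j < a.length; j++)
                if (a[i] == a[j])
                    return false;   // found a duplicate pair
        return true;
    }
}
```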
Leverage Sorting
Sort the list.
Compare each value to its neighbor.
If there are duplicates, they will be adjacent.
Sorting helps in many problems.
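A sketch of the sort-then-scan approach using `java.util.Arrays.sort`; copying first avoids mutating the caller's array (names are illustrative):

```java
import java.util.Arrays;

public class DistinctSorted {
    // Sort, then check neighbors: duplicates, if any, end up adjacent.
    static boolean allDistinct(int[] a) {
        int[] copy = Arrays.copyOf(a, a.length); // leave the input untouched
        Arrays.sort(copy);                       // O(n log n)
        for (int i = 1; i < copy.length; i++)
            if (copy[i] == copy[i - 1])
                return false;                    // adjacent duplicate found
        return true;
    }
}
```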
Leverage Hashing
Use a Set, which relies on hashing (expected constant-time lookup).
Create an empty set.
For each value, determine whether it is already in the set.
If it is not in the set, add it to the set.
If it is in the set, the list is not distinct; stop.
The elements in the list are distinct if each value has been examined without finding a duplicate in the set.
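The steps above can be sketched with a `HashSet` (class and method names are illustrative):

```java
import java.util.HashSet;
import java.util.Set;

public class DistinctHash {
    // One pass; each lookup and insert takes expected constant time.
    static boolean allDistinct(int[] a) {
        Set<Integer> seen = new HashSet<>();
        for (int v : a) {
            if (seen.contains(v))
                return false;   // duplicate: list is not distinct, stop
            seen.add(v);        // not seen yet: remember it
        }
        return true;            // every value examined without a duplicate
    }
}
```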
Doubling Test Results (Worst Case: No Duplicates)
Algorithm 1 running time quadruples as the input size doubles.
Algorithm 2 running time doubles as the input size doubles.
Doubling Hypothesis
A quick way to estimate the exponent b in a power-law relationship.
Run the program, doubling the size of the input.
Hypothesis: the running time is approximately a \cdot n^b with b = \log_2(\text{ratio}).
For example, if the doubling ratio settles near 8, then b = \log_2 8 = 3, so the running time grows roughly as a \cdot n^3.
Caveat: the doubling hypothesis cannot identify logarithmic factors.
Limitations of Experiments
Difficult to identify representative data.
The algorithm must be implemented beforehand.
Experiments take time and can be done only on a limited number of test inputs.
Theoretical Models for Running Time
Characterize running time by reviewing a code fragment or pseudo-code.
Develop a function f(n) to represent the running time, where n is the input size.
Examples of n: the length of an array, the number of records to be processed, or the value of a numeric parameter.
Counting Operations and Statements
If the execution time of an operation or statement takes constant time, count it as 1 unit of time.
Treat execution times of statements as equivalent.
This is possible because we care about very large values of n, and as n grows very large, constants become insignificant relative to n (Asymptotic Analysis).
Constant Time vs. Not Constant Time
Constant Time:
Variable declaration
Assignment statement
Integer compare
Array element access
Array length
Integer addition
Not Constant Time:
Array allocation
String concatenation
String comparison
Loop through an array
Array copy
Function calls: count the time inside the function
Array Access Example
The number of array accesses as a function of n in the given code:
int count = 0;
for (int i = 0; i < n; i++)
for (int j = i+1; j < n; j++)
if (a[i] + a[j] == 0)
count++;
The if statement executes \frac{1}{2} n (n - 1) times; each execution makes two array accesses (a[i] and a[j]), so the total number of array accesses is n(n-1).
Asymptotic Analysis
Characterize the running time and space requirements of an algorithm as the input size (n) becomes very large.
Let f(n) be the running time (or space) of an algorithm.
Big-O Notation
Given functions f(n) and g(n), we say that f(n) is O(g(n)) if there are positive constants c and n_0 such that f(n) \leq c \cdot g(n) for n \geq n_0.
Objective is to determine g(n).
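As a worked instance of the definition (one valid choice of constants, not the only one), 7n - 2 is O(n):

```latex
7n - 2 \;\le\; 7n \quad \text{for all } n \ge 1,
\qquad \text{so } c = 7,\; n_0 = 1 \text{ witness that } 7n - 2 \text{ is } O(n).
```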
Common Orders of Growth
| Order of growth | Name |
|---|---|
| 1 | constant |
| log n | logarithmic |
| n | linear |
| n log n | linearithmic |
| n^2 | quadratic |
| n^3 | cubic |
| 2^n | exponential |
| n! | factorial |
Big-O Notation
Asymptotic: as n becomes very large (approaching \infty).
Worst-case analysis: Provides a performance guarantee.
Drop lower order terms and constants.
Rationale
When n is large, lower-order terms are negligible.
When n is small, we don’t care.
| Function | Big-O |
|---|---|
| 3n^3 + 20n^2 + 5 | O(n^3) |
| 7n - 2 | O(n) |
| 3 \log n + 5 | O(\log n) |
Tilde Notation
Drop lower-order terms but keep the leading coefficient.
| Function | Tilde | Big-O |
|---|---|---|
| 4n^5 + 20n + 16 | \sim 4n^5 | O(n^5) |
| 7n^2 + 100n^{4/3} + 56 | \sim 7n^2 | O(n^2) |
| \frac{1}{6}n^3 - \frac{1}{2}n^2 + \frac{1}{3}n | \sim \frac{1}{6}n^3 | O(n^3) |
Example: 2-SUM
Q. Using \sim notation, how many array accesses as a function of input size n?
A. \sim n^2 array accesses.
int count = 0;
for (int i = 0; i < n; i++)
for (int j = i+1; j < n; j++)
if (a[i] + a[j] == 0)
count++;
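One way to sanity-check the \sim n^2 claim is to tally accesses directly. This hypothetical sketch mirrors the double loop and adds 2 per inner iteration, one for a[i] and one for a[j]:

```java
public class TwoSumAccesses {
    // Count the array accesses the 2-SUM double loop would perform.
    static long count(int n) {
        long accesses = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                accesses += 2;  // a[i] and a[j]
        return accesses;        // = n(n-1), which is ~ n^2
    }
}
```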
Example: 3-SUM
Q. Using \sim notation, how many array accesses as a function of input size n?
A. \sim \frac{1}{2} n^3 array accesses.
int count = 0;
for (int i = 0; i < n; i++)
for (int j = i+1; j < n; j++)
for (int k = j+1; k < n; k++)
if (a[i] + a[j] + a[k] == 0)
count++;
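The same tallying trick checks the \sim \frac{1}{2} n^3 claim for 3-SUM; each innermost iteration accesses a[i], a[j], and a[k] (the class name is illustrative):

```java
public class ThreeSumAccesses {
    // Count the array accesses the 3-SUM triple loop would perform.
    static long count(int n) {
        long accesses = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                for (int k = j + 1; k < n; k++)
                    accesses += 3;  // a[i], a[j], and a[k]
        return accesses;            // = n(n-1)(n-2)/2, which is ~ (1/2) n^3
    }
}
```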
Bottom line: Use cost model and asymptotic notation to simplify analysis.
Logarithms
Binary Logarithm (\log_2 n)
The power to which the number 2 must be raised to obtain the value n.
x = \log_2 n \iff 2^x = n
Equivalently: the number of times you need to halve n until you have 1 or fewer left.
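The halving characterization can be checked with a short loop (a sketch; `halvings` is an illustrative name):

```java
public class Halvings {
    // Count how many times n can be halved before reaching 1 or fewer;
    // for n >= 1 this equals floor(log2 n).
    static int halvings(int n) {
        int count = 0;
        while (n > 1) {
            n /= 2;
            count++;
        }
        return count;
    }
}
```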
Logarithm of Base b (\log_b n)
The power to which b must be raised to obtain the value n.
x = \log_b n \iff b^x = n
Example: x = \log_{10} 1000 \iff 10^x = 1000 \implies x = 3
Logarithmic Growth
\log_2(n) is a very slowly growing function.
| n | f(n) = \log_2 n |
|---|---|
| 2 | 1 |
| 4 | 2 |
| 8 | 3 |
| 16 | 4 |
| 32 | 5 |
| … | … |
| 1,048,576 | 20 |
When n doubles, \log n increases by 1.
Rapidly Growing Functions
| n | f(n) = n | f(n) = n^2 | f(n) = n^3 | f(n) = 2^n | f(n) = n! |
|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 2 | 1 |
| 2 | 2 | 4 | 8 | 4 | 2 |
| 3 | 3 | 9 | 27 | 8 | 6 |
| … | … | … | … | … | … |
| 20 | 20 | 400 | 8,000 | 1,048,576 | 2.433e+18 |
Function Growth Comparison
| Function | When n doubles, f(2n) = | When n increases by 1, f(n+1) = |
|---|---|---|
| \log n | f(n) + 1 | |
| n | 2 \cdot f(n) | |
| n^2 | 4 \cdot f(n) | |
| n^3 | 8 \cdot f(n) | |
| 2^n | | 2 \cdot f(n) |
| n! | | (n+1) \cdot f(n) |
Common Order of Growth Classifications Table
| Order of growth | Name | Description | Example | T(2n) / T(n) |
|---|---|---|---|---|
| O(1) | constant | statement | add two numbers | 1 |
| O(log n) | logarithmic | divide in half | binary search | ~1 |
| O(n) | linear | single loop | find the maximum | 2 |
| O(n log n) | linearithmic | divide and conquer | mergesort | ~2 |
| O(n^2) | quadratic | double loop | check all pairs | 4 |
| O(n^3) | cubic | triple loop | check all triples | 8 |
| O(2^n) | exponential | exhaustive search: given n items, enumerate all 2^n subsets | check all subsets | 2^n |
Practical Implications of Performance
To sort a million items:
Insertion sort, O(n^2): about 70 hours.
Merge sort, O(n \log n): about 40 seconds.
On a machine 100 times faster:
Insertion sort takes 40 minutes.
Merge sort takes half a second.
Approach to Analyzing Loops
Does the number of times that the loop executes depend on a constant or N?
for (int i=1; i<=100; i++)
System.out.println(i);
Executes 100 times: O(1).
for (int i=1; i<=100; i+=2)
System.out.println(i);
Executes 50 times (prints every other value): O(1).
Analyzing Loops
Does the number of times that the loop executes depend on a constant or N?
If on N, does the loop step increment/decrement by a constant or is it multiplied/divided?
for (int i=1; i<=N; i++)
System.out.println(i);
Executes N times: O(n).
for (int i=1; i<=N; i+=2)
System.out.println(i);
Executes \frac{1}{2} N times (prints every other value): O(n).
It doesn’t matter what the constant is. The loop is still O(n)
Analyzing Loops (Multiplication/Division)
Does the loop step increment/decrement by a constant, or is it multiplied/divided?
for (int i=1; i<= N; i *= 2)
System.out.print(i + " ");
Recall: \log_2 n is the power to which the number 2 must be raised to obtain the value n.
This loop executes about \log_2 N times: O(\log n).
Analyzing Loops (Different Base Logarithms)
Does the loop step increment/decrement by a constant, or is it multiplied/divided?
for (int i=1; i<= N; i *= 10)
System.out.print(i + " ");
Recall: \log_{10} n is the power to which 10 must be raised to obtain the value n.
This loop executes \log_{10} n times.
Logarithms of different bases differ by a constant, so ignore the base for Big-O.
O(\log n)
Analyzing Code Fragments with Big-O Notation
Distinguish between statements that count as one step versus N steps.
N represents the size of the problem, such as array length or a variable N.
Statements that may not count as one step: loops, function call bodies, string operations.
Consider single loops: Does the loop depend on a constant or N?
Yes: the loop takes constant time.
No: determine whether the variable that controls the loop:
increases or decreases by a constant (the loop is O(N)), or
is multiplied or divided by a constant (the loop is O(log N)).
Analyzing Nested Loops with Big-O Notation
Determine how many times the innermost statement is executed.
If the loop variables for all of the loops depend on N and increase by a constant, the running time is O(x \cdot n^c), where c is the number of nested loops and x is the execution time of the innermost statement.
Example:
for (int i=0; i<N; i++)
for (int j=0; j<N; j++)
for (int k=0; k<N; k++)
System.out.println(count++);
for (int i=0; i<N; i++)
for (int j=i+1; j<N; j++)
for (int k=j+1; k<N; k++)
System.out.println(count++);
Both fragments are O(n^3); the second executes its innermost statement about \frac{1}{6} n^3 times, which is still O(n^3).
Analyzing Code Fragments with Big-O Notation (Non-Constant Increment)
If one of the loop variables does not increase/decrease by a constant, total up the sum of the number of times the innermost loop is executed:
for (int i=1; i<=N; i*=2) {
for (int j=1; j<=i; j++)
System.out.print(j + " ");
System.out.println();
}
The inner loop body executes 1 + 2 + 4 + 8 + … + N times in total (a geometric series),
which is 2N - 1 when N is a power of 2.
O(N)
Useful Math
Sum of Numbers from 1 to n (Arithmetic Progression)
Formula to calculate 1 + 2 + 3 + … + (n-2) + (n-1) + n
\frac{1}{2} n (n+1)
Geometric Series
1 + 2 + 4 + 8 + … + N = 2N - 1 (when N is a power of 2)
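Both closed forms can be spot-checked with straightforward loops (class and method names are illustrative):

```java
public class SeriesCheck {
    // 1 + 2 + ... + n, to compare against n(n+1)/2.
    static int arithmeticSum(int n) {
        int sum = 0;
        for (int i = 1; i <= n; i++) sum += i;
        return sum;
    }

    // 1 + 2 + 4 + ... + n for n a power of 2, to compare against 2n - 1.
    static int geometricSum(int n) {
        int sum = 0;
        for (int i = 1; i <= n; i *= 2) sum += i;
        return sum;
    }
}
```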
Doubling Test Calculations
To determine how the running time changes when N doubles, calculate \frac{f(2n)}{f(n)}.
To determine how the running time changes when N increases by 1, calculate \frac{f(n+1)}{f(n)}.
If f(n) = n^2, then \frac{f(2n)}{f(n)} = \frac{(2n)^2}{n^2} = \frac{4n^2}{n^2} = 4
Hence, when n doubles, the running time quadruples.
If f(n) = n^3, then \frac{f(2n)}{f(n)} = \frac{(2n)^3}{n^3} = \frac{8n^3}{n^3} = 8
Hence, when n doubles, the running time increases by a factor of 8.
If f(n) = 2^n, then \frac{f(n+1)}{f(n)} = \frac{2^{n+1}}{2^n} = 2
Hence, when n increases by 1, the running time doubles.
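The ratios worked out above can also be computed generically; `doublingRatio` is an illustrative helper, not part of any library:

```java
import java.util.function.DoubleUnaryOperator;

public class DoublingRatios {
    // f(2n) / f(n) for a given growth function f and input size n.
    static double doublingRatio(DoubleUnaryOperator f, double n) {
        return f.applyAsDouble(2 * n) / f.applyAsDouble(n);
    }
}
```

For example, `doublingRatio(x -> x * x, 1000.0)` yields 4, matching the n^2 calculation above.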
Code Analysis Questions
Q-1: What is the order of growth for the running time?
for (int i=1; i<=N; i++)
System.out.println(i);
Answer: C. O(n)
Q-2: What is the order of growth for the running time?
for (int i=0; i<N; i++)
for (int j=0; j<N; j++)
System.out.println(i+10 + j);
Answer: D. O(n^2)
Q-3: What is the order of growth for the running time?
for (int i=1; i<=N; i++)
System.out.println(i);
for (int j=1; j<=N; j++)
++count;
Answer: C. O(n) (two sequential loops: O(n) + O(n) is still O(n))
Q-4: What is the order of growth for the running time?
for (int i=1; i<N; i*=2)
System.out.println(i);
Answer: B. O(log n)
Q-5: What is the order of growth for the running time?
int count = 0;
for (int i=0; i<n; i++)
for (int j=i+1; j<n; j++)
for (int k=1; k<=n; k*=2)
if (a[i] + a[j] >= a[k])
count++;
Answer: D. O(n^2 log n)
Questions about the Code
(i) How many times is the if statement block starting at line 4 executed in terms of n in the worst case?
(ii) How many times is the code inside the if statement block starting at line 4 executed in terms of n in the best case?
(iii) What is the maximum number of array accesses in terms of n in the worst case?
(iv) Use tilde notation to represent the worst-case number of array accesses based on your answer in (iii).
(v) Use Big-O notation to represent the order of growth in terms of n, based on the answer in (iv).
Assume a is an integer array and n > 0.
(i) \frac{1}{2} n(n-1) (ii) 0 (iii) 6 \cdot \frac{1}{2} n(n-1) = 3n(n-1) = 3n^2 - 3n (iv) \sim 3n^2 (v) O(n^2)
Here are some functions in Java with their corresponding worst-case complexities:

| Function | Worst-Case Time |
|---|---|
| swap | O(1) |
| f0 | O(n^2) |
| f2 | O(n) |
| f3 | O(n^2) |
| f4 | O(1) |
| f5 | O(n) |
| f7 | O(log n) |
| f8 | O(log n) |
| f8b | O(n log n) |
| f11 | O(log n) |