Algorithm Analysis Comprehensive Notes

Algorithm Analysis by Steve Kreutzer, PhD

Learning Objectives

  • Describe empirical and mathematical frameworks for characterizing performance.

  • Analyze a code fragment or algorithm to determine its worst-case running time using Big-O notation.

Reasons to Analyze Algorithms

  • Predict performance:

    • Estimate how long a program will take to run.

    • Estimate how much extra memory a program will require.

  • Compare algorithms:

    • Provide performance guarantees.

Challenge: Measure Algorithm Running Time

  • Characterize the running time as a function of the input size, denoted by n.

  • n can represent the number of items to be processed or a single value used in computation.

  • Running time generally increases with problem size.

  • For many algorithms, running time is relatively insensitive to the characteristics of the input data.

Approaches for Evaluating Running Time

  • Experimental Studies:

    • Run the program with inputs of varying size.

    • Record the time needed for each input size.

  • Theoretical Analysis:

    • Analyze a high-level description of the algorithm.

Experimental Approach: Doubling Test

  • Execute the algorithm repeatedly, doubling the input size for each execution.

  • Record the time taken for each execution.

  • Calculate the doubling ratio (the ratio of running times for consecutive doubling input sizes).

Element Distinctness Problem

  • Design an algorithm to determine whether all elements of a list are distinct.

Brute Force Approach
  • Check every pair for equality.

  • Examine each value in the list and compare it to every value after it.
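The pairwise check described above can be sketched in Java. This is a minimal sketch; the class and method names are illustrative, not from the notes. Checking all \frac{1}{2} n(n-1) pairs makes this O(n^2).

```java
// Brute-force element distinctness: compare every pair.
// Illustrative sketch; O(n^2) comparisons in the worst case.
public class DistinctBrute {
    public static boolean isDistinct(int[] a) {
        for (int i = 0; i < a.length; i++)
            for (int j = i + 1; j < a.length; j++)   // only values after a[i]
                if (a[i] == a[j])
                    return false;                    // found a duplicate pair
        return true;                                 // no pair was equal
    }
}
```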

Leverage Sorting
  • Sort the list.

  • Compare each value to its neighbor.

  • If there are duplicates, they will be adjacent.

  • Sorting helps in many problems.
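The sort-then-scan idea can be sketched as follows (illustrative names; a sketch, not a definitive implementation). The running time is dominated by the sort, giving O(n \log n).

```java
import java.util.Arrays;

// Sort first; any duplicates must then be adjacent.
// O(n log n), dominated by the sort. Illustrative sketch.
public class DistinctSort {
    public static boolean isDistinct(int[] a) {
        int[] copy = a.clone();            // avoid mutating the caller's array
        Arrays.sort(copy);                 // O(n log n)
        for (int i = 1; i < copy.length; i++)
            if (copy[i] == copy[i - 1])    // compare each value to its neighbor
                return false;
        return true;
    }
}
```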

Leverage Hashing
  • Use a Set, which relies on hashing (expected constant-time lookup).

    1. Create an empty set.

    2. For each value, determine whether it is already in the set.

      • If it is not in the set, add it to the set.

      • If it is in the set, the list is not distinct; stop.

    3. The elements in the list are distinct if each value has been examined without finding a duplicate in the set.
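The three steps above can be sketched with a `HashSet` (illustrative names). With expected constant-time `contains` and `add`, the whole scan is expected O(n).

```java
import java.util.HashSet;
import java.util.Set;

// Hashing approach: one pass, expected O(n). Illustrative sketch.
public class DistinctHash {
    public static boolean isDistinct(int[] a) {
        Set<Integer> seen = new HashSet<>();   // step 1: empty set
        for (int v : a) {
            if (seen.contains(v))              // step 2: already seen?
                return false;                  // duplicate found: not distinct
            seen.add(v);
        }
        return true;                           // step 3: no duplicates found
    }
}
```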

Doubling Test Results (Worst Case: No Duplicates)

  • Algorithm 1 running time quadruples as the input size doubles.

  • Algorithm 2 running time doubles as the input size doubles.

Doubling Hypothesis

  • A quick way to estimate the exponent b in a power-law relationship.

  • Run the program, doubling the size of the input.

  • Hypothesis: Running time is approximately a \cdot n^b with b = \log_2 \text{ ratio}.

  • Caveat: Cannot identify logarithmic factors with doubling hypothesis.
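The doubling hypothesis can be sketched in Java. To keep the example deterministic, this sketch counts the pairwise comparisons of the brute-force distinctness check instead of measuring wall-clock time; with real timings the ratio and \log_2 arithmetic are identical. Class and method names are illustrative.

```java
// Doubling-hypothesis sketch: estimate the exponent b in a * n^b
// from the ratio of costs at input sizes n and 2n.
public class DoublingTest {
    // Comparisons made by the brute-force pair check: n(n-1)/2.
    static long countPairOps(int n) {
        long ops = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                ops++;
        return ops;
    }

    // b = log2( cost(2n) / cost(n) ); should be close to 2 here.
    public static double estimateExponent(int n) {
        double ratio = (double) countPairOps(2 * n) / countPairOps(n);
        return Math.log(ratio) / Math.log(2);
    }
}
```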

Limitations of Experiments

  • Difficult to identify representative data.

  • The algorithm must be implemented beforehand.

  • Experiments take time and can be done only on a limited number of test inputs.

Theoretical Models for Running Time

  • Characterize running time by reviewing a code fragment or pseudo-code.

  • Develop a function f(n) to represent the running time, where n is the input size.

    • Examples of n: the length of an array, the number of records to be processed, or a numeric parameter to the computation.

Counting Operations and Statements

  • If the execution time of an operation or statement takes constant time, count it as 1 unit of time.

  • Treat execution times of statements as equivalent.

  • This is possible because we care about very large values of n, and as n grows very large, constants become insignificant relative to n (Asymptotic Analysis).

Constant Time vs. Not Constant Time

  • Constant Time:

    • Variable declaration

    • Assignment statement

    • Integer compare

    • Array element access

    • Array length

    • Integer addition

  • Not Constant Time:

    • Array allocation

    • String concatenation

    • String comparison

    • Loop through an array

    • Array copy

    • Function calls: count the time inside the function

Array Access Example

  • Count the array accesses as a function of n in the given code:

int count = 0;
for (int i = 0; i < n; i++)
    for (int j = i+1; j < n; j++)
        if (a[i] + a[j] == 0)
            count++;

  • The if statement executes \frac{1}{2} n (n - 1) times, and each execution makes two array accesses (a[i] and a[j]), for a total of n(n-1) array accesses.

Asymptotic Analysis

  • Characterize the running time and space requirements of an algorithm as the input size (n) becomes very large.

  • Let f(n) be the running time (or space) of an algorithm.

Big-O Notation
  • Given functions f(n) and g(n), we say that f(n) is O(g(n)) if there are positive constants c and n_0 such that f(n) \leq c \cdot g(n) for all n \geq n_0.

  • Objective is to determine g(n).
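As a worked instance of the definition, 7n - 2 is O(n) because suitable witness constants exist:

```latex
7n - 2 \le 7n \quad \text{for all } n \ge 1,
\qquad \text{so } c = 7,\ n_0 = 1 \text{ satisfy } f(n) \le c \cdot g(n) \text{ for } n \ge n_0.
```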

Common Orders of Growth

  Order of growth    Name
  1                  constant
  log n              logarithmic
  n                  linear
  n log n            linearithmic
  n^2                quadratic
  n^3                cubic
  2^n                exponential
  n!                 factorial

Big-O Notation

  • Asymptotic: as n becomes very large (approaching \infty).

  • Worst-case analysis: Provides a performance guarantee.

  • Drop lower order terms and constants.

Rationale
  • When n is large, lower-order terms are negligible.

  • When n is small, we don’t care.

  Function            Big-O
  3n^3 + 20n^2 + 5    O(n^3)
  7n - 2              O(n)
  3 \log n + 5        O(\log n)

Tilde Notation

  • Drop lower order terms.

  Function                                          Tilde                  Big-O
  4n^5 + 20n + 16                                   \sim 4n^5              O(n^5)
  7n^2 + 100n^{4/3} + 56                            \sim 7n^2              O(n^2)
  \frac{1}{6}n^3 - \frac{1}{2}n^2 + \frac{1}{3}n    \sim \frac{1}{6}n^3    O(n^3)

Example: 2-SUM

Q. Using \sim notation, how many array accesses as a function of input size n?
A. \sim n^2 array accesses.

int count = 0;
for (int i = 0; i < n; i++)
    for (int j = i+1; j < n; j++)
        if (a[i] + a[j] == 0)
            count++;

Example: 3-SUM

Q. Using \sim notation, how many array accesses as a function of input size n?
A. \sim \frac{1}{2} n^3 array accesses.

int count = 0;
for (int i = 0; i < n; i++)
    for (int j = i+1; j < n; j++)
        for (int k = j+1; k < n; k++)
            if (a[i] + a[j] + a[k] == 0)
                count++;

Bottom line: Use cost model and asymptotic notation to simplify analysis.

Logarithms

Binary Logarithm (\log_2 n)
  • The power to which the number 2 must be raised to obtain the value n.
    x = \log_2 n \iff 2^x = n

  • The number of times you need to halve n until you have 1 or fewer left.
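The halving view can be checked directly. A minimal illustrative sketch (names are not from the notes):

```java
// Count how many times n can be halved (integer division)
// before reaching 1; for n a power of two this is exactly log2(n).
public class Halvings {
    public static int countHalvings(int n) {
        int count = 0;
        while (n > 1) {
            n = n / 2;   // halve, discarding any remainder
            count++;
        }
        return count;
    }
}
```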

Logarithm of Base b (\log_b n)
  • The power to which b must be raised to obtain the value n.
    x = \log_b n \iff b^x = n

  • Example: x = \log_{10} 1000 \iff 10^x = 1000 \implies x = 3

Logarithmic Growth

  • \log_2(n) is a very slowly growing function.

  n            f(n) = log2(n)
  2            1
  4            2
  8            3
  16           4
  32           5
  1,048,576    20

  • When n doubles, \log n increases by 1.

Rapidly Growing Functions

  n     f(n) = n    f(n) = n^2    f(n) = n^3    f(n) = 2^n    f(n) = n!
  1     1           1             1             2             1
  2     2           4             8             4             2
  3     3           9             27            8             6
  20    20          400           8,000         1,048,576     2.433e+18

Function Growth Comparison

  Function    When n doubles, f(2n) =    When n increases by 1, f(n+1) =
  \log n      f(n) + 1                   -
  n           2 \cdot f(n)               -
  n^2         4 \cdot f(n)               -
  n^3         8 \cdot f(n)               -
  2^n         -                          2 \cdot f(n)
  n!          -                          (n+1) \cdot f(n)

Common Order of Growth Classifications Table

  • O(1), constant: a single statement, e.g. add two numbers (a = b + c;). T(2n)/T(n) = 1.

  • O(log n), logarithmic: divide in half, e.g. binary search (while (n > 1) { n = n/2; ... }). T(2n)/T(n) ~ 1.

  • O(n), linear: single loop, e.g. find the maximum (for (int i = 0; i < n; i++) { ... }). T(2n)/T(n) = 2.

  • O(n log n), linearithmic: divide and conquer, e.g. mergesort. T(2n)/T(n) ~ 2.

  • O(n^2), quadratic: double loop, check all pairs (for (int i = 0; i < n; i++) for (int j = 0; j < n; j++) { ... }). T(2n)/T(n) = 4.

  • O(n^3), cubic: triple loop, check all triples (for (int i = 0; i < n; i++) for (int j = 0; j < n; j++) for (int k = 0; k < n; k++) { ... }). T(2n)/T(n) = 8.

  • O(2^n), exponential: exhaustive search, e.g. enumerate all 2^n subsets of n items. T(2n)/T(n) = 2^n.

Practical Implications of Performance

  • To sort a million items:

    • Insertion sort O(n^2): about 70 hours.

    • Merge sort O(n \log n): about 40 seconds.

  • On a machine 100 times faster:

    • Insertion sort takes 40 minutes.

    • Merge sort takes half a second.

Approach to Analyzing Loops

  • Does the number of times that the loop executes depend on a constant or N?

for (int i=1; i<=100; i++)
    System.out.println(i);
  • Executes 100 times: O(1), since the bound does not depend on N.

for (int i=1; i<=100; i+=2)
    System.out.println(i);
  • Executes 50 times (prints every other value): O(1).

Analyzing Loops

  • Does the number of times that the loop executes depend on a constant or N?

  • If on N, does the loop step increment/decrement by a constant or is it multiplied/divided?

for (int i=1; i<=N; i++)
    System.out.println(i);
  • Executes N times: O(n).

for (int i=1; i<=N; i+=2)
    System.out.println(i);
  • Executes \frac{1}{2} N times (prints every other value): O(n).

  • It doesn’t matter what the constant is. The loop is still O(n)

Analyzing Loops (Multiplication/Division)

  • Does the loop step increment/decrement by a constant, or is it multiplied/divided?

for (int i=1; i<= N; i *= 2)
    System.out.print(i + " ");
  • Recall: \log_2 n is the power to which the number 2 must be raised to obtain the value n.

  • This loop executes about \log_2 n times: O(\log n).

Analyzing Loops (Different Base Logarithms)

  • Does the loop step increment/decrement by a constant, or is it multiplied/divided?

for (int i=1; i<= N; i *= 10)
    System.out.print(i + " ");
  • Recall: \log_{10} n is the power to which 10 must be raised to obtain the value n.

  • This loop executes \log_{10} n times.

  • Logarithms of different bases differ by a constant, so ignore the base for Big-O.

  • O(\log n)

Analyzing Code Fragments with Big-O Notation

  • Distinguish between statements that count as one step versus N steps.

  • N represents the size of the problem, such as array length or a variable N.

  • Statements that may not count as one step: loops, function call bodies, string operations.

  • Consider single loops: Does the loop depend on a constant or N?

    • Yes: the loop takes constant time.

    • No: determine whether the variable that controls the loop

      • Increases or decreases by a constant (loop is O(N)).

      • Multiplied or divided by a constant (loop is O(log N)).

Analyzing Nested Loops with Big-O Notation

  • Determine how many times the innermost statement is executed.

  • If the loop variables for all of the loops depend on N and increase by a constant, the running time is O(x \cdot n^c), where c is the number of nested loops and x is the execution time of the innermost statement.

    Example:

for (int i=0; i<N; i++)
    for (int j=0; j<N; j++)
        for (int k=0; k<N; k++)
            System.out.println(count++);

for (int i=0; i<N; i++)
    for (int j=i+1; j<N; j++)
        for (int k=j+1; k<N; k++)
            System.out.println(count++);
  • Both fragments are O(n^3).

Analyzing Code Fragments with Big-O Notation (Non-Constant Increment)

  • If one of the loop variables does not increase/decrease by a constant, total up the sum of the number of times the innermost loop is executed:

for (int i=1; i<=N; i*=2) {
    for (int j=1; j<=i; j++)
        System.out.print(j + " ");
    System.out.println();
}
  • The inner loop body executes 1 + 2 + 4 + 8 + … + N times in total (a geometric series).

  • For N a power of two, this sum is 2N - 1.

  • O(N)

Useful Math

Sum of Numbers from 1 to n (Arithmetic Progression)
  • Formula: 1 + 2 + 3 + … + (n-2) + (n-1) + n = \frac{1}{2} n (n+1)

Geometric Series

1 + 2 + 4 + 8 + … + N = 2N - 1 (for N a power of two)
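Both closed forms can be spot-checked by brute-force summation. A small illustrative sketch (names are not from the notes):

```java
// Verify the two closed forms by direct summation.
// Arithmetic: 1 + 2 + ... + n        = n(n+1)/2
// Geometric:  1 + 2 + 4 + ... + N    = 2N - 1  (N a power of two)
public class SumFormulas {
    public static long arithmeticSum(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) sum += i;
        return sum;
    }

    public static long geometricSum(int bigN) {
        long sum = 0;
        for (int i = 1; i <= bigN; i *= 2) sum += i;  // 1, 2, 4, ..., N
        return sum;
    }
}
```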

Doubling Test Calculations

  • To determine how the running time changes when N doubles, calculate \frac{f(2n)}{f(n)}.

  • To determine how the running time changes when N increases by 1, calculate \frac{f(n+1)}{f(n)}.
    If f(n) = n^2, then \frac{f(2n)}{f(n)} = \frac{(2n)^2}{n^2} = \frac{4n^2}{n^2} = 4
    Hence, when n doubles, the running time quadruples.
    If f(n) = n^3, then \frac{f(2n)}{f(n)} = \frac{(2n)^3}{n^3} = \frac{8n^3}{n^3} = 8
    Hence, when n doubles, the running time increases by a factor of 8.
    If f(n) = 2^n, then \frac{f(n+1)}{f(n)} = \frac{2^{n+1}}{2^n} = 2
    Hence, when n increases by 1, the running time doubles.

Code Analysis Questions

Q-1: What is the order of growth for the running time?
for (int i=1; i<= N; i++)
    System.out.println(i);
Answer: C. O(n)

Q-2: What is the order of growth for the running time?
for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        System.out.println(i+10 + j);
Answer: D. O(n^2)

Q-3: What is the order of growth for the running time?
for (int i=1; i<= N; i++)
    System.out.println(i);
for (int j=1; j<= N; j++)
    ++count;
Answer: C. O(n)

Q-4: What is the order of growth for the running time?
for (int i = 1; i < N; i *= 2)
    System.out.println(i);
Answer: B. O(log n)

Q-5: What is the order of growth for the running time?
int count = 0;
for (int i = 0; i < n; i++)
    for (int j = i+1; j < n; j++)
        for (int k = 1; k <= n; k = k*2)
            if (a[i] + a[j] >= a[k])
                count++;
Answer: D. O(n^2 log n)

Questions about the Code
Assume a is an integer array and n > 0.
(i) How many times is the if statement block starting at line 4 executed in terms of n in the worst case?
(ii) How many times is the code inside the if statement block starting at line 4 executed in terms of n in the best case?
(iii) What is the maximum number of array accesses in terms of n in the worst case?
(iv) Use tilde notation to represent the worst-case number of array accesses based on your answer in (iii).
(v) Use Big-O notation to represent the order of growth in terms of n, based on the answer in (iv).

Answers: (i) \frac{1}{2} n(n-1)  (ii) 0  (iii) 6 \cdot \frac{1}{2} n(n-1) = 3n(n-1) = 3n^2 - 3n  (iv) \sim 3n^2  (v) O(n^2)

Here are some functions in Java with their corresponding worst-case complexities:

  Function    Worst-Case Time
  swap        O(1)
  f0          O(n^2)
  f2          O(n)
  f3          O(n^2)
  f4          O(1)
  f5          O(n)
  f7          O(log n)
  f8          O(log n)
  f8b         O(n log n)
  f11         O(log n)