Design and Analysis of Algorithms: Randomized Problems, Complexity Classes, and Approximation
The Hiring Problem: Definition and Procedure
Scenario Overview: - You need to hire a new office assistant through an employment agency. - The agency sends one candidate per day. - After interviewing a candidate, you must decide immediately whether to hire them or not. - The Requirement: You are committed to always having the best possible person for the job. Therefore, if a new applicant is better qualified than the current assistant, you must fire the current one and hire the new one immediately.
Cost Structure: - Interview Cost (): A small fee paid to the agency to interview an applicant. - Hiring Cost (): A large fee paid to the agency, which includes the cost of firing the current assistant. - Total cost is dominated by the number of times you hire, as hiring is significantly more expensive than interviewing.
HIRE-ASSISTANT(n) Pseudocode: 1.
best = 0(Candidate 0 is a least-qualified dummy candidate used for initialization) 2.for i = 1 to n3.do interview candidate i4.if candidate i is better than candidate best5.then best = i6.hire candidate i
Cost Analysis of the Hiring Problem
Total Cost Equation: - Let be the number of candidates interviewed. - Let be the number of people actually hired. - The total cost associated with the algorithm is . - Because candidates are always interviewed, the term is constant across runs. Analysis focuses on the variable quantity .
Worst-Case Analysis: - The worst case occurs when candidates arrive in strictly increasing order of quality. - In this scenario, every candidate interviewed is better than the previous, leading to hires. - Total hiring cost: .
Average-Case / Probabilistic Analysis: - Assumption: Candidates arrive in a random order. Each of the permutations of candidates is equally likely. - Indicator Random Variables: To simplify the calculation of the expected number of hires (), we define indicator random variables . - - - Candidate is hired if and only if they are better than all candidates from to . - Given a random order, any of the first candidates is equally likely to be the best so far. Thus, the probability of hiring candidate is . - By Lemma 5.1, .
Expected Number of Hires Calculation: - - By linearity of expectation: - This sum is the harmonic number, denoted , where . - Conclusion (Lemma 5.2): Assuming random order, the expected total hiring cost is .
Randomized Quicksort
Standard Quicksort Algorithm: 1. Pick a pivot from the array (if size > 1). 2. Partition the array into three sub-arrays:
LESS(elements < ),EQUAL(elements = ), andGREATER(elements > ). 3. Recursively apply the algorithm toLESSandGREATER.Comparison of Variants: - Basic-Quicksort: Always picks the leftmost element as the pivot. - Best Case: - Worst Case: - Average Case: - Randomized-Quicksort: Picks a random element as the pivot for each partition step. - Goal: Achieve a worst-case expected-time bound of .
Expected Running Time Analysis: - Let be the smallest element and be the smallest (e_i < e_j). - Case 1: If a pivot is chosen such that e_i < p < e_j, and are separated into different sub-arrays and will never be compared. - Case 2: If or , they are compared exactly once. - Case 3: If p < e_i or p > e_j, both stay in the same sub-array for the next round. - Probability that and are compared: . - Total expected comparisons ( being the indicator that and are compared): - - This simplifies to . - Since the harmonic number , the total expectation is E[X] < 2n(H_n - 1) < 2n \ln(n).
Complexity Classes: P, NP, NP-Hard, and NP-Complete
Class P (Polynomial Time): - Problems solvable in on a deterministic machine, where is a polynomial. - Examples: Fractional Knapsack, Minimum Spanning Tree (MST), Sorting.
Decision vs. Optimization Problems: - Decision Problem: Requires a "Yes" or "No" answer. - Optimization Problem: Requires finding the best possible solution (e.g., the shortest path). - Example (TSP): - Optimization: Find the shortest tour length. - Decision: Is there a tour with length ?
Class NP (Nondeterministic Polynomial-time): - Problems solvable in polynomial time on a nondeterministic machine (capable of "guessing" the right answer). - Equivalently: Problems whose solutions can be verified in polynomial time on a deterministic machine. - Examples: Graph Coloring, Traveling Salesman Problem (TSP), Satisfiability (SAT).
Subset Sum Problem: - Given integers and target , decide if a subset sums to . - Example: Set , Target . Answer: YES (via ).
NP-Hard and NP-Complete: - NP-Hard: A problem is NP-Hard if every problem in NP is polynomial-time reducible to it (). - NP-Complete (NPC): A problem is NPC if it is both NP-Hard and in NP. - If an NP-Hard problem is reducible to another problem, that second problem is also NP-Hard.
Relationship Between Complexity Classes
Known Truths: - - -
Open Questions / Not Known: - Does ? (If so, ). - If , then the intersection of and is empty, and the intersection of and is empty.
Proving NP-Completeness
Standard Procedure for Problem L: 1. Prove (the verification step). 2. Prove is NP-Hard by reducing a known NPC problem to in polynomial time.
The Cook-Levin Theorem: - Proved that Satisfiability (SAT) is NP-Complete. This serves as the foundation for the complexity hierarchy.
Reduction Transitivity Chain: - Every NP problem SAT 3-SAT CLIQUE Independent Set (INDSET) Vertex Cover (VERTEX-COVER).
Definitions of Specific NPC Problems
Satisfiability (SAT): - Given a Boolean formula in Conjunctive Normal Form (CNF) (conjunction of clauses, where clauses are disjunctions of literals), is there a truth assignment that makes the formula TRUE? - 3-SAT: Each clause must contain exactly 3 literals.
CLIQUE: - A clique is a subset of vertices in an undirected graph where every vertex is connected to every other vertex. - Decision: Does contain a clique of size ? - Verification: Provide the nodes as a certificate; check nodes connect to all others in the subset.
Independent Set (INDSET): - A subset of vertices where no two vertices share an edge. - Decision: Does have an independent set of size ?
Vertex Cover (VERTCOV): - A subset of vertices such that every edge in the graph is incident to at least one vertex in the set. - Decision: Does have a vertex cover of size ?
Reduction Logic: - CLIQUE to INDSET: A clique in graph is an independent set in the complement graph . - INDSET to VERTCOV: A set is an independent set of size iff is a vertex cover of size .
Approximation Algorithms
Motivation: When facing an NP-Hard problem where an optimal polynomial-time solution is impossible (unless ), one must sacrifice one of three goals: 1. Solve to optimality. 2. Solve in polynomial time. 3. Solve arbitrary instances.
Definition of -approximation: An algorithm that is guaranteed to run in polynomial time for arbitrary instances and returns a solution within a ratio of the true optimum ().
Vertex Cover 2-approximation Algorithm: 1.
C = empty set2.E' = G.E(copy of edges) 3.while E' is not empty:4.pick an arbitrary edge {u, v} from E'5.add u and v to C6.remove every edge from E' that is incident to either u or v7.return CProof of Approximation Ratio: - Let be the set of edges chosen in line 4. - No two edges in share a vertex (due to line 6). - To cover the edges in , any optimal solution must contain at least one unique vertex for each edge in . Thus, . - The algorithm adds two vertices for each edge in , so . - Therefore, , making it a 2-approximation algorithm.