Design and Analysis of Algorithms: Randomized Problems, Complexity Classes, and Approximation

The Hiring Problem: Definition and Procedure

  • Scenario Overview:   - You need to hire a new office assistant through an employment agency.   - The agency sends one candidate per day.   - After interviewing a candidate, you must decide immediately whether to hire them or not.   - The Requirement: You are committed to always having the best possible person for the job. Therefore, if a new applicant is better qualified than the current assistant, you must fire the current one and hire the new one immediately.   

  • Cost Structure:   - Interview Cost (cic_i): A small fee paid to the agency to interview an applicant.   - Hiring Cost (chc_h): A large fee paid to the agency, which includes the cost of firing the current assistant.   - Total cost is dominated by the number of times you hire, as hiring is significantly more expensive than interviewing.

  • HIRE-ASSISTANT(n) Pseudocode:   1. best = 0 (Candidate 0 is a least-qualified dummy candidate used for initialization)   2. for i = 1 to n   3. do interview candidate i   4. if candidate i is better than candidate best   5. then best = i   6. hire candidate i

Cost Analysis of the Hiring Problem

  • Total Cost Equation:   - Let nn be the number of candidates interviewed.   - Let mm be the number of people actually hired.   - The total cost associated with the algorithm is O(nci+mch)O(nc_i + mc_h).   - Because nn candidates are always interviewed, the term ncinc_i is constant across runs. Analysis focuses on the variable quantity mchm c_h.

  • Worst-Case Analysis:   - The worst case occurs when candidates arrive in strictly increasing order of quality.   - In this scenario, every candidate interviewed is better than the previous, leading to nn hires.   - Total hiring cost: O(nch)O(n c_h).

  • Average-Case / Probabilistic Analysis:   - Assumption: Candidates arrive in a random order. Each of the n!n! permutations of candidates is equally likely.   - Indicator Random Variables: To simplify the calculation of the expected number of hires (E[X]E[X]), we define nn indicator random variables XiX_i.   - Xi=Icandidate iextishiredX_i = I{\text{candidate } i ext{ is hired}}   - Xi={1amp;if candidate i is hired 0amp;if candidate i is not hiredX_i = \begin{cases} 1 & \text{if candidate } i \text{ is hired} \ 0 & \text{if candidate } i \text{ is not hired} \end{cases}   - Candidate ii is hired if and only if they are better than all candidates from 11 to i1i - 1.   - Given a random order, any of the first ii candidates is equally likely to be the best so far. Thus, the probability of hiring candidate ii is P(Xi=1)=1iP(X_i = 1) = \frac{1}{i}.   - By Lemma 5.1, E[Xi]=1iE[X_i] = \frac{1}{i}.

  • Expected Number of Hires Calculation:   - E[X]=E[i=1nXi]E[X] = E[\sum_{i=1}^{n} X_i]   - By linearity of expectation: E[X]=i=1nE[Xi]=i=1n1iE[X] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} \frac{1}{i}   - This sum is the nthn\text{th} harmonic number, denoted HnH_n, where Hn=ln(n)+O(1)H_n = \ln(n) + O(1).   - Conclusion (Lemma 5.2): Assuming random order, the expected total hiring cost is O(chln(n))O(c_h \ln(n)).

Randomized Quicksort

  • Standard Quicksort Algorithm:   1. Pick a pivot pp from the array (if size > 1).   2. Partition the array into three sub-arrays: LESS (elements < pp), EQUAL (elements = pp), and GREATER (elements > pp).   3. Recursively apply the algorithm to LESS and GREATER.

  • Comparison of Variants:   - Basic-Quicksort: Always picks the leftmost element as the pivot.     - Best Case: O(nlog(n))O(n \log(n))     - Worst Case: O(n2)O(n^2)     - Average Case: O(nlog(n))O(n \log(n))   - Randomized-Quicksort: Picks a random element as the pivot for each partition step.     - Goal: Achieve a worst-case expected-time bound of O(nlog(n))O(n \log(n)).

  • Expected Running Time Analysis:   - Let eie_i be the ithi\text{th} smallest element and eje_j be the jthj\text{th} smallest (e_i < e_j).   - Case 1: If a pivot pp is chosen such that e_i < p < e_j, eie_i and eje_j are separated into different sub-arrays and will never be compared.   - Case 2: If p=eip = e_i or p=ejp = e_j, they are compared exactly once.   - Case 3: If p < e_i or p > e_j, both stay in the same sub-array for the next round.   - Probability that eie_i and eje_j are compared: 2ji+1\frac{2}{j - i + 1}.   - Total expected comparisons (XijX_{ij} being the indicator that eie_i and eje_j are compared):     - E[X]=i=1nj=i+1nE[Xij]=i=1nj=i+1n2ji+1E[X] = \sum_{i=1}^{n} \sum_{j=i+1}^{n} E[X_{ij}] = \sum_{i=1}^{n} \sum_{j=i+1}^{n} \frac{2}{j - i + 1}     - This simplifies to E[X]=2i=1nk=2ni+11kE[X] = 2 \sum_{i=1}^{n} \sum_{k=2}^{n-i+1} \frac{1}{k}.     - Since the harmonic number Hn[ln(n),1+ln(n)]H_n \in [\ln(n), 1 + \ln(n)], the total expectation is E[X] < 2n(H_n - 1) < 2n \ln(n).

Complexity Classes: P, NP, NP-Hard, and NP-Complete

  • Class P (Polynomial Time):   - Problems solvable in O(p(n))O(p(n)) on a deterministic machine, where p(n)p(n) is a polynomial.   - Examples: Fractional Knapsack, Minimum Spanning Tree (MST), Sorting.

  • Decision vs. Optimization Problems:   - Decision Problem: Requires a "Yes" or "No" answer.   - Optimization Problem: Requires finding the best possible solution (e.g., the shortest path).   - Example (TSP):     - Optimization: Find the shortest tour length.     - Decision: Is there a tour with length C\le C?

  • Class NP (Nondeterministic Polynomial-time):   - Problems solvable in polynomial time on a nondeterministic machine (capable of "guessing" the right answer).   - Equivalently: Problems whose solutions can be verified in polynomial time on a deterministic machine.   - Examples: Graph Coloring, Traveling Salesman Problem (TSP), Satisfiability (SAT).

  • Subset Sum Problem:   - Given integers a1aNa_1 \dots a_N and target KK, decide if a subset sums to KK.   - Example: Set 5,7,3,9,1{5, 7, 3, 9, 1}, Target 1212. Answer: YES (via 5,7{5, 7}).

  • NP-Hard and NP-Complete:   - NP-Hard: A problem is NP-Hard if every problem in NP is polynomial-time reducible to it (RpQR \le_p Q).   - NP-Complete (NPC): A problem is NPC if it is both NP-Hard and in NP.   - If an NP-Hard problem is reducible to another problem, that second problem is also NP-Hard.

Relationship Between Complexity Classes

  • Known Truths:   - PNPP \subseteq NP   - NPCNPNPC \subseteq NP   - NPCNPHardNPC \subseteq NP-Hard

  • Open Questions / Not Known:   - Does P=NPP = NP? (If so, P=NP=NPCP = NP = NPC).   - If PNPP \neq NP, then the intersection of PP and NPCNPC is empty, and the intersection of PP and NPHardNP-Hard is empty.

Proving NP-Completeness

  • Standard Procedure for Problem L:   1. Prove LNPL \in NP (the verification step).   2. Prove LL is NP-Hard by reducing a known NPC problem to LL in polynomial time.

  • The Cook-Levin Theorem:   - Proved that Satisfiability (SAT) is NP-Complete. This serves as the foundation for the complexity hierarchy.

  • Reduction Transitivity Chain:   - Every NP problem \rightarrow SAT \rightarrow 3-SAT \rightarrow CLIQUE \rightarrow Independent Set (INDSET) \rightarrow Vertex Cover (VERTEX-COVER).

Definitions of Specific NPC Problems

  • Satisfiability (SAT):   - Given a Boolean formula in Conjunctive Normal Form (CNF) (conjunction of clauses, where clauses are disjunctions of literals), is there a truth assignment that makes the formula TRUE?   - 3-SAT: Each clause must contain exactly 3 literals.

  • CLIQUE:   - A clique is a subset of vertices in an undirected graph G=(V,E)G = (V, E) where every vertex is connected to every other vertex.   - Decision: Does GG contain a clique of size kk?   - Verification: Provide the nodes as a certificate; check nodes connect to all others in the subset.

  • Independent Set (INDSET):   - A subset of vertices where no two vertices share an edge.   - Decision: Does GG have an independent set of size kk?

  • Vertex Cover (VERTCOV):   - A subset of vertices such that every edge in the graph is incident to at least one vertex in the set.   - Decision: Does GG have a vertex cover of size kk?

  • Reduction Logic:   - CLIQUE to INDSET: A clique in graph GG is an independent set in the complement graph Gˉ\bar{G}.   - INDSET to VERTCOV: A set VV' is an independent set of size kk iff VVV \setminus V' is a vertex cover of size nkn - k.

Approximation Algorithms

  • Motivation: When facing an NP-Hard problem where an optimal polynomial-time solution is impossible (unless P=NPP = NP), one must sacrifice one of three goals:   1. Solve to optimality.   2. Solve in polynomial time.   3. Solve arbitrary instances.

  • Definition of ρ\rho-approximation: An algorithm that is guaranteed to run in polynomial time for arbitrary instances and returns a solution within a ratio ρ\rho of the true optimum (OPTOPT).

  • Vertex Cover 2-approximation Algorithm:   1. C = empty set   2. E' = G.E (copy of edges)   3. while E' is not empty:   4. pick an arbitrary edge {u, v} from E'   5. add u and v to C   6. remove every edge from E' that is incident to either u or v   7. return C

  • Proof of Approximation Ratio:   - Let AA be the set of edges chosen in line 4.   - No two edges in AA share a vertex (due to line 6).   - To cover the edges in AA, any optimal solution C<em>C^<em> must contain at least one unique vertex for each edge in AA. Thus, AC</em>|A| \le |C^</em>|.   - The algorithm adds two vertices for each edge in AA, so C=2A|C| = 2|A|.   - Therefore, C2C|C| \le 2|C^*|, making it a 2-approximation algorithm.