Algorithm Design and Analysis: Graph Theory and Implementation

Graph Basic Concepts and Representation

  • Definition of a Graph: A graph G=(V,E)G = (V, E) consists of a set of vertices VV and a set of edges EE.
  • Edge Constraints: The number of edges is bounded by the square of the vertices: |E| {

Directed and Undirected Graphs

  • Undirected Graphs:     * An edge (u,v)E(u, v) \in E implies that edge (v,u)E(v, u) \in E also exists.     * Example: Road networks between cities.
  • Directed Graphs (Digraphs):     * An edge (u,v)E(u, v) \in E does not imply edge (v,u)E(v, u) \in E.     * Example: Street networks in a downtown area with one-way streets.
  • Self-loops: Self-loop edges (an edge where the source and destination are the same vertex) are possible only in directed graphs according to [CLRS] (Figures 22.1 and 22.2).

Connectivity and Structural Properties

  • Adjacency and Incidence: When an edge connects two vertices, the vertices are said to be adjacent to one another, and the edge is said to be incident on both vertices.
  • Degree of a Vertex:     * Undirected Graphs: The degree is the number of edges adjacent to vertex vv.     * Directed Graphs: Measured by in-degree (edges coming into the vertex) and out-degree (edges leaving the vertex). Total degree is the sum of both.
  • Weighted Graphs: Each edge has an associated numerical value known as a weight.
  • Connectivity:     * Undirected: A graph is connected if a path exists between any two vertices.     * Directed: A graph is strongly connected if there is a directed path between any two vertices in both directions.
  • Density:     * Dense Graphs: The number of edges is close to the maximum possible (V2|V|^2).     * Sparse Graphs: The number of edges is small (there is no clear numerical threshold provided, but it implies EV2|E| \ll |V|^2).

Data Structures for Graph Representation

  • Adjacency Matrix:     * Vertices are numbered V={1,2,,n}V = \{1, 2, \dots, n\}.     * An n×nn \times n matrix AA represents the graph.     * A[i,j]=1A[i, j] = 1 if edge (i,j)E(i, j) \in E, and 00 otherwise.     * Weighted Graphs: A[i,j]=wijA[i, j] = w_{ij} if the edge exists.     * Undirected Graphs: The matrix is symmetric (A[i,j]=A[j,i]A[i, j] = A[j, i]).     * Performance:         * Time to check if an edge exists between uu and vv: Θ(1)\Theta(1).         * Memory usage: Θ(n2)\Theta(n^2) regardless of edge count.         * Suitability: Efficient for small or dense graphs, but usually takes too much storage for large graphs.
  • Adjacency List:     * For each vertex vVv \in V, a list of all vertices adjacent to vv is stored.     * Weighted Graphs: Each vertex item in the list also stores the weight value.     * Memory Usage:         * Directed graphs: Total items out-degree(v)=E\sum \text{out-degree}(v) = |E|.         * Undirected graphs: Total items degree(v)=2E\sum \text{degree}(v) = 2|E|         * Overall space complexity: Θ(V+E)\Theta(V + E).     * Performance: Time to test if an edge (u,v)E(u, v) \in E exists is O(E)O(E).

Simple Undirected Unweighted Graphs

  • Path Concepts:     * Path: A sequence of vertices connected by edges with no repeated edges.     * Simple Path: A path with no repeated vertices.     * Cycle: A path where the first and last vertices are the same (must have at least one edge).     * Simple Cycle: A cycle with no repeated vertices except the starting/ending vertex.
  • Subgraph Concepts:     * Connected Components: Maximal connected subgraphs.     * Acyclic Graph: A graph with no cycles.     * Tree: A connected acyclic graph.     * Forest: A disjoint set of trees.     * Spanning Tree: A subgraph that contains all vertices of a connected graph and forms a single tree.     * Spanning Forest: The union of spanning trees of the graph's connected components.     * Bipartite Graph: A graph where vertices can be divided into two sets such that all edges connect a vertex in one set to a vertex in the other.

Abstract Data Type (ADT) Interfaces

  • ISimpleGraph Interface:     * int getNrVertices()     * int getNrEdges()     * void addUndirectedEdge(int either, int other)     * Iterable<Integer> nodesAdjacentTo(int node)     * Iterable<SimpleEdge> edgesAdjacentTo(int node)     * boolean hasVertex(int node)     * boolean hasEdge(int either, int other)     * void initFromFile(String file)     * void printGraph()
  • Adjacency List Implementation (SimpleGraphAdjList):     * Attributes: int V, int E, and a Set<Integer>[] g where each index i contains a set of adjacent nodes j.     * In an undirected graph, adding an edge requires adding two entries: g[either].add(other) and g[other].add(either).

Graph Searching (Graph Traversal)

  • Breadth-First Search (BFS):     * Method: Expand the frontier of explored vertices across the breadth of the graph. It resembles level-by-level tree printing and utilizes a Queue.     * Vertex attributes:         * Color Code: White (not discovered), Grey (discovered but not fully explored), Black (discovered and fully explored).         * parent: Stores the node that discovered vertex vv.         * distance: Number of edges from the source root to vv.     * Complexity: Θ(V+E)\Theta(V + E). The adjacency list of each vertex is scanned only once when the vertex is dequeued.     * Properties: Can be used to test connectivity (if all nodes become Black) and find the shortest path in an unweighted graph.
  • Depth-First Search (DFS):     * Method: Explore "deeper" into the graph. Backtrack only when all edges of the current vertex have been explored.     * State: Also uses White, Grey, and Black coloring and parent tracking.     * Applications:         * Cycle detection.         * Two-colorability (Bipartite testing).         * Identifying connected components.

Minimum Spanning Trees (MST)

  • The Starting Problem: Given a map of nn cities connected by roads requiring modernization at specific costs, find a subset of roads that connects all cities with minimal total cost.
  • Formal Problem: Given an undirected graph G=(V,E)G = (V, E) with weight function w(u,v)w(u, v), find a subset AEA \subseteq E that connects all vertices and minimizes (u,v)Aw(u,v)\sum_{(u, v) \in A} w(u, v).
  • MST Properties:     * Property 1: The graph GA=(V,A)G_A = (V, A) is acyclic (if a cycle existed, an edge could be removed to reduce weight while maintaining connectivity).     * Property 2: A tree has exactly V1|V| - 1 edges.     * Property 3: There is exactly one path between any two nodes in a tree.
  • Generic Algorithm: Start with an empty set AA, and while AA does not form a spanning tree, find a safe edge (u,v)(u, v) such that A \cup \{(u, v)\}\n remains a subset of some MST.

Prim's Algorithm

  • Approach: The MST grows as a single tree from a random start node.
  • Safe Edge Selection: Select the minimum weight edge connecting a vertex in the currently built tree (AVA_V) to a vertex in the remaining set (VAVV - A_V).
  • Implementation Options:     1. Brute Force: Test all edges at each step. Complexity: Θ(V×E)\Theta(V \times E), approximately O(V3)O(V^3).     2. Distance Array: Use an array dist[v] to track minimum distance to the tree. Complexity: Θ(V2)\Theta(V^2). This is optimal for dense graphs.     3. Priority Queue (Binary Heap): Use a Min-Heap to extract the minimum weight vertex not yet in MST. Complexity: O(ElogV)O(E \log V). This is optimal for sparse graphs.
  • Greedy nature: It is a greedy algorithm that is proven to always find the global optimum for MST.

Kruskal's Algorithm

  • Approach: The MST starts as a forest where each vertex is a single tree. It adds the global minimum weight edge that does not create a cycle.
  • Algorithm Steps:     1. Initialize a forest AA as an empty set.     2. For each vertex, create a single-element tree.     3. Sort all edges of GG into increasing order by weight.     4. Iterate through sorted edges (u,v)(u, v): If uu and vv are in different trees, add (u,v)(u, v) to AA and union the trees.
  • Complexity: O(ElogE)O(E \log E), which is equivalent to O(ElogV)O(E \log V). The performance is primarily determined by the sorting of edges and the implementation of the Union-Find data structure.

Union-Find (Disjoint Set) Data Structure

  • Operations:     * MAKE-SET(x): Create a new set containing only xx.     * FIND-SET(x): Return the representative element of the set containing xx.     * UNION(x, y): Merge the sets containing xx and yy.
  • Implementations:     * Linked Lists: Each set is a linked list.         * Weighted-Union Heuristic: Always append the shorter list to the longer. A sequence of mm operations takes O(m+nlogn)O(m + n \log n).     * Up-tree Forest: Nodes point to their parents. Representative is the root.         * Union by Rank: Make the root of the smaller tree a child of the root of the larger tree.         * Path Compression: During FIND-SET, make all visited nodes direct children of the root.         * Complexity: Worst-case time for a sequence of operations is O(mα(n))O(m \cdot \alpha(n)), where α\alpha is the inverse Ackermann function (extremely slow growth).

Shortest Paths in Directed Graphs

  • Weight of Path: For path p=v1v2vkp = v_1 \rightarrow v_2 \rightarrow \dots \rightarrow v_k, weight is w(p)=i=1k1w(vi,vi+1)w(p) = \sum_{i=1}^{k-1} w(v_i, v_{i+1}).
  • Shortest Path Weight (δ(u,v)\delta(u, v)): The minimum weight among all paths from uu to vv; if no path exists, δ(u,v)=\delta(u, v) = \infty.
  • Properties:     1. Optimal Substructure: Any subpath of a shortest path is also a shortest path.     2. No Cycles: Shortest paths cannot contain positive or zero-weight cycles (they can be omitted) and are undefined if a negative-weight cycle is reachable.

Shortest Path Algorithms

  • Dijkstra's Algorithm:     * Constraint: All edge weights must be positive.     * Method: Uses a priority queue and a "Relaxation" operation: if (distTo[v] > distTo[u] + w(u, v)) then update distTo[v].     * Complexity: O((V+E)logV)O((V+E) \log V) using a binary heap, or O(V2)O(V^2) using an array.
  • Bellman-Ford Algorithm:     * Capabilities: Allows negative-weight edges; detects negative-weight cycles.     * Method: Performs V1|V| - 1 relaxation passes over all edges.     * Negative Cycle Detection: Performs one additional pass; if any edge can still be relaxed, a negative cycle exists.     * Complexity: O(V×E)O(V \times E).
  • Floyd-Warshall Algorithm:     * Goal: All-pairs shortest paths.     * Dynamic Programming Approach: Defines dij(k)d_{ij}^{(k)} as the shortest path from ii to jj with intermediate vertices in set {1,2,,k}\{1, 2, \dots, k\}.     * Recursive step: dij(k)=min(dij(k1),dik(k1)+dkj(k1))d_{ij}^{(k)} = \min(d_{ij}^{(k-1)}, d_{ik}^{(k-1)} + d_{kj}^{(k-1)}).     * Complexity: Θ(V3)\Theta(V^3).
  • Transitive Closure: Determines if a path exists from ii to jj for all pairs. Can be computed via a modified Floyd-Warshall using logical OR instead of min and logical AND instead of addition.

Questions and Discussion

  • BFS Color Requirement: Is it necessary to color nodes in BFS using 3 colors?     * Response: Code examples show that a simplified boolean marked[v] array can often replace the three-color system for basic traversal logic.
  • Comparison of Dijkstra and Prim: What is the difference?     * Response: Dijkstra minimizes the total distance from a single source to each node (δ(s,v)\delta(s, v)). Prim minimizes the total weight of the entire connecting network (the Spanning Tree).