Algorithm Design and Analysis: Graph Theory and Implementation
Graph Basic Concepts and Representation
- Definition of a Graph: A graph G=(V,E) consists of a set of vertices V and a set of edges E.
- Edge Constraints: The number of edges is bounded by the square of the vertices: |E| {
Directed and Undirected Graphs
- Undirected Graphs:
* An edge (u,v)∈E implies that edge (v,u)∈E also exists.
* Example: Road networks between cities.
- Directed Graphs (Digraphs):
* An edge (u,v)∈E does not imply edge (v,u)∈E.
* Example: Street networks in a downtown area with one-way streets.
- Self-loops: Self-loop edges (an edge where the source and destination are the same vertex) are possible only in directed graphs according to [CLRS] (Figures 22.1 and 22.2).
Connectivity and Structural Properties
- Adjacency and Incidence: When an edge connects two vertices, the vertices are said to be adjacent to one another, and the edge is said to be incident on both vertices.
- Degree of a Vertex:
* Undirected Graphs: The degree is the number of edges adjacent to vertex v.
* Directed Graphs: Measured by in-degree (edges coming into the vertex) and out-degree (edges leaving the vertex). Total degree is the sum of both.
- Weighted Graphs: Each edge has an associated numerical value known as a weight.
- Connectivity:
* Undirected: A graph is connected if a path exists between any two vertices.
* Directed: A graph is strongly connected if there is a directed path between any two vertices in both directions.
- Density:
* Dense Graphs: The number of edges is close to the maximum possible (∣V∣2).
* Sparse Graphs: The number of edges is small (there is no clear numerical threshold provided, but it implies ∣E∣≪∣V∣2).
Data Structures for Graph Representation
- Adjacency Matrix:
* Vertices are numbered V={1,2,…,n}.
* An n×n matrix A represents the graph.
* A[i,j]=1 if edge (i,j)∈E, and 0 otherwise.
* Weighted Graphs: A[i,j]=wij if the edge exists.
* Undirected Graphs: The matrix is symmetric (A[i,j]=A[j,i]).
* Performance:
* Time to check if an edge exists between u and v: Θ(1).
* Memory usage: Θ(n2) regardless of edge count.
* Suitability: Efficient for small or dense graphs, but usually takes too much storage for large graphs.
- Adjacency List:
* For each vertex v∈V, a list of all vertices adjacent to v is stored.
* Weighted Graphs: Each vertex item in the list also stores the weight value.
* Memory Usage:
* Directed graphs: Total items ∑out-degree(v)=∣E∣.
* Undirected graphs: Total items ∑degree(v)=2∣E∣
* Overall space complexity: Θ(V+E).
* Performance: Time to test if an edge (u,v)∈E exists is O(E).
Simple Undirected Unweighted Graphs
- Path Concepts:
* Path: A sequence of vertices connected by edges with no repeated edges.
* Simple Path: A path with no repeated vertices.
* Cycle: A path where the first and last vertices are the same (must have at least one edge).
* Simple Cycle: A cycle with no repeated vertices except the starting/ending vertex.
- Subgraph Concepts:
* Connected Components: Maximal connected subgraphs.
* Acyclic Graph: A graph with no cycles.
* Tree: A connected acyclic graph.
* Forest: A disjoint set of trees.
* Spanning Tree: A subgraph that contains all vertices of a connected graph and forms a single tree.
* Spanning Forest: The union of spanning trees of the graph's connected components.
* Bipartite Graph: A graph where vertices can be divided into two sets such that all edges connect a vertex in one set to a vertex in the other.
Abstract Data Type (ADT) Interfaces
- ISimpleGraph Interface:
*
int getNrVertices()
* int getNrEdges()
* void addUndirectedEdge(int either, int other)
* Iterable<Integer> nodesAdjacentTo(int node)
* Iterable<SimpleEdge> edgesAdjacentTo(int node)
* boolean hasVertex(int node)
* boolean hasEdge(int either, int other)
* void initFromFile(String file)
* void printGraph() - Adjacency List Implementation (
SimpleGraphAdjList):
* Attributes: int V, int E, and a Set<Integer>[] g where each index i contains a set of adjacent nodes j.
* In an undirected graph, adding an edge requires adding two entries: g[either].add(other) and g[other].add(either).
Graph Searching (Graph Traversal)
- Breadth-First Search (BFS):
* Method: Expand the frontier of explored vertices across the breadth of the graph. It resembles level-by-level tree printing and utilizes a Queue.
* Vertex attributes:
* Color Code: White (not discovered), Grey (discovered but not fully explored), Black (discovered and fully explored).
*
parent: Stores the node that discovered vertex v.
* distance: Number of edges from the source root to v.
* Complexity: Θ(V+E). The adjacency list of each vertex is scanned only once when the vertex is dequeued.
* Properties: Can be used to test connectivity (if all nodes become Black) and find the shortest path in an unweighted graph. - Depth-First Search (DFS):
* Method: Explore "deeper" into the graph. Backtrack only when all edges of the current vertex have been explored.
* State: Also uses White, Grey, and Black coloring and parent tracking.
* Applications:
* Cycle detection.
* Two-colorability (Bipartite testing).
* Identifying connected components.
Minimum Spanning Trees (MST)
- The Starting Problem: Given a map of n cities connected by roads requiring modernization at specific costs, find a subset of roads that connects all cities with minimal total cost.
- Formal Problem: Given an undirected graph G=(V,E) with weight function w(u,v), find a subset A⊆E that connects all vertices and minimizes ∑(u,v)∈Aw(u,v).
- MST Properties:
* Property 1: The graph GA=(V,A) is acyclic (if a cycle existed, an edge could be removed to reduce weight while maintaining connectivity).
* Property 2: A tree has exactly ∣V∣−1 edges.
* Property 3: There is exactly one path between any two nodes in a tree.
- Generic Algorithm: Start with an empty set A, and while A does not form a spanning tree, find a safe edge (u,v) such that A \cup \{(u, v)\}\n remains a subset of some MST.
Prim's Algorithm
- Approach: The MST grows as a single tree from a random start node.
- Safe Edge Selection: Select the minimum weight edge connecting a vertex in the currently built tree (AV) to a vertex in the remaining set (V−AV).
- Implementation Options:
1. Brute Force: Test all edges at each step. Complexity: Θ(V×E), approximately O(V3).
2. Distance Array: Use an array
dist[v] to track minimum distance to the tree. Complexity: Θ(V2). This is optimal for dense graphs.
3. Priority Queue (Binary Heap): Use a Min-Heap to extract the minimum weight vertex not yet in MST. Complexity: O(ElogV). This is optimal for sparse graphs. - Greedy nature: It is a greedy algorithm that is proven to always find the global optimum for MST.
Kruskal's Algorithm
- Approach: The MST starts as a forest where each vertex is a single tree. It adds the global minimum weight edge that does not create a cycle.
- Algorithm Steps:
1. Initialize a forest A as an empty set.
2. For each vertex, create a single-element tree.
3. Sort all edges of G into increasing order by weight.
4. Iterate through sorted edges (u,v): If u and v are in different trees, add (u,v) to A and union the trees.
- Complexity: O(ElogE), which is equivalent to O(ElogV). The performance is primarily determined by the sorting of edges and the implementation of the Union-Find data structure.
Union-Find (Disjoint Set) Data Structure
- Operations:
*
MAKE-SET(x): Create a new set containing only x.
* FIND-SET(x): Return the representative element of the set containing x.
* UNION(x, y): Merge the sets containing x and y. - Implementations:
* Linked Lists: Each set is a linked list.
* Weighted-Union Heuristic: Always append the shorter list to the longer. A sequence of m operations takes O(m+nlogn).
* Up-tree Forest: Nodes point to their parents. Representative is the root.
* Union by Rank: Make the root of the smaller tree a child of the root of the larger tree.
* Path Compression: During
FIND-SET, make all visited nodes direct children of the root.
* Complexity: Worst-case time for a sequence of operations is O(m⋅α(n)), where α is the inverse Ackermann function (extremely slow growth).
Shortest Paths in Directed Graphs
- Weight of Path: For path p=v1→v2→⋯→vk, weight is w(p)=∑i=1k−1w(vi,vi+1).
- Shortest Path Weight (δ(u,v)): The minimum weight among all paths from u to v; if no path exists, δ(u,v)=∞.
- Properties:
1. Optimal Substructure: Any subpath of a shortest path is also a shortest path.
2. No Cycles: Shortest paths cannot contain positive or zero-weight cycles (they can be omitted) and are undefined if a negative-weight cycle is reachable.
Shortest Path Algorithms
- Dijkstra's Algorithm:
* Constraint: All edge weights must be positive.
* Method: Uses a priority queue and a "Relaxation" operation:
if (distTo[v] > distTo[u] + w(u, v)) then update distTo[v].
* Complexity: O((V+E)logV) using a binary heap, or O(V2) using an array. - Bellman-Ford Algorithm:
* Capabilities: Allows negative-weight edges; detects negative-weight cycles.
* Method: Performs ∣V∣−1 relaxation passes over all edges.
* Negative Cycle Detection: Performs one additional pass; if any edge can still be relaxed, a negative cycle exists.
* Complexity: O(V×E).
- Floyd-Warshall Algorithm:
* Goal: All-pairs shortest paths.
* Dynamic Programming Approach: Defines dij(k) as the shortest path from i to j with intermediate vertices in set {1,2,…,k}.
* Recursive step: dij(k)=min(dij(k−1),dik(k−1)+dkj(k−1)).
* Complexity: Θ(V3).
- Transitive Closure: Determines if a path exists from i to j for all pairs. Can be computed via a modified Floyd-Warshall using logical
OR instead of min and logical AND instead of addition.
Questions and Discussion
- BFS Color Requirement: Is it necessary to color nodes in BFS using 3 colors?
* Response: Code examples show that a simplified boolean
marked[v] array can often replace the three-color system for basic traversal logic. - Comparison of Dijkstra and Prim: What is the difference?
* Response: Dijkstra minimizes the total distance from a single source to each node (δ(s,v)). Prim minimizes the total weight of the entire connecting network (the Spanning Tree).