CoEng3122 - Algorithm Analysis and Design - Algorithm for Graph Problems
Algorithm Analysis and Design - Graph Problems
Chapter 5: Algorithms for Graph Problems
- Depth-first search
- Connected Components
- Topological sort
- Shortest path
- Sets of strings
- Search trees, string sorting, binary search
- Exact string matching - Finding a pattern (string) in a text (string)
- Approximate string matching - Finding in the text something that is similar to the pattern
Problem-Solving Agent
- Goal formulation: The agent adopts the goal of reaching the destination, organizing behavior by limiting objectives and actions.
- Problem formulation: The agent creates a description of the states and actions needed to reach the goal, forming an abstract model of the relevant world.
- Search: The agent simulates sequences of actions in its model to find a sequence that reaches the goal.
- Solution: A sequence of actions that reaches the goal.
- Execution: The agent executes the actions in the solution one at a time.
Requirements of Search Algorithms/Techniques
- It should cause motion.
- It should be systematic.
Search Types
1. Uninformed Search (Blind Search)
- Also called blind, exhaustive, or brute-force search.
- Uses no information about the problem to guide the search.
- May not be very efficient.
2. Informed Search (Heuristic Search)
- Also called heuristic or intelligent search.
- Uses information about the problem to guide the search.
- Guesses the distance to a goal state.
- More efficient but may not always be possible.
Search Techniques / Strategies
Uninformed Search:
- Depth-First Search (DFS)
- Breadth-First Search (BFS)
- Cost-First Search
Informed Search:
- Generate and Test
- Hill Climbing
- Best First Search
- Problem Reduction
- Constraint Satisfaction
- Means-end Analysis
- A* Search
- A0* Search
Types of Graphs
- Directed graph
- Undirected graph
- Weighted graph
- Cyclic graph
- Acyclic graph
- Bipartite graph
Uninformed/Blind Search
- Does not contain any domain knowledge (e.g., closeness, location of the goal).
- Operates in a brute-force way.
- Includes information about how to traverse the tree and identify leaf and goal nodes.
- Searches without information about the search space (initial state operators and test for the goal).
- Examines each node of the tree until the goal node is reached.
Informed Search
- Uses domain knowledge to guide the search.
- Can find a solution more efficiently than uninformed search.
- Also called a Heuristic search.
- A heuristic might not always guarantee the best solution but finds a good solution in a reasonable time.
- Can solve complex problems that might not be solvable otherwise.
Graph Definitions
- Two vertices are adjacent if they are joined by an edge.
- Vertex w is adjacent to v iff (v,w) \in E.
- In an undirected graph with edge (v, w), w is adjacent to v, and v is adjacent to w.
- A path between two vertices is a sequence of edges that begins at one vertex and ends at another vertex.
- w1, w2, …, wN is a path if (wi, w_{i+1}) \in E for 1 \leq i \leq N-1.
- A simple path passes through a vertex only once.
- A cycle is a path that begins and ends at the same vertex.
- A simple cycle is a cycle that does not pass through other vertices more than once.
Graph Example
A graph G (undirected) with 5 vertices and 6 edges:
V = {1, 2, 3, 4, 5}
E = {(1,2), (1,3), (1,4), (2,5), (3,4), (4,5), (2,1), (3,1), (4,1), (5,2), (4,3), (5,4)}
- Adjacent: 1 and 2 are adjacent (1 is adjacent to 2, and 2 is adjacent to 1).
- Path: 1, 2, 5 (a simple path), 1, 3, 4, 1, 2, 5 (a path but not a simple path).
- Cycle: 1, 3, 4, 1 (a simple cycle), 1, 3, 4, 1, 4, 1 (cycle, but not a simple cycle).
More Graph Definitions
- A connected graph has a path between each pair of distinct vertices.
- A complete graph has an edge between each pair of distinct vertices.
- A complete graph is also a connected graph, but a connected graph may not be a complete graph.
Directed Graphs (Digraphs)
- If the edge pair is ordered then the graph is called a directed graph
- Each edge has a direction and is called a directed edge.
- Definitions for undirected graphs apply to directed graphs with adjustments for direction.
- Vertex w is adjacent to v iff (v,w) \in E (direct edge from v to w).
- w is a successor of v.
- v is a predecessor of w.
- A directed path between two vertices is a sequence of directed edges that begins at one vertex and ends at another vertex.
- w1, w2, …, wN is a path if (wi, w_{i+1}) \in E for 1 \leq i \leq N-1.
Directed Graphs - Cycles and Connectivity
- A cycle in a directed graph is a path of length at least 1 such that w1 = wN. This cycle is simple if the path is simple. For undirected graphs, the edges must be distinct.
- A directed acyclic graph (DAG) is a directed graph with no cycles.
- An undirected graph is connected if there is a path from every vertex to every other vertex.
- A directed graph with this property is called strongly connected.
- If a directed graph is not strongly connected, but the underlying graph (without direction to arcs) is connected, then the graph is weakly connected.
Directed Graph Example
A graph G = (V, E) with 5 vertices and 6 edges:
V = {1, 2, 3, 4, 5}
E = {(1,2), (1,4), (2,5), (4,5), (3,1), (4,3)}
- Adjacent: 2 is adjacent to 1, but 1 is NOT adjacent to 2.
- Path: 1, 2, 5 (a directed path).
- Cycle: 1, 4, 3, 1 (a directed cycle).
Weighted Graph
- Edges are labeled with numeric values.
Graph Implementations
- Adjacency Matrix: A two-dimensional array.
- Adjacency List: For each vertex, a list of adjacent vertices.
Graph Terminology Summary
- Graph: G = (V, E)
- Vertex set: V
- Edge set: E (pairs of adjacent vertices)
Directed and Undirected Graphs
- G directed: Edges are ordered pairs (u, v).
- G undirected: Edges are unordered pairs{u, v} .
Proper Graphs and Subgraphs
- Proper graph: No loops, no multi-edges.
- Subgraph G' of G: G' = (V', E'), where V' is a subset of V and E' is a subset of E between vertices of V'.
Paths in Graphs
- Path p: A sequence of vertices v0, v1, …, vk, where for i = 1, …, k, v{i-1} is adjacent to v_i.
- Equivalently, p is a sequence of edges e1, …, ek, where for i = 2, …, k, each consecutive pair of edges e{i-1}, ei share a vertex.
Simple Paths and Cycles
- Simple path: No edge or vertex repeated, except possibly v0 = vk.
- Cycle: A path p with v0 = vk, where k > 1.
Connected Undirected Graph
- G is connected if there is a path between each pair of vertices.
Connected Components of an Undirected Graph
- Else G has k > 1 maximal connected subgraphs = connected components
Biconnected Component
- Sets of edges where:
- Is a single edge.
- There are at least two disjoint paths between each pair of vertices.
Size of a Graph
- Graph G = (V, E)
- n = |V| = # vertices
- m = |E| = # edges
- Size of G is n + m
Representing a Graph by an Adjacency Matrix
- Adjacency Matrix A is size n \times n
- A[i, j] = 1 if (i, j) \in E, 0 else.
- Space cost O(n^2).
Adjacency List Representation of a Graph
- Adjacency Lists: Adj(1), …, Adj(n)
- Adj(v) = list of vertices adjacent to v
- Space cost: O(n + m)
Definition of an Undirected Tree
- Tree T: Graph with a unique simple path between every pair of vertices.
- n = # vertices, n-1 = # edges
- Forest: Set of trees.
Definition of a Directed Tree
- Directed Tree T: Digraph with distinguished vertex root r such that each vertex is reachable from r by a unique path.
- Family Relationships:
- Ancestors
- Descendants
- Parent
- Child
- Siblings
- Leaves have no proper descendants.
An Ordered Tree
- Ordered Tree: Is a directed tree with siblings ordered
Preorder Tree Traversal
- Preorder: A, B, C, D, E, F, H, I
- Root (order vertices as pushed on stack)
- Preorder left subtree
- Preorder right subtree
Postorder Tree Traversal
- Postorder: B, E, D, H, I, F, C, A
- Postorder left subtree
- Postorder right subtree
- Root (order vertices as popped off stack)
Spanning Tree and Forest of a Graph
- T is a spanning tree of graph G if
- T is a directed tree with the same vertex set as G.
- Each edge of T is a directed version of an edge of G.
- Spanning Forest: Forest of spanning trees of connected components of G.
Example Spanning Tree of a Graph
- Classifications of Edges of G with Spanning Tree T
- An edge (u, v) of T is a tree edge.
- An edge (u, v) of G - T is a back edge if u is a descendant or ancestor of v.
- Else (u, v) is a cross edge (do not exist in undirected DFS).
Adjacency Matrix
- An adjacency matrix for a graph with n vertices numbered 0, 1, …, n-1 is an n \times n array matrix such that matrix[i][j] is 1 (true) if there is an edge from vertex i to vertex j, and 0 (false) otherwise.
- When the graph is weighted, we can let matrix[i][j] be the weight that labels the edge from vertex i to vertex j, instead of simply 1, and let matrix[i][j] equal to \infty instead of 0 when there is no edge from vertex i to vertex j.
- Adjacency matrix for an undirected graph is symmetrical, i.e., matrix[i][j] is equal to matrix[j][i].
- Space requirement O(|V|^2).
- Acceptable if the graph is dense.
Adjacency Matrix – Examples
- Directed Graph Adjacency Matrix
- Undirected Weighted Graph Adjacency Matrix
Adjacency List
- An adjacency list for a graph with n vertices numbered 0, 1, …, n-1 consists of n linked lists. The i^{th} linked list has a node for vertex j if and only if the graph contains an edge from vertex i to vertex j.
- Adjacency list is a better solution if the graph is sparse.
- Space requirement is O(|E| + |V|), which is linear in the size of the graph.
- In an undirected graph each edge (v, w) appears in two lists, thus space requirement is doubled.
Adjacency List – Examples
- Directed Graph Adjacency List
- Undirected Weighted Graph Adjacency List
Adjacency Matrix vs Adjacency List
Two common graph operations:
- Determine whether there is an edge from vertex i to vertex j.
- Find all vertices adjacent to a given vertex i.
- An adjacency matrix supports operation 1 more efficiently.
- An adjacency list supports operation 2 more efficiently.
- An adjacency list often requires less space than an adjacency matrix.
- Adjacency Matrix: Space requirement is O(|V|^2)
- Adjacency List: Space requirement is O(|E| + |V|), which is linear in the size of the graph.
- Adjacency matrix is better if the graph is dense (too many edges).
- Adjacency list is better if the graph is sparse (few edges).
Graph Traversals
- A graph-traversal algorithm starts from a vertex v, visits all of the vertices that can be reachable from the vertex v.
- A graph-traversal algorithm visits all vertices if and only if the graph is connected.
- A connected component is the subset of vertices visited during a traversal algorithm that begins at a given vertex.
- A graph-traversal algorithm must mark each vertex during a visit and must never visit a vertex more than once.
- Thus, if a graph contains a cycle, the graph-traversal algorithm can avoid an infinite loop.
- We look at two graph-traversal algorithms:
Depth-First Search (aka Depth-First Traversal)
Breadth-First Search (aka Breadth-First Traversal)
Depth-First Search
- For a given vertex v, the depth-first search algorithm proceeds along a path from v as deeply into the graph as possible before backing up.
- That is, after visiting a vertex v, the depth-first search algorithm visits (if possible) an unvisited adjacent vertex to vertex v.
- The depth-first traversal algorithm does not completely specify the order in which it should visit the vertices adjacent to v.
- We may visit the vertices adjacent to v in sorted order.
Depth-First Search – Example
- A depth-first search of the graph starts from vertex v.
- Visit a vertex, then visit a vertex adjacent to that vertex.
- If there is no unvisited vertex adjacent to a visited vertex, back up to the previous step.
Depth-First Search Algorithm
- Depth-first search is a recursive algorithm for traversing a tree or graph data structure.
- It is called depth-first search because it starts from the root node and follows each path to its greatest depth node before moving to the next path.
- DFS uses a stack data structure for its implementation.
- The process of the DFS algorithm is similar to the BFS algorithm.
Depth-First Search Algorithm Steps
- If the initial state is a goal state, quit and return success.
- Otherwise, do the following until success or failure is signaled:
- Generate a successor, E, of the initial state. If there are no more successors, signal failure.
- Call Depth-First Search with E as the initial state.
- If success is returned, signal success. Otherwise, continue in this loop.
Depth-First Search Example Steps
- Illustrative steps of the DFS algorithm with a sample graph, showing node expansion and the evolution of the NODE-LIST.
Depth-First Search Algorithm - Analysis
- Completeness: Complete within a finite state space as it will expand every node within a limited search tree.
- Time Complexity: Will be equivalent to the node traversed by the algorithm. T(n) = 1 + n^2 + n^3 + … + n^m = O(n^m), where m = maximum depth of any node and this can be much larger than d (Shallowest solution depth).
- Space Complexity: Needs to store only a single path from the root node, hence the space complexity of DFS is equivalent to the size of the fringe set, which is O(b^m).
- Optimal: Non-optimal, as it may generate a large number of steps or high cost to reach the goal node.
Depth-First Search Algorithm - Advantages and Disadvantages
Advantages:
- Requires less memory since only the nodes on the current path are stored.
- May find a solution without examining much of the search space at all.
Disadvantages:
- May find a sub-optimal solution (one that is deeper or more costly than the best solution).
- Incomplete: Without a depth bound, one may not find a solution even if one exists.
Tarjan’s Depth First Search Algorithm
- We assume a Random Access Machine (RAM) computational model
- Algorithm Depth First Search
- Input graph G = (V,E) represented by adjacency lists Adj(v) for each v \in V
- N \leftarrow 0
- for all v \in V do (number(v) \leftarrow 0; children(v) \leftarrow () )
- for all v \in V do if number(v) = 0 then DFS(v)
- spanning forest defined by children if then output
- Input graph G = (V,E) represented by adjacency lists Adj(v) for each v \in V
Recursive DFS Procedure
- The preorder numbers give the order the vertices are first visited in DFS.
- DFS(v)
- N \leftarrow N + 1; number(v) \leftarrow N
- for each u \in Adj(v) do if number(u) = 0 then (add u to children(v); DFS(u)) recursive procedure
Time Cost of Depth-First Search (DFS) Algorithm on a RAM
- Input size n = |V|, m = |E|
- Theorem: Depth-First Search takes linear time cost O(n + m)
- Proof: Can associate with each edge and vertex a constant number of operations.
Classification of Edges of G via DFS Spanning Tree T
Edge notation induced by DFS Tree T
- u \rightarrow v iff (u, v) is a tree edge of T
- u \rightarrow^* v iff u is an ancestor of v
- u --- v iff (u, v) is a backedge if (u, v) \in G - T with either u \rightarrow^* v or v \rightarrow^* u
Search Strategies - Breadth-First Search Algorithm
- The most common search strategy for traversing a tree or graph. This algorithm searches breadthwise in a tree or graph, so it is called breadth-first search.
- BFS starts searching from the root node of the tree and expands all successor nodes at the current level before moving to nodes of next level.
- An example of a general-graph search algorithm.
- Implemented using FIFO queue data structure.
Breadth-First Search Algorithm Steps
Create a variable called NODE-LIST and set it to the initial state.
Until a goal state is found, or NODE-LIST is empty:
Remove the first element from NODE-LIST and call it E. If NODE-LIST was empty, then quit.
For element E do the following:
- Apply the rule to generate a new state
- If the new state is a goal state, quit and return this state
- Otherwise, add the new state to the end of NODE-LIST
Breadth-First Search Algorithm Function
function BREADTH-FIRST-SEARCH(problem) returns a solution node or failure
node = NODE(problem.INITIAL)
if problem.IS-GOAL(node.STATE) then return node
frontier = a FIFO queue, with node as an element
reached = {problem.INITIAL}
while not IS-EMPTY(frontier) do
node = POP(frontier)
for each child in EXPAND(problem, node) do
s = child.STATE
if problem.IS-GOAL(s) then return child
if s is not in reached then
add s to reached
add child to frontier
return failure
function UNIFORM-COST-SEARCH(problem) returns a solution node, or failure
return BEST-FIRST-SEARCH(problem, PATH-COST)
Breadth-First Search Example Steps
- Illustrative steps of the BFS algorithm with a sample graph, showing node expansion and the evolution of the NODE-LIST.
Breadth-First Search Algorithm - Advantages and Disadvantages
Advantages:
- One of the simplest search strategies.
- BFS is Complete. If there is a solution, BFS is guaranteed to find it.
- If there are multiple solutions, then a minimal solution will be found.
Disadvantages:
- The breadth-first search algorithm cannot be effectively used unless the search space is quite small.
Breadth-First Search Algorithm - Analysis
- Time Complexity: Can be obtained by the number of nodes traversed in BFS until the shallowest Node. Where the d = depth of the shallowest solution and b is a node at every state. T(b) = 1 + b^2 + b^3 + … + b^d = O(b^d)
- Space Complexity: Given by the Memory size of frontier which is O(b^d)
- Completeness: Complete, which means if the shallowest goal node is at some finite depth, then BFS will find a solution.
- Optimality: Optimal if path cost is a non-decreasing function of the depth of the node.