CoEng3122 - Algorithm Analysis and Design - Algorithm for Graph Problems

Algorithm Analysis and Design - Graph Problems

Chapter 5: Algorithms for Graph Problems

Depth-first search
Connected Components
Topological sort
Shortest path
Sets of strings
Search trees, string sorting, binary search
Exact string matching - Finding a pattern (string) in a text (string)
Approximate string matching - Finding in the text something that is similar to the pattern

Problem-Solving Agent

Goal formulation: The agent adopts the goal of reaching the destination, organizing behavior by limiting objectives and actions.
Problem formulation: The agent creates a description of the states and actions needed to reach the goal, forming an abstract model of the relevant world.
Search: The agent simulates sequences of actions in its model to find a sequence that reaches the goal.
Solution: A sequence of actions that reaches the goal.
Execution: The agent executes the actions in the solution one at a time.

Requirements of Search Algorithms/Techniques

It should cause motion.
It should be systematic.

Search Types

1. Uninformed Search (Blind Search)

Also called blind, exhaustive, or brute-force search.
Uses no information about the problem to guide the search.
May not be very efficient.

2. Informed Search (Heuristic Search)

Also called heuristic or intelligent search.
Uses information about the problem to guide the search.
Guesses the distance to a goal state.
More efficient but may not always be possible.

Search Techniques / Strategies

Uninformed Search:
- Depth-First Search (DFS)
- Breadth-First Search (BFS)
- Cost-First Search
Informed Search:
- Generate and Test
- Hill Climbing
- Best First Search
- Problem Reduction
- Constraint Satisfaction
- Means-end Analysis
- A* Search
- A0* Search

Types of Graphs

Directed graph
Undirected graph
Weighted graph
Cyclic graph
Acyclic graph
Bipartite graph

Uninformed/Blind Search

Does not contain any domain knowledge (e.g., closeness, location of the goal).
Operates in a brute-force way.
Includes information about how to traverse the tree and identify leaf and goal nodes.
Searches without information about the search space (initial state operators and test for the goal).
Examines each node of the tree until the goal node is reached.

Informed Search

Uses domain knowledge to guide the search.
Can find a solution more efficiently than uninformed search.
Also called a Heuristic search.
A heuristic might not always guarantee the best solution but finds a good solution in a reasonable time.
Can solve complex problems that might not be solvable otherwise.

Graph Definitions

Two vertices are adjacent if they are joined by an edge.
Vertex w is adjacent to v iff $(v,w) \in E$ .
- In an undirected graph with edge $(v, w)$ , w is adjacent to v, and v is adjacent to w.
A path between two vertices is a sequence of edges that begins at one vertex and ends at another vertex.
- $w1, w2, …, wN$ is a path if $(wi, w_{i+1}) \in E$ for $1 \leq i \leq N-1$ .
A simple path passes through a vertex only once.
A cycle is a path that begins and ends at the same vertex.
A simple cycle is a cycle that does not pass through other vertices more than once.

Graph Example

A graph $G$ (undirected) with 5 vertices and 6 edges:
$V = {1, 2, 3, 4, 5}$
$E = {(1,2), (1,3), (1,4), (2,5), (3,4), (4,5), (2,1), (3,1), (4,1), (5,2), (4,3), (5,4)}$

Adjacent: 1 and 2 are adjacent (1 is adjacent to 2, and 2 is adjacent to 1).
Path: 1, 2, 5 (a simple path), 1, 3, 4, 1, 2, 5 (a path but not a simple path).
Cycle: 1, 3, 4, 1 (a simple cycle), 1, 3, 4, 1, 4, 1 (cycle, but not a simple cycle).

More Graph Definitions

A connected graph has a path between each pair of distinct vertices.
A complete graph has an edge between each pair of distinct vertices.
- A complete graph is also a connected graph, but a connected graph may not be a complete graph.

Directed Graphs (Digraphs)

If the edge pair is ordered then the graph is called a directed graph
Each edge has a direction and is called a directed edge.
Definitions for undirected graphs apply to directed graphs with adjustments for direction.
Vertex w is adjacent to v iff $(v,w) \in E$ (direct edge from v to w).
- w is a successor of v.
- v is a predecessor of w.
A directed path between two vertices is a sequence of directed edges that begins at one vertex and ends at another vertex.
- $w1, w2, …, wN$ is a path if $(wi, w_{i+1}) \in E$ for $1 \leq i \leq N-1$ .

Directed Graphs - Cycles and Connectivity

A cycle in a directed graph is a path of length at least 1 such that $w1 = wN$ . This cycle is simple if the path is simple. For undirected graphs, the edges must be distinct.
A directed acyclic graph (DAG) is a directed graph with no cycles.
An undirected graph is connected if there is a path from every vertex to every other vertex.
A directed graph with this property is called strongly connected.
If a directed graph is not strongly connected, but the underlying graph (without direction to arcs) is connected, then the graph is weakly connected.

Directed Graph Example

A graph $G = (V, E)$ with 5 vertices and 6 edges:
$V = {1, 2, 3, 4, 5}$
$E = {(1,2), (1,4), (2,5), (4,5), (3,1), (4,3)}$

Adjacent: 2 is adjacent to 1, but 1 is NOT adjacent to 2.
Path: 1, 2, 5 (a directed path).
Cycle: 1, 4, 3, 1 (a directed cycle).

Weighted Graph

Edges are labeled with numeric values.

Graph Implementations

Adjacency Matrix: A two-dimensional array.
Adjacency List: For each vertex, a list of adjacent vertices.

Graph Terminology Summary

Graph: $G = (V, E)$
Vertex set: $V$
Edge set: $E$ (pairs of adjacent vertices)

Directed and Undirected Graphs

G directed: Edges are ordered pairs $(u, v)$ .
G undirected: Edges are unordered pairs ${u, v}$ .

Proper Graphs and Subgraphs

Proper graph: No loops, no multi-edges.
Subgraph $G'$ of $G$ : $G' = (V', E')$ , where $V'$ is a subset of $V$ and $E'$ is a subset of $E$ between vertices of $V'$ .

Paths in Graphs

Path $p$ : A sequence of vertices $v0, v1, …, vk$ , where for $i = 1, …, k$ , $v{i-1}$ is adjacent to $v_i$ .
Equivalently, $p$ is a sequence of edges $e1, …, ek$ , where for $i = 2, …, k$ , each consecutive pair of edges $e{i-1}, ei$ share a vertex.

Simple Paths and Cycles

Simple path: No edge or vertex repeated, except possibly $v0 = vk$ .
Cycle: A path $p$ with $v0 = vk$ , where k > 1.

Connected Undirected Graph

$G$ is connected if there is a path between each pair of vertices.

Connected Components of an Undirected Graph

Else $G$ has k > 1 maximal connected subgraphs = connected components

Biconnected Component

Sets of edges where:
- Is a single edge.
- There are at least two disjoint paths between each pair of vertices.

Size of a Graph

Graph $G = (V, E)$
$n = |V|$ = # vertices
$m = |E|$ = # edges
Size of $G$ is $n + m$

Representing a Graph by an Adjacency Matrix

Adjacency Matrix $A$ is size $n \times n$
$A[i, j] = 1$ if $(i, j) \in E$ , $0$ else.
Space cost $O(n^2)$ .

Adjacency List Representation of a Graph

Adjacency Lists: $Adj(1), …, Adj(n)$
$Adj(v)$ = list of vertices adjacent to $v$
Space cost: $O(n + m)$

Definition of an Undirected Tree

Tree $T$ : Graph with a unique simple path between every pair of vertices.
$n$ = # vertices, $n-1$ = # edges
Forest: Set of trees.

Definition of a Directed Tree

Directed Tree $T$ : Digraph with distinguished vertex root $r$ such that each vertex is reachable from $r$ by a unique path.
Family Relationships:
- Ancestors
- Descendants
- Parent
- Child
- Siblings
Leaves have no proper descendants.

An Ordered Tree

Ordered Tree: Is a directed tree with siblings ordered

Preorder Tree Traversal

Preorder: A, B, C, D, E, F, H, I
1. Root (order vertices as pushed on stack)
2. Preorder left subtree
3. Preorder right subtree

Postorder Tree Traversal

Postorder: B, E, D, H, I, F, C, A
1. Postorder left subtree
2. Postorder right subtree
3. Root (order vertices as popped off stack)

Spanning Tree and Forest of a Graph

$T$ is a spanning tree of graph $G$ if
1. $T$ is a directed tree with the same vertex set as $G$ .
2. Each edge of $T$ is a directed version of an edge of $G$ .
Spanning Forest: Forest of spanning trees of connected components of $G$ .

Example Spanning Tree of a Graph

Classifications of Edges of $G$ with Spanning Tree $T$
An edge $(u, v)$ of $T$ is a tree edge.
An edge $(u, v)$ of $G - T$ is a back edge if $u$ is a descendant or ancestor of $v$ .
Else $(u, v)$ is a cross edge (do not exist in undirected DFS).

Adjacency Matrix

An adjacency matrix for a graph with $n$ vertices numbered $0, 1, …, n-1$ is an $n \times n$ array matrix such that matrix $[i][j]$ is 1 (true) if there is an edge from vertex $i$ to vertex $j$ , and 0 (false) otherwise.
When the graph is weighted, we can let matrix $[i][j]$ be the weight that labels the edge from vertex $i$ to vertex $j$ , instead of simply 1, and let matrix $[i][j]$ equal to $\infty$ instead of 0 when there is no edge from vertex $i$ to vertex $j$ .
Adjacency matrix for an undirected graph is symmetrical, i.e., matrix $[i][j]$ is equal to matrix $[j][i]$ .
Space requirement O( $|V|^2$ ).
Acceptable if the graph is dense.

Adjacency Matrix – Examples

Directed Graph Adjacency Matrix
Undirected Weighted Graph Adjacency Matrix

Adjacency List

An adjacency list for a graph with $n$ vertices numbered $0, 1, …, n-1$ consists of $n$ linked lists. The $i^{th}$ linked list has a node for vertex $j$ if and only if the graph contains an edge from vertex $i$ to vertex $j$ .
Adjacency list is a better solution if the graph is sparse.
Space requirement is O( $|E| + |V|$ ), which is linear in the size of the graph.
In an undirected graph each edge $(v, w)$ appears in two lists, thus space requirement is doubled.

Adjacency List – Examples

Directed Graph Adjacency List
Undirected Weighted Graph Adjacency List

Adjacency Matrix vs Adjacency List

Two common graph operations:

Determine whether there is an edge from vertex $i$ to vertex $j$ .
Find all vertices adjacent to a given vertex $i$ .

An adjacency matrix supports operation 1 more efficiently.
An adjacency list supports operation 2 more efficiently.
An adjacency list often requires less space than an adjacency matrix.
- Adjacency Matrix: Space requirement is O( $|V|^2$ )
- Adjacency List: Space requirement is O( $|E| + |V|$ ), which is linear in the size of the graph.
- Adjacency matrix is better if the graph is dense (too many edges).
- Adjacency list is better if the graph is sparse (few edges).

Graph Traversals

A graph-traversal algorithm starts from a vertex $v$ , visits all of the vertices that can be reachable from the vertex $v$ .
A graph-traversal algorithm visits all vertices if and only if the graph is connected.
A connected component is the subset of vertices visited during a traversal algorithm that begins at a given vertex.
A graph-traversal algorithm must mark each vertex during a visit and must never visit a vertex more than once.
Thus, if a graph contains a cycle, the graph-traversal algorithm can avoid an infinite loop.
We look at two graph-traversal algorithms:
Depth-First Search (aka Depth-First Traversal)
Breadth-First Search (aka Breadth-First Traversal)

Depth-First Search

For a given vertex $v$ , the depth-first search algorithm proceeds along a path from $v$ as deeply into the graph as possible before backing up.
That is, after visiting a vertex $v$ , the depth-first search algorithm visits (if possible) an unvisited adjacent vertex to vertex $v$ .
The depth-first traversal algorithm does not completely specify the order in which it should visit the vertices adjacent to $v$ .
We may visit the vertices adjacent to $v$ in sorted order.

Depth-First Search – Example

A depth-first search of the graph starts from vertex $v$ .
Visit a vertex, then visit a vertex adjacent to that vertex.
If there is no unvisited vertex adjacent to a visited vertex, back up to the previous step.

Depth-First Search Algorithm

Depth-first search is a recursive algorithm for traversing a tree or graph data structure.
It is called depth-first search because it starts from the root node and follows each path to its greatest depth node before moving to the next path.
DFS uses a stack data structure for its implementation.
The process of the DFS algorithm is similar to the BFS algorithm.

Depth-First Search Algorithm Steps

If the initial state is a goal state, quit and return success.
Otherwise, do the following until success or failure is signaled:
- Generate a successor, $E$ , of the initial state. If there are no more successors, signal failure.
- Call Depth-First Search with $E$ as the initial state.
- If success is returned, signal success. Otherwise, continue in this loop.

Depth-First Search Example Steps

Illustrative steps of the DFS algorithm with a sample graph, showing node expansion and the evolution of the NODE-LIST.

Depth-First Search Algorithm - Analysis

Completeness: Complete within a finite state space as it will expand every node within a limited search tree.
Time Complexity: Will be equivalent to the node traversed by the algorithm. $T(n) = 1 + n^2 + n^3 + … + n^m = O(n^m)$ , where $m$ = maximum depth of any node and this can be much larger than $d$ (Shallowest solution depth).
Space Complexity: Needs to store only a single path from the root node, hence the space complexity of DFS is equivalent to the size of the fringe set, which is O( $b^m$ ).
Optimal: Non-optimal, as it may generate a large number of steps or high cost to reach the goal node.

Depth-First Search Algorithm - Advantages and Disadvantages

Advantages:

Requires less memory since only the nodes on the current path are stored.
May find a solution without examining much of the search space at all.

Disadvantages:

May find a sub-optimal solution (one that is deeper or more costly than the best solution).
Incomplete: Without a depth bound, one may not find a solution even if one exists.

Tarjan’s Depth First Search Algorithm

We assume a Random Access Machine (RAM) computational model
Algorithm Depth First Search
- Input graph $G = (V,E)$ represented by adjacency lists $Adj(v)$ for each $v \in V$
  1. $N \leftarrow 0$
  2. for all $v \in V$ do (number $(v) \leftarrow 0$ ; children $(v) \leftarrow ()$ )
  3. for all $v \in V$ do if number $(v) = 0$ then DFS $(v)$
  4. spanning forest defined by children if then output

Recursive DFS Procedure

The preorder numbers give the order the vertices are first visited in DFS.
DFS $(v)$
1. $N \leftarrow N + 1$ ; number $(v) \leftarrow N$
2. for each $u \in Adj(v)$ do if number $(u) = 0$ then (add $u$ to children $(v)$ ; DFS $(u)$ ) recursive procedure

Time Cost of Depth-First Search (DFS) Algorithm on a RAM

Input size $n = |V|, m = |E|$
Theorem: Depth-First Search takes linear time cost O( $n + m$ )
Proof: Can associate with each edge and vertex a constant number of operations.

Classification of Edges of $G$ via DFS Spanning Tree $T$

Edge notation induced by DFS Tree $T$

$u \rightarrow v$ iff $(u, v)$ is a tree edge of $T$
$u \rightarrow^* v$ iff $u$ is an ancestor of $v$
$u --- v$ iff $(u, v)$ is a backedge if $(u, v) \in G - T$ with either $u \rightarrow^* v or v \rightarrow^* u$

Search Strategies - Breadth-First Search Algorithm

The most common search strategy for traversing a tree or graph. This algorithm searches breadthwise in a tree or graph, so it is called breadth-first search.
BFS starts searching from the root node of the tree and expands all successor nodes at the current level before moving to nodes of next level.
An example of a general-graph search algorithm.
Implemented using FIFO queue data structure.

Breadth-First Search Algorithm Steps

Create a variable called NODE-LIST and set it to the initial state.
Until a goal state is found, or NODE-LIST is empty:

Remove the first element from NODE-LIST and call it E. If NODE-LIST was empty, then quit.
For element E do the following:
- Apply the rule to generate a new state
- If the new state is a goal state, quit and return this state
- Otherwise, add the new state to the end of NODE-LIST

Breadth-First Search Algorithm Function

function BREADTH-FIRST-SEARCH(problem) returns a solution node or failure
 node = NODE(problem.INITIAL)
 if problem.IS-GOAL(node.STATE) then return node
 frontier = a FIFO queue, with node as an element
 reached = {problem.INITIAL}
 while not IS-EMPTY(frontier) do
 node = POP(frontier)
 for each child in EXPAND(problem, node) do
 s = child.STATE
 if problem.IS-GOAL(s) then return child
 if s is not in reached then
 add s to reached
 add child to frontier
 return failure

function UNIFORM-COST-SEARCH(problem) returns a solution node, or failure
 return BEST-FIRST-SEARCH(problem, PATH-COST)

Breadth-First Search Example Steps

Illustrative steps of the BFS algorithm with a sample graph, showing node expansion and the evolution of the NODE-LIST.

Breadth-First Search Algorithm - Advantages and Disadvantages