Graph Theory and Network Analysis Notes

Graphs model networks of various types, offering an abstract view of the domain being analyzed.
This allows for seeing the "overall picture" without getting bogged down in irrelevant details.
Graphs are also called networks in mathematical literature.

Vertices (Nodes): The fundamental units of a graph. In computer science, these are often called nodes or links. In physics may be called sites or bonds. In sociology may be called actors or ties.
Edges: Connections between vertices.

Directed Graphs: Edges have a direction, indicating a one-way relationship.
Undirected Graphs: Edges have no direction, indicating a two-way relationship.
Weighted Graphs: Edges have a strength or weight, usually represented by a real number.
One-mode Graphs: Edges form simple on/off connections between vertices of the same type.
*Note that it is also possible to represent edges as having a strength or weight (weighted graph).

Biological Networks:
- Metabolic networks: metabolites and metabolic reactions.
- Protein-interaction networks: proteins and bonding.
- Gene regulatory networks: genes and regulatory effect.
- Neuronal networks: neuron and synapse
- Food webs: Specie and predation or resource transfer
Social Networks:
- Friendship networks: person and friendship.
- Sexual networks: person and intercourse.
Transportation Networks
- Rail system: rail station and railroad tracks.
- Road network: intersection and pavement.
- Airport network: airport and non-stop flight
Informational Networks:
- Internet: computer and IP network adjacency.
- World Wide Web: web page and hyperlink.
Telecommunications Networks
- Internet(2): autonomous system (ISP) and BGP connection.

A (v1) Rows and columns represent the same set of nodes in one-mode networks.
Elements of the matrix are the weights in the case of weighted graphs.
A matrix is less intuitive than a picture.
It's difficult to see emergent social structure or the overall topology, but it can be used to measure them.
Expensive way to store data ( $N^2$ elements for $N$ nodes, since also the lack of a link is represented with a 0 value).
$A{ij} = 1$ if there is an edge from $j$ to $i$ . Otherwise, $A{ij} = 0$ .

Stores the edges: for each value > 0 in the adj matrix there is a row in the edge list.
Can be cumbersome for some mathematical reasoning.

Two kinds of vertices, no within-type edges.
Incidence (or affiliation) matrix (rectangular).
Often, the smaller set of nodes is arranged as columns, and the larger set as rows.
Examples:
- authors & papers
- actors & movies/scenes
- musicians & albums
- genes & substrings
- people & online groups
- words & documents
- people & corporate boards
- plants & pollinators
Two-mode graph may give the most complete representation of a given network.
One-mode projections showing connections between vertices of just one type are often convenient to work with.
Each group in the two-mode graph results in a cluster of vertices in the one-mode projection that are all connected to each other (a “clique” in network jargon).

Can be drawn so that no edges cross each other.
All trees are planar.
River networks are planar (rivers never cross; they only flow together).
Road networks are a good approximation of a planar network due to bridges enabling roads to meet without intersecting.

A path is any sequence of vertices connected by edges.
The length of a path is the number of edges traversed along the path (number of “hops”).
Path lengths in a graph grow “slowly” with respect to the size $N$ of the network; logarithmic growth ( $log N$ ) is common.
A geodesic path (shortest path) is a path between two vertices such that no shorter paths exist.
Shortest paths are not necessarily unique; multiple paths of equal length may exist between a pair of vertices.
If two vertices are not connected, the shortest path can be considered infinite or omitted from the computation.

Provide indicators to know:
- The importance of a node or an area in the network.
- The distance among nodes or areas in the network.
- Cohesion degree of an area in the network.
Central individuals in a social network generally correspond to the influentials who provide value to the entire network.

Vertex-centric measure.
Computes the maximum distance of a specific vertex with respect to the other vertices within the graph.

Graph-centric measure.
Focuses on the entire graph and considers the farthest possible distance between any two vertices.
Represents the maximum eccentricity among all vertices in the graph.

The number of paths of a given length $n$ can be computed using the adjacency matrix.
An entry equal to 1 in $A_{ij}$ represents a path of length 1 between $i$ and $j$ .
To compute the number of paths of length 2, the matrix of length 1 must be multiplied by itself, and the product matrix is the matrix representation of paths of length 2.
In general, to generate the matrix of paths of length $n$ , take the matrix of paths of length $n-1$ , and multiply it with the matrix of paths of length 1 (adjacency matrix).

Eulerian path: A path that traverses each edge in a graph only once.
Hamiltonian path: A path that traverses each node in a graph exactly once.

The simplest centrality measure in a graph.
Applicable to directed and undirected graphs.
- Indegree: Number of links entering node $i$ .
- Outdegree: Number of links leaving node $i$ .
- Degree: Number of links of node $i$ .
Indicates the ability of a node to engage in a direct relationship with the other nodes.
Local measure computed from immediate connections.

Given the adjacency matrix of an undirected graph, the node degree is a sum over either the rows or the columns of the matrix.
$k_i$ is the degree of node $i$ .
The total number of links can be expressed as
Average degree

$ki = k{in} + k_{out}$
Total number of links
$L = \sum{i=1}^{N} k{in} = \sum{i=1}^{N} k{out}$
Average degree
$<k{in}> = \frac{1}{N} \sum{i=1}^{N} k{in} = <k{out}> = \frac{1}{N} \sum{i=1}^{N} k{out} = \frac{L}{N}$

The maximum number of possible edges in a simple undirected graph is $\frac{1}{2} N (N-1)$ .
The density $\rho$ of a graph is the fraction of these edges actually present in the network.
$\rho = \frac{L}{\frac{1}{2} N (N-1)} = \frac{2L}{N (N-1)} = \frac{<k>}{N-1}$
0 <= \rho <= 1
If $\rho \rightarrow constant$ when $N \rightarrow \infty$ , the graph is dense.
If $\rho \rightarrow 0$ when $N \rightarrow \infty$ , the graph is sparse.

$p_k$ provides the probability that a randomly selected node in the graph has degree $k$ .
For a graph with $N$ nodes, the degree distribution is the normalized histogram, where $N_k$ is the number of degree-k nodes.