Graph Theory and Network Analysis Notes
Basics on Graphs
Introduction to Graphs
- Graphs model networks of various types, offering an abstract view of the domain being analyzed.
- This allows for seeing the "overall picture" without getting bogged down in irrelevant details.
- Graphs are also called networks in mathematical literature.
Components of a Graph
- Vertices (Nodes): The fundamental units of a graph. In computer science, these are often called nodes or links. In physics may be called sites or bonds. In sociology may be called actors or ties.
- Edges: Connections between vertices.
Types of Graphs
- Directed Graphs: Edges have a direction, indicating a one-way relationship.
- Undirected Graphs: Edges have no direction, indicating a two-way relationship.
- Weighted Graphs: Edges have a strength or weight, usually represented by a real number.
- One-mode Graphs: Edges form simple on/off connections between vertices of the same type.
*Note that it is also possible to represent edges as having a strength or weight (weighted graph).
Examples of Networks Modeled by Graphs
- Biological Networks:
- Metabolic networks: metabolites and metabolic reactions.
- Protein-interaction networks: proteins and bonding.
- Gene regulatory networks: genes and regulatory effect.
- Neuronal networks: neuron and synapse
- Food webs: Specie and predation or resource transfer
- Social Networks:
- Friendship networks: person and friendship.
- Sexual networks: person and intercourse.
- Transportation Networks
- Rail system: rail station and railroad tracks.
- Road network: intersection and pavement.
- Airport network: airport and non-stop flight
- Informational Networks:
- Internet: computer and IP network adjacency.
- World Wide Web: web page and hyperlink.
- Telecommunications Networks
- Internet(2): autonomous system (ISP) and BGP connection.
Graph Representation
Adjacency Matrix
- A (v1) Rows and columns represent the same set of nodes in one-mode networks.
- Elements of the matrix are the weights in the case of weighted graphs.
- A matrix is less intuitive than a picture.
- It's difficult to see emergent social structure or the overall topology, but it can be used to measure them.
- Expensive way to store data (N^2 elements for N nodes, since also the lack of a link is represented with a 0 value).
- A{ij} = 1 if there is an edge from j to i. Otherwise, A{ij} = 0.
Edge List
- Stores the edges: for each value > 0 in the adj matrix there is a row in the edge list.
- Can be cumbersome for some mathematical reasoning.
Adjacency List
- Similar to the edge list, but each node is listed as a row.
XML Files
- Can be adopted to represent networks.
- Graph ML (http://graphml.graphdrawing.org/)
Bipartite (Two-Mode) Graphs
- Two kinds of vertices, no within-type edges.
- Incidence (or affiliation) matrix (rectangular).
- Often, the smaller set of nodes is arranged as columns, and the larger set as rows.
- Examples:
- authors & papers
- actors & movies/scenes
- musicians & albums
- genes & substrings
- people & online groups
- words & documents
- people & corporate boards
- plants & pollinators
- Two-mode graph may give the most complete representation of a given network.
- One-mode projections showing connections between vertices of just one type are often convenient to work with.
- Each group in the two-mode graph results in a cluster of vertices in the one-mode projection that are all connected to each other (a “clique” in network jargon).
Planar Graphs
- Can be drawn so that no edges cross each other.
- All trees are planar.
- River networks are planar (rivers never cross; they only flow together).
- Road networks are a good approximation of a planar network due to bridges enabling roads to meet without intersecting.
Paths in Graphs
- A path is any sequence of vertices connected by edges.
- The length of a path is the number of edges traversed along the path (number of “hops”).
- Path lengths in a graph grow “slowly” with respect to the size N of the network; logarithmic growth (log N) is common.
- A geodesic path (shortest path) is a path between two vertices such that no shorter paths exist.
- Shortest paths are not necessarily unique; multiple paths of equal length may exist between a pair of vertices.
- If two vertices are not connected, the shortest path can be considered infinite or omitted from the computation.
Graph Measures
- Provide indicators to know:
- The importance of a node or an area in the network.
- The distance among nodes or areas in the network.
- Cohesion degree of an area in the network.
- Central individuals in a social network generally correspond to the influentials who provide value to the entire network.
Graph Eccentricity
- Vertex-centric measure.
- Computes the maximum distance of a specific vertex with respect to the other vertices within the graph.
Graph Diameter
- Graph-centric measure.
- Focuses on the entire graph and considers the farthest possible distance between any two vertices.
- Represents the maximum eccentricity among all vertices in the graph.
Number of Paths
- The number of paths of a given length n can be computed using the adjacency matrix.
- An entry equal to 1 in A_{ij} represents a path of length 1 between i and j.
- To compute the number of paths of length 2, the matrix of length 1 must be multiplied by itself, and the product matrix is the matrix representation of paths of length 2.
- In general, to generate the matrix of paths of length n, take the matrix of paths of length n-1, and multiply it with the matrix of paths of length 1 (adjacency matrix).
Eulerian and Hamiltonian Paths
- Eulerian path: A path that traverses each edge in a graph only once.
- Hamiltonian path: A path that traverses each node in a graph exactly once.
Degree Centrality
- The simplest centrality measure in a graph.
- Applicable to directed and undirected graphs.
- Indegree: Number of links entering node i.
- Outdegree: Number of links leaving node i.
- Degree: Number of links of node i.
- Indicates the ability of a node to engage in a direct relationship with the other nodes.
- Local measure computed from immediate connections.
Degree Centrality: Undirected Graph
- Given the adjacency matrix of an undirected graph, the node degree is a sum over either the rows or the columns of the matrix.
- k_i is the degree of node i.
- The total number of links can be expressed as
- Average degree
Degree Centrality: Directed Graph
- ki = k{in} + k_{out}
- Total number of links
L = \sum{i=1}^{N} k{in} = \sum{i=1}^{N} k{out} - Average degree
Density of a Graph
- The maximum number of possible edges in a simple undirected graph is \frac{1}{2} N (N-1).
- The density \rho of a graph is the fraction of these edges actually present in the network.
- \rho = \frac{L}{\frac{1}{2} N (N-1)} = \frac{2L}{N (N-1)} = \frac{}{N-1}
- 0 <= \rho <= 1
- If \rho \rightarrow constant when N \rightarrow \infty, the graph is dense.
- If \rho \rightarrow 0 when N \rightarrow \infty, the graph is sparse.
Degree Distribution
- p_k provides the probability that a randomly selected node in the graph has degree k.
- For a graph with N nodes, the degree distribution is the normalized histogram, where N_k is the number of degree-k nodes.