Graph Theory and Network Analysis Notes

Basics on Graphs

Introduction to Graphs

  • Graphs model networks of various types, offering an abstract view of the domain being analyzed.
  • This allows for seeing the "overall picture" without getting bogged down in irrelevant details.
  • Graphs are also called networks in mathematical literature.

Components of a Graph

  • Vertices (Nodes): The fundamental units of a graph. In computer science, these are often called nodes or links. In physics may be called sites or bonds. In sociology may be called actors or ties.
  • Edges: Connections between vertices.

Types of Graphs

  • Directed Graphs: Edges have a direction, indicating a one-way relationship.
  • Undirected Graphs: Edges have no direction, indicating a two-way relationship.
  • Weighted Graphs: Edges have a strength or weight, usually represented by a real number.
  • One-mode Graphs: Edges form simple on/off connections between vertices of the same type.
    *Note that it is also possible to represent edges as having a strength or weight (weighted graph).

Examples of Networks Modeled by Graphs

  • Biological Networks:
    • Metabolic networks: metabolites and metabolic reactions.
    • Protein-interaction networks: proteins and bonding.
    • Gene regulatory networks: genes and regulatory effect.
    • Neuronal networks: neuron and synapse
    • Food webs: Specie and predation or resource transfer
  • Social Networks:
    • Friendship networks: person and friendship.
    • Sexual networks: person and intercourse.
  • Transportation Networks
    • Rail system: rail station and railroad tracks.
    • Road network: intersection and pavement.
    • Airport network: airport and non-stop flight
  • Informational Networks:
    • Internet: computer and IP network adjacency.
    • World Wide Web: web page and hyperlink.
  • Telecommunications Networks
    • Internet(2): autonomous system (ISP) and BGP connection.

Graph Representation

Adjacency Matrix

  • A (v1) Rows and columns represent the same set of nodes in one-mode networks.
  • Elements of the matrix are the weights in the case of weighted graphs.
  • A matrix is less intuitive than a picture.
  • It's difficult to see emergent social structure or the overall topology, but it can be used to measure them.
  • Expensive way to store data (N^2 elements for N nodes, since also the lack of a link is represented with a 0 value).
  • A{ij} = 1 if there is an edge from j to i. Otherwise, A{ij} = 0.

Edge List

  • Stores the edges: for each value > 0 in the adj matrix there is a row in the edge list.
  • Can be cumbersome for some mathematical reasoning.

Adjacency List

  • Similar to the edge list, but each node is listed as a row.

XML Files

  • Can be adopted to represent networks.
  • Graph ML (http://graphml.graphdrawing.org/)

Bipartite (Two-Mode) Graphs

  • Two kinds of vertices, no within-type edges.
  • Incidence (or affiliation) matrix (rectangular).
  • Often, the smaller set of nodes is arranged as columns, and the larger set as rows.
  • Examples:
    • authors & papers
    • actors & movies/scenes
    • musicians & albums
    • genes & substrings
    • people & online groups
    • words & documents
    • people & corporate boards
    • plants & pollinators
  • Two-mode graph may give the most complete representation of a given network.
  • One-mode projections showing connections between vertices of just one type are often convenient to work with.
  • Each group in the two-mode graph results in a cluster of vertices in the one-mode projection that are all connected to each other (a “clique” in network jargon).

Planar Graphs

  • Can be drawn so that no edges cross each other.
  • All trees are planar.
  • River networks are planar (rivers never cross; they only flow together).
  • Road networks are a good approximation of a planar network due to bridges enabling roads to meet without intersecting.

Paths in Graphs

  • A path is any sequence of vertices connected by edges.
  • The length of a path is the number of edges traversed along the path (number of “hops”).
  • Path lengths in a graph grow “slowly” with respect to the size N of the network; logarithmic growth (log N) is common.
  • A geodesic path (shortest path) is a path between two vertices such that no shorter paths exist.
  • Shortest paths are not necessarily unique; multiple paths of equal length may exist between a pair of vertices.
  • If two vertices are not connected, the shortest path can be considered infinite or omitted from the computation.

Graph Measures

  • Provide indicators to know:
    • The importance of a node or an area in the network.
    • The distance among nodes or areas in the network.
    • Cohesion degree of an area in the network.
  • Central individuals in a social network generally correspond to the influentials who provide value to the entire network.

Graph Eccentricity

  • Vertex-centric measure.
  • Computes the maximum distance of a specific vertex with respect to the other vertices within the graph.

Graph Diameter

  • Graph-centric measure.
  • Focuses on the entire graph and considers the farthest possible distance between any two vertices.
  • Represents the maximum eccentricity among all vertices in the graph.

Number of Paths

  • The number of paths of a given length n can be computed using the adjacency matrix.
  • An entry equal to 1 in A_{ij} represents a path of length 1 between i and j.
  • To compute the number of paths of length 2, the matrix of length 1 must be multiplied by itself, and the product matrix is the matrix representation of paths of length 2.
  • In general, to generate the matrix of paths of length n, take the matrix of paths of length n-1, and multiply it with the matrix of paths of length 1 (adjacency matrix).

Eulerian and Hamiltonian Paths

  • Eulerian path: A path that traverses each edge in a graph only once.
  • Hamiltonian path: A path that traverses each node in a graph exactly once.

Degree Centrality

  • The simplest centrality measure in a graph.
  • Applicable to directed and undirected graphs.
    • Indegree: Number of links entering node i.
    • Outdegree: Number of links leaving node i.
    • Degree: Number of links of node i.
  • Indicates the ability of a node to engage in a direct relationship with the other nodes.
  • Local measure computed from immediate connections.

Degree Centrality: Undirected Graph

  • Given the adjacency matrix of an undirected graph, the node degree is a sum over either the rows or the columns of the matrix.
  • k_i is the degree of node i.
  • The total number of links can be expressed as
  • Average degree

Degree Centrality: Directed Graph

  • ki = k{in} + k_{out}
  • Total number of links
    L = \sum{i=1}^{N} k{in} = \sum{i=1}^{N} k{out}
  • Average degree

Density of a Graph

  • The maximum number of possible edges in a simple undirected graph is \frac{1}{2} N (N-1).
  • The density \rho of a graph is the fraction of these edges actually present in the network.
  • \rho = \frac{L}{\frac{1}{2} N (N-1)} = \frac{2L}{N (N-1)} = \frac{}{N-1}
  • 0 <= \rho <= 1
  • If \rho \rightarrow constant when N \rightarrow \infty, the graph is dense.
  • If \rho \rightarrow 0 when N \rightarrow \infty, the graph is sparse.

Degree Distribution

  • p_k provides the probability that a randomly selected node in the graph has degree k.
  • For a graph with N nodes, the degree distribution is the normalized histogram, where N_k is the number of degree-k nodes.