Graph Theory and Network Analysis Notes

Basics on Graphs

Introduction to Graphs

  • Graphs model networks of various types, offering an abstract view of the domain being analyzed.
  • This allows for seeing the "overall picture" without getting bogged down in irrelevant details.
  • Graphs are also called networks in mathematical literature.

Components of a Graph

  • Vertices (Nodes): The fundamental units of a graph. In computer science, these are often called nodes or links. In physics may be called sites or bonds. In sociology may be called actors or ties.
  • Edges: Connections between vertices.

Types of Graphs

  • Directed Graphs: Edges have a direction, indicating a one-way relationship.
  • Undirected Graphs: Edges have no direction, indicating a two-way relationship.
  • Weighted Graphs: Edges have a strength or weight, usually represented by a real number.
  • One-mode Graphs: Edges form simple on/off connections between vertices of the same type.
    *Note that it is also possible to represent edges as having a strength or weight (weighted graph).

Examples of Networks Modeled by Graphs

  • Biological Networks:
    • Metabolic networks: metabolites and metabolic reactions.
    • Protein-interaction networks: proteins and bonding.
    • Gene regulatory networks: genes and regulatory effect.
    • Neuronal networks: neuron and synapse
    • Food webs: Specie and predation or resource transfer
  • Social Networks:
    • Friendship networks: person and friendship.
    • Sexual networks: person and intercourse.
  • Transportation Networks
    • Rail system: rail station and railroad tracks.
    • Road network: intersection and pavement.
    • Airport network: airport and non-stop flight
  • Informational Networks:
    • Internet: computer and IP network adjacency.
    • World Wide Web: web page and hyperlink.
  • Telecommunications Networks
    • Internet(2): autonomous system (ISP) and BGP connection.

Graph Representation

Adjacency Matrix
  • A (v1) Rows and columns represent the same set of nodes in one-mode networks.
  • Elements of the matrix are the weights in the case of weighted graphs.
  • A matrix is less intuitive than a picture.
  • It's difficult to see emergent social structure or the overall topology, but it can be used to measure them.
  • Expensive way to store data (N2N^2 elements for NN nodes, since also the lack of a link is represented with a 0 value).
  • A<em>ij=1A<em>{ij} = 1 if there is an edge from jj to ii. Otherwise, A</em>ij=0A</em>{ij} = 0.
Edge List
  • Stores the edges: for each value > 0 in the adj matrix there is a row in the edge list.
  • Can be cumbersome for some mathematical reasoning.
Adjacency List
  • Similar to the edge list, but each node is listed as a row.
XML Files
  • Can be adopted to represent networks.
  • Graph ML (http://graphml.graphdrawing.org/)

Bipartite (Two-Mode) Graphs

  • Two kinds of vertices, no within-type edges.
  • Incidence (or affiliation) matrix (rectangular).
  • Often, the smaller set of nodes is arranged as columns, and the larger set as rows.
  • Examples:
    • authors & papers
    • actors & movies/scenes
    • musicians & albums
    • genes & substrings
    • people & online groups
    • words & documents
    • people & corporate boards
    • plants & pollinators
  • Two-mode graph may give the most complete representation of a given network.
  • One-mode projections showing connections between vertices of just one type are often convenient to work with.
  • Each group in the two-mode graph results in a cluster of vertices in the one-mode projection that are all connected to each other (a “clique” in network jargon).

Planar Graphs

  • Can be drawn so that no edges cross each other.
  • All trees are planar.
  • River networks are planar (rivers never cross; they only flow together).
  • Road networks are a good approximation of a planar network due to bridges enabling roads to meet without intersecting.

Paths in Graphs

  • A path is any sequence of vertices connected by edges.
  • The length of a path is the number of edges traversed along the path (number of “hops”).
  • Path lengths in a graph grow “slowly” with respect to the size NN of the network; logarithmic growth (logNlog N) is common.
  • A geodesic path (shortest path) is a path between two vertices such that no shorter paths exist.
  • Shortest paths are not necessarily unique; multiple paths of equal length may exist between a pair of vertices.
  • If two vertices are not connected, the shortest path can be considered infinite or omitted from the computation.

Graph Measures

  • Provide indicators to know:
    • The importance of a node or an area in the network.
    • The distance among nodes or areas in the network.
    • Cohesion degree of an area in the network.
  • Central individuals in a social network generally correspond to the influentials who provide value to the entire network.
Graph Eccentricity
  • Vertex-centric measure.
  • Computes the maximum distance of a specific vertex with respect to the other vertices within the graph.
Graph Diameter
  • Graph-centric measure.
  • Focuses on the entire graph and considers the farthest possible distance between any two vertices.
  • Represents the maximum eccentricity among all vertices in the graph.
Number of Paths
  • The number of paths of a given length nn can be computed using the adjacency matrix.
  • An entry equal to 1 in AijA_{ij} represents a path of length 1 between ii and jj.
  • To compute the number of paths of length 2, the matrix of length 1 must be multiplied by itself, and the product matrix is the matrix representation of paths of length 2.
  • In general, to generate the matrix of paths of length nn, take the matrix of paths of length n1n-1, and multiply it with the matrix of paths of length 1 (adjacency matrix).
Eulerian and Hamiltonian Paths
  • Eulerian path: A path that traverses each edge in a graph only once.
  • Hamiltonian path: A path that traverses each node in a graph exactly once.

Degree Centrality

  • The simplest centrality measure in a graph.
  • Applicable to directed and undirected graphs.
    • Indegree: Number of links entering node ii.
    • Outdegree: Number of links leaving node ii.
    • Degree: Number of links of node ii.
  • Indicates the ability of a node to engage in a direct relationship with the other nodes.
  • Local measure computed from immediate connections.
Degree Centrality: Undirected Graph
  • Given the adjacency matrix of an undirected graph, the node degree is a sum over either the rows or the columns of the matrix.
  • kik_i is the degree of node ii.
  • The total number of links can be expressed as
  • Average degree
Degree Centrality: Directed Graph
  • k<em>i=k</em>in+koutk<em>i = k</em>{in} + k_{out}
  • Total number of links
    L=<em>i=1Nk</em>in=<em>i=1Nk</em>outL = \sum<em>{i=1}^{N} k</em>{in} = \sum<em>{i=1}^{N} k</em>{out}
  • Average degree
    <k<em>in>=1N</em>i=1Nk<em>in=<k</em>out>=1N<em>i=1Nk</em>out=LN<k<em>{in}> = \frac{1}{N} \sum</em>{i=1}^{N} k<em>{in} = <k</em>{out}> = \frac{1}{N} \sum<em>{i=1}^{N} k</em>{out} = \frac{L}{N}

Density of a Graph

  • The maximum number of possible edges in a simple undirected graph is 12N(N1)\frac{1}{2} N (N-1).
  • The density ρ\rho of a graph is the fraction of these edges actually present in the network.
  • ρ=L12N(N1)=2LN(N1)=<k>N1\rho = \frac{L}{\frac{1}{2} N (N-1)} = \frac{2L}{N (N-1)} = \frac{<k>}{N-1}
  • 0 <= \rho <= 1
  • If ρconstant\rho \rightarrow constant when NN \rightarrow \infty, the graph is dense.
  • If ρ0\rho \rightarrow 0 when NN \rightarrow \infty, the graph is sparse.

Degree Distribution

  • pkp_k provides the probability that a randomly selected node in the graph has degree kk.
  • For a graph with NN nodes, the degree distribution is the normalized histogram, where NkN_k is the number of degree-k nodes.