Markov Chains: Applications, Weather Example, and Steady State

Technical setup and course context

  • The lecturer starts with a camera issue (one camera offline, one blurry) and fixes it by rebooting hardware; a light anecdote about technical disruptions.
  • The session introduces the week’s topic: applications of previous material, focusing on Markov chains and their real-world implications (e.g., Google’s PageRank).
  • Administrative notes: MATLAB quiz this week; no workshops; midterms are being graded; two weeks of teaching break ahead; office hours available; students encouraged to seek help if behind.
  • The instructor offers an encouraging message: you are smart, and midterm outcomes can be redeemed by continued effort.
  • A few remarks on the general tone of the course and the instructor’s openness to discussion.
  • Preview: tomorrow’s topic includes the PageRank algorithm; the session frames Markov chains as probabilistic models for systems with a finite set of states.

Markov chains: basic setup and definitions

  • A Markov chain is a probabilistic model used for systems that evolve in steps with probabilistic transitions between a finite set of states.
  • Core components:
    • A finite set of states, e.g., {S, C, R} for weather: Sunny (S), Cloudy (C), Rainy (R).
    • A transition mechanism described by a transition matrix T, encoding probabilities of moving from one state to another in one time step.
  • Markov (memoryless) property: the next state depends only on the current state, not on the history of past states.
  • Reading the transition matrix T:
    • Columns correspond to the from-state (the current state).
    • Rows correspond to the to-state (the next state).
    • Entry T_{ij} is the probability of transitioning to state i given that we are in state j right now.
    • Example (weather): if today is Rainy, the probabilities for tomorrow could be Sunny, Cloudy, Rainy as 1/4, 1/2, 1/4 respectively; this would place those values in the Rainy-column of T.
  • Column sums: each column sums to 1 because probabilities from a given current state must total 100%.
  • Row sums are not generally 1; a row gives how likely it is to end up in a particular to-state from all possible from-states, which need not sum to 1.
  • Die example to illustrate: a system with six states (the faces 1–6) and a transition matrix where every entry is 1/6 (every from-state can go to any to-state with probability 1/6). In this case, each column sums to 1, and all rows are identical.
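The die example above is easy to check numerically. Below is a minimal NumPy sketch (the lecture itself used no code): build the 6×6 matrix with every entry 1/6 and verify the two properties claimed, that each column sums to 1 and all rows are identical.

```python
import numpy as np

# Six states (die faces 1-6); every from-state moves to every
# to-state with probability 1/6, as in the die example.
T = np.full((6, 6), 1 / 6)

# Each column sums to 1: probabilities out of a given current
# state must total 100%.
column_sums = T.sum(axis=0)

# All rows are identical, since every to-state is equally likely
# regardless of the from-state.
rows_identical = np.allclose(T, T[0])
```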

Transition matrix, state evolution, and notation

  • State distribution at time k is a probability vector x_k with nonnegative entries summing to 1:
    • x_k(i) = P(current state at time k = i).
  • Evolution across one step:
    • x_{k+1} = T x_k
    • This is a standard matrix multiplication: the current distribution x_k is transformed by T to yield the next distribution x_{k+1}.
  • Initial state vector x_0:
    • Example: starting with Sunny only would be x_0 = [1, 0, 0]^T if S, C, R are ordered as 1, 2, 3.
    • If starting from Rainy, x_0 would be the unit vector corresponding to R, i.e. x_0 = [0, 0, 1]^T.
  • Long-run evolution:
    • After k steps: x_k = T^k x_0.
    • The analysis often focuses on the limit as k → ∞, i.e., the steady-state behavior of the chain.
  • Key long-term question: what will the distribution look like after many steps (e.g., after a week, a decade)?
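The evolution rules x_{k+1} = T x_k and x_k = T^k x_0 can be sketched as follows. Only the Rainy column [1/4, 1/2, 1/4] comes from the lecture's example; the Sunny and Cloudy columns below are made-up values chosen so each column is a valid probability vector.

```python
import numpy as np

# Hypothetical weather transition matrix, states ordered (S, C, R).
# Rainy column [1/4, 1/2, 1/4] is from the lecture; the other two
# columns are illustrative assumptions.
T = np.array([[0.50, 0.00, 0.25],   # to Sunny
              [0.20, 0.40, 0.50],   # to Cloudy
              [0.30, 0.60, 0.25]])  # to Rainy

x0 = np.array([1.0, 0.0, 0.0])      # start: Sunny with certainty

# One step: x_{k+1} = T x_k. Starting from Sunny, this simply
# reads off the Sunny column of T.
x1 = T @ x0

# k steps at once: x_k = T^k x_0 (a week later, k = 7).
x7 = np.linalg.matrix_power(T, 7) @ x0
```

Note that every x_k remains a probability vector: T maps distributions to distributions because its columns sum to 1.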

The weather model in detail (example narrative)

  • The model uses three states: Sunny (S), Cloudy (C), Rainy (R).
  • Transition matrix T is a 3×3 matrix: columns are from-states (S, C, R), rows are to-states (S, C, R).
  • An explicit example given: when it is Rainy today, tomorrow is Sunny with probability 1/4, Cloudy with probability 1/2, and Rainy with probability 1/4.
  • Implications:
    • The column for Rainy is [1/4, 1/2, 1/4]^T.
    • The first row (to-Sunny) sums to 1/2 in that example; this illustrates that row sums need not be 1.
  • In general, the evolution uses the rule: from current state j, the next-state distribution is given by the j-th column of T.
  • If the system is iterated, the distribution tends to a steady state q, i.e., a fixed distribution that does not change under T:
    • T q = q.
    • In this weather example, long-run distributions stabilize: e.g., the steady-state vector may be q = [0.2, 0.4, 0.4]^T (entries sum to 1).
  • The phenomena observed over time (x1, x2, …) often show convergence toward a stable distribution, with differences between successive distributions shrinking as k grows.
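The shrinking differences between successive distributions can be observed directly. The sketch below uses a hypothetical T consistent with the lecture's Rainy column and with the steady state q = [0.2, 0.4, 0.4] (the Sunny and Cloudy columns are assumptions), iterating from a Rainy start.

```python
import numpy as np

# Hypothetical T, states ordered (S, C, R); only the Rainy column
# and the steady state q are from the lecture.
T = np.array([[0.50, 0.00, 0.25],
              [0.20, 0.40, 0.50],
              [0.30, 0.60, 0.25]])
q = np.array([0.2, 0.4, 0.4])

x = np.array([0.0, 0.0, 1.0])        # start from Rainy
diffs = []
for _ in range(30):
    x_next = T @ x
    diffs.append(np.linalg.norm(x_next - x))  # ||x_{k+1} - x_k||
    x = x_next

# diffs decreases toward 0 and x settles at q: the chain
# "forgets" that it started in the Rainy state.
```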

Steady state, regularity, and long-term behavior

  • Regular stochastic matrix: a square matrix T where some power T^m has all entries positive (i.e., every state can be reached from every other state in some number of steps).
  • Consequences of regularity:
    • There exists a unique steady-state vector q with T q = q and 1^T q = 1 (where 1^T = [1, 1, 1]).
    • For any initial distribution x_0, the distribution after many steps converges to q:
    • lim_{k→∞} T^k x_0 = q, independent of x_0.
  • Interpretation: long-term behavior is determined by the transition probabilities rather than the starting state.
  • In the weather example, as k grows large, the distribution of weather types stabilizes at the steady-state q; the chain forgets its initial condition due to regularity.
  • Practical note: while such a simple Markov model is not a reliable short-term weather predictor, it can yield meaningful long-run averages and is used to motivate ideas like stationary distributions and drift toward equilibrium.
  • Short-term weather forecasting is typically more complicated and involves stiff numerical models; long-term Markov models abstract away fine dynamics to capture average behavior.
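Regularity itself is mechanical to check: raise T to successive powers and stop when some power has all entries strictly positive. A minimal sketch, again using the hypothetical weather T (only the Rainy column is from the lecture); note T itself contains a zero, but T² is already all-positive:

```python
import numpy as np

T = np.array([[0.50, 0.00, 0.25],   # hypothetical weather T
              [0.20, 0.40, 0.50],   # (assumed S and C columns)
              [0.30, 0.60, 0.25]])

def is_regular(T, max_power=50):
    """True if some power T^m (m <= max_power) has all entries > 0."""
    P = T.copy()
    for _ in range(max_power):
        if np.all(P > 0):
            return True
        P = P @ T
    return False
```

The identity matrix, by contrast, is stochastic but never regular: every power keeps its off-diagonal zeros, so the chain never mixes.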

Finding and interpreting the steady-state vector

  • Stationary (steady-state) vector q satisfies:
    • T q = q and Σ_i q_i = 1, i.e. q ∈ ℝ^n with q_i ≥ 0 and Σ_i q_i = 1.
  • Linear-algebraic approach to find q:
    • Solve the homogeneous system (T − I) q = 0 with the normalization constraint 1^T q = 1.
    • Equivalently, find q in the null space of T − I and then scale it to sum to 1.
  • A common computational path:
    • Compute a basis for Null(T − I), then pick the (unique) probability vector in that subspace by enforcing Σ_i q_i = 1.
  • Worked illustration (from the transcript):
    • If the given T is regular, the steady-state vector is unique and can be found by solving the above system.
    • In the example discussed, the steady state was found to be q = [0.2, 0.4, 0.4]^T.
    • This vector is an eigenvector of T associated with eigenvalue 1:
    • T q = q.
  • Relationship to eigenvectors and eigenvalues:
    • Steady states correspond to eigenvectors of T with eigenvalue 1 (right eigenvectors when x_{k+1} = T x_k).
    • The discussion foreshadows eigenvectors and eigenvalues as a key topic for understanding Markov chains and many other linear-algebraic systems.
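The solve-(T − I)q = 0-plus-normalization recipe can be sketched in NumPy: since the rows of T − I are linearly dependent (they sum to zero when T is stochastic), one redundant equation can be replaced by the normalization row 1^T q = 1, giving a square nonsingular system. The T below is the hypothetical one consistent with the lecture's Rainy column and q = [0.2, 0.4, 0.4].

```python
import numpy as np

T = np.array([[0.50, 0.00, 0.25],   # hypothetical weather T
              [0.20, 0.40, 0.50],   # (assumed S and C columns)
              [0.30, 0.60, 0.25]])
n = T.shape[0]

# Build (T - I) q = 0, then overwrite the last (redundant) equation
# with the normalization constraint [1, 1, ..., 1] q = 1.
A = T - np.eye(n)
A[-1, :] = 1.0
b = np.zeros(n)
b[-1] = 1.0

q = np.linalg.solve(A, b)   # the unique steady-state vector
```

As a cross-check, q should satisfy T q = q, i.e. it is an eigenvector of T with eigenvalue 1.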

Practical takeaways and connections

  • Long-term independence from initial state:
    • For a regular stochastic matrix, the limit distribution is the same regardless of where you start, provided the chain can reach every state (the well-mixed property).
  • What the model can and cannot tell you:
    • It gives long-term averages and stationary distribution rather than precise short-term forecasts.
    • The actual weather on a specific future day is not predicted exactly; rather, the probabilities of weather types stabilize.
  • Real-world relevance and extensions:
    • Markov chains underpin PageRank (to be discussed in the next lecture).
    • There are extensions to handle more complex real-world phenomena, including non-regular matrices, irreducible vs reducible chains, and aperiodicity concerns.
  • Terminology recap:
    • Probability vector: A vector with nonnegative entries that sum to 1.
    • Stochastic matrix: A square matrix whose columns are probability vectors (i.e., columns sum to 1).
    • Regular matrix: A stochastic matrix for which some power has all entries positive (well-mixed; irreducible in the standard sense).
    • Steady-state (equilibrium) vector: A probability vector q with T q = q; the long-run distribution of states.

Quick definitions and formulas to memorize

  • Probability vector: p = [p_1, p_2, …, p_n]^T with p_i ≥ 0 and Σ_{i=1}^n p_i = 1.
  • Transition matrix: T = [T_{ij}] with columns summing to 1:
    • Σ_{i=1}^n T_{ij} = 1 for all j.
  • State evolution: x_{k+1} = T x_k and x_k = T^k x_0.
  • Steady state: T q = q and 1^T q = 1, i.e. Σ_i q_i = 1.
  • Null-space formulation: solve (T − I) q = 0 subject to the normalization constraint.
  • Long-run convergence: for a regular stochastic matrix T,
    • lim_{k→∞} T^k x_0 = q for any initial distribution x_0.

Preview of next topics and closing notes

  • Next lecture: deeper models and specific algorithms, including the PageRank algorithm used by Google.
  • The current discussion sets a foundation for understanding how local transition rules aggregate into global, long-run behavior.
  • The instructor invites questions and emphasizes that grasping these concepts builds intuition for more advanced topics in linear algebra and applied probability.

Connections to prior material and broader context

  • Link to previous lectures: change of coordinates and matrix invertibility themes tie into how we analyze and compute steady states using linear algebra.
  • Foundational principles: linear transformations, eigenvectors/eigenvalues, and the role of probability distributions in discrete state spaces.
  • Real-world relevance: Markov chains model a wide range of phenomena beyond weather (e.g., search ranking, genetics, queueing systems, economics).

Ethical, philosophical, and practical implications

  • Ethically: when modeling real-world phenomena, it is important to acknowledge modeling assumptions (memorylessness, fixed transition probabilities) and the limitations they impose.
  • Philosophically: Markov models embody a principle that long-run behavior can be dictated by local transition rules, sometimes independent of starting conditions, which raises questions about determinism vs randomness.
  • Practically: the choice of whether a Markov model is appropriate depends on the system’s memory, the availability of transition data, and the desired forecasting horizon.

Summary of key takeaways

  • Markov chain = finite states + transition matrix with memoryless transitions.
  • Transition matrix interpretation: columns = from-states; rows = to-states; entries give one-step transition probabilities.
  • State evolution: x_{k+1} = T x_k; long-run behavior explores the limit of T^k x_0.
  • Regular stochastic matrices guarantee a unique steady-state distribution q and convergence lim_{k→∞} T^k x_0 = q, independent of the starting distribution.
  • The steady-state vector q satisfies T q = q with normalization Σ_i q_i = 1, and can be found by solving the linear system (T − I) q = 0 plus the normalization constraint.
  • In the weather example, the steady state was found to be q = [0.2, 0.4, 0.4]^T, illustrating a non-uniform long-run distribution.
  • The topic sets the stage for algorithms like PageRank and more advanced stochastic models.