Comprehensive Study Notes: Stochastic Processes and the Poisson Process

Chapter 4: The Poisson Process

The Poisson process is considered one of the most fundamental models in probability theory and stochastic processes. It provides a mathematical description for the occurrence of random events over time. Common examples include:

The arrival of customers at a service desk.
The emission of particles from a radioactive source.
Phone calls incoming to a call centre.

Two central ingredients underlie the Poisson process:

The number of arrivals in a given time interval follows a Poisson distribution.
The waiting time between successive arrivals is exponentially distributed.

4.1 The Exponential Distribution

The exponential distribution is the canonical model for the waiting time until the first occurrence of a random event that happens at a constant average rate.

Definition 4.1.1

A random variable $X : \Omega \rightarrow \mathbb{R}$ is said to follow an exponential distribution with parameter $\lambda > 0$ if:

$P(X \in [a, b]) = \int_a^b \lambda e^{-\lambda\beta} \, d\beta$

for all $0 \leq a \leq b$ . Equivalently, $X$ has the probability density function:

$f_X(x) = \lambda e^{-\lambda x}$

for all $x \geq 0$ . The notation used is $X \sim \exp(\lambda)$ .

Lemma 4.1.2: Properties of the Exponential Distribution

Let $X$ be exponentially distributed with parameter $\lambda > 0$ . Then:

(a) Cumulative Distribution Function (CDF): $F_X(\beta) = \begin{cases} 1 - e^{-\lambda\beta} & \text{if } \beta \geq 0 \\ 0 & \text{else} \end{cases}$
(b) Expectation and Variance: $E[X] = \frac{1}{\lambda}$ and $\text{Var}(X) = \frac{1}{\lambda^2}$ .
(c) Moment Generating Function (MGF): $G_X(\beta) := E[e^{\beta X}] = \frac{\lambda}{\lambda - \beta}$ for all $\beta < \lambda$ .

Proof of Lemma 4.1.2:

(a) For $\beta \geq 0$ : $F_X(\beta) = \int_0^\beta f_X(\alpha) \, d\alpha = \int_0^\beta \lambda e^{-\lambda\alpha} \, d\alpha = 1 - e^{-\lambda\beta}$ .
(b) Integrating by parts with $u = \beta$ , $dv = \lambda e^{-\lambda\beta} d\beta$ : $E[X] = \int_0^\infty \beta \lambda e^{-\lambda\beta} \, d\beta = [-\beta e^{-\lambda\beta}]_0^\infty + \int_0^\infty e^{-\lambda\beta} \, d\beta = \frac{1}{\lambda}$ . Similarly, $E[X^2] = \int_0^\infty \beta^2 \lambda e^{-\lambda\beta} \, d\beta = \frac{2}{\lambda^2}$ . Thus, $\text{Var}(X) = \frac{2}{\lambda^2} - (\frac{1}{\lambda})^2 = \frac{1}{\lambda^2}$ .
(c) For $\beta < \lambda$ : $G_X(\beta) = \int_0^\infty e^{\beta \alpha} \lambda e^{-\lambda \alpha} \, d\alpha = \lambda \int_0^\infty e^{-(\lambda-\beta)\alpha} \, d\alpha = \frac{\lambda}{\lambda - \beta}$ .
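
The formulas in Lemma 4.1.2 can be sanity-checked by simulation. The following is a minimal sketch, not part of the original notes: the values `lam = 2.0` and `beta = 0.5` are arbitrary illustrative choices (with `beta < lam`), and the Monte Carlo estimates only approximate the exact values.

```python
import math
import random

def sample_moments(lam, beta, n_samples=200_000, seed=0):
    """Monte Carlo estimates of E[X], Var(X) and E[exp(beta X)] for X ~ exp(lam)."""
    rng = random.Random(seed)
    samples = [rng.expovariate(lam) for _ in range(n_samples)]
    mean = sum(samples) / n_samples
    var = sum((x - mean) ** 2 for x in samples) / (n_samples - 1)
    mgf = sum(math.exp(beta * x) for x in samples) / n_samples
    return mean, var, mgf

lam, beta = 2.0, 0.5  # hypothetical values, chosen so that beta < lam
mean, var, mgf = sample_moments(lam, beta)
print(f"mean: {mean:.4f}  (formula 1/lam = {1 / lam:.4f})")
print(f"var:  {var:.4f}  (formula 1/lam^2 = {1 / lam ** 2:.4f})")
print(f"mgf:  {mgf:.4f}  (formula lam/(lam-beta) = {lam / (lam - beta):.4f})")
```

The estimates should agree with the closed forms up to sampling error, which shrinks like one over the square root of the sample size.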

Theorem 4.1.3: Lack of Memory Property

An exponential random variable has a unique feature known as the lack-of-memory (or memoryless) property. Informally: "If we have already waited $t$ units of time, the probability of having to wait at least $s$ further units is the same as if no time had elapsed."

If $X \sim \exp(\lambda)$ , then for all $s, t \geq 0$ :

$P(X > t + s \mid X > t) = P(X > s)$

Proof: By Lemma 4.1.2, $P(X > t) = 1 - F_X(t) = e^{-\lambda t}$ . Thus:

$P(X > t + s \mid X > t) = \frac{P(X > t + s)}{P(X > t)} = \frac{e^{-\lambda(t+s)}}{e^{-\lambda t}} = e^{-\lambda s} = P(X > s)$ .

Among all continuous probability distributions with a density, the exponential is the only one possessing this property. In the discrete setting, the geometric distribution is the unique distribution with this property.

Examples of the Exponential Distribution

Example 4.1.4 (Bank Session): Let $X$ denote the time (in minutes) a customer spends in a bank, assumed exponential with mean 10 minutes (implying $\lambda = \frac{1}{10}$ ).
- Probability of spending more than 15 minutes: $P(X > 15) = e^{-15\lambda} = e^{-3/2} \approx 0.223$ .
- Probability of spending more than 15 minutes given that the customer is still there after 10 minutes: $P(X > 15 \mid X > 10) = P(X > 5) = e^{-5\lambda} = e^{-1/2} \approx 0.607$ .
Example 4.1.5 (Lightbulb Lifetime): The lifetime $X$ (in hours) is exponential with mean 10 ( $X \sim \exp(1/10)$ ). After $t$ hours you find the bulb still working. The probability that it lasts another 5 hours is $P(X > t + 5 \mid X > t) = P(X > 5) = e^{-5/10} = e^{-1/2}$ , whatever the value of $t$ . If $X$ were not exponential, the probability would in general depend on $t$ : $\frac{1 - F_X(t + 5)}{1 - F_X(t)}$ .
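
The probabilities in Examples 4.1.4 and 4.1.5 can be evaluated directly from the survival function $P(X > x) = e^{-\lambda x}$ . A minimal sketch, assuming the helper names `survival` and `conditional_survival` (they are illustrative, not from the notes):

```python
import math

def survival(lam, x):
    """P(X > x) for X ~ exp(lam) and x >= 0."""
    return math.exp(-lam * x)

def conditional_survival(lam, t, s):
    """P(X > t + s | X > t), straight from the definition of conditional probability."""
    return survival(lam, t + s) / survival(lam, t)

lam = 1 / 10  # mean 10 minutes (bank) or 10 hours (lightbulb)

p_more_than_15 = survival(lam, 15)
p_15_given_10 = conditional_survival(lam, 10, 5)
print(f"P(X > 15)          = {p_more_than_15:.4f}")
print(f"P(X > 15 | X > 10) = {p_15_given_10:.4f}")

# Memorylessness: the conditional probability does not depend on the elapsed time t.
for t in (0, 1, 10, 100):
    print(t, round(conditional_survival(lam, t, 5), 6))
```

Every row of the final loop shows the same value, which is the lack-of-memory property in numerical form.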

Mathematical Properties of Exponential Variables

Proposition 4.1.6 (Sum of Exponentials): Let $X_1, \dots, X_n$ be independent $\exp(\lambda)$ random variables. Then $S_n = X_1 + \dots + X_n$ follows a Gamma distribution with shape parameter $n$ and rate parameter $\lambda$ . The density is: $f_{S_n}(\beta) = \frac{\lambda^n}{(n-1)!} \beta^{n-1} e^{-\lambda \beta}$
Proposition 4.1.7 (Minimum of Exponentials): Let $X_1, \dots, X_n$ be independent $\exp(\lambda_k)$ variables. Then $M = \min(X_1, \dots, X_n)$ is exponentially distributed with parameter $\sum_{k=1}^n \lambda_k$ . Interpretation: If $n$ independent exponential clocks start with rates $\lambda_k$ , the first clock to ring occurs after an exponential time with the sum of the rates.
Proposition 4.1.8 (Competition of Exponentials): Let $X \sim \exp(\lambda)$ and $Y \sim \exp(\mu)$ be independent. Then: $P(X < Y) = \frac{\lambda}{\lambda + \mu}$ . Proof: Using the joint density $f_{X,Y}(\alpha, \beta) = \lambda e^{-\lambda \alpha} \mu e^{-\mu \beta}$ , we compute: $P(X < Y) = \int_0^\infty \int_\alpha^\infty \lambda e^{-\lambda \alpha} \mu e^{-\mu \beta} \, d\beta \, d\alpha = \int_0^\infty \lambda e^{-\lambda \alpha} e^{-\mu \alpha} \, d\alpha = \frac{\lambda}{\lambda + \mu}$ .
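
Propositions 4.1.7 and 4.1.8 can be illustrated with a small simulation of competing exponential clocks. This is a sketch under stated assumptions: the rates `[1.0, 2.0]` are hypothetical, and the function name is illustrative only.

```python
import random

def simulate_min_and_race(rates, n_trials=100_000, seed=1):
    """Estimate E[min(X_1..X_n)] and the probability that each clock rings first."""
    rng = random.Random(seed)
    total_min = 0.0
    wins = [0] * len(rates)
    for _ in range(n_trials):
        draws = [rng.expovariate(r) for r in rates]
        smallest = min(draws)
        total_min += smallest
        wins[draws.index(smallest)] += 1
    return total_min / n_trials, [w / n_trials for w in wins]

rates = [1.0, 2.0]  # hypothetical rates lambda_1, lambda_2
mean_min, win_freq = simulate_min_and_race(rates)
print(f"E[min] estimate {mean_min:.4f}, formula {1 / sum(rates):.4f}")
print(f"P(clock 1 first) estimate {win_freq[0]:.4f}, formula {rates[0] / sum(rates):.4f}")
```

With these rates, the minimum should have mean close to $1/3$ and the first clock should win about one third of the time, matching $\frac{1}{\lambda_1 + \lambda_2}$ and $\frac{\lambda_1}{\lambda_1 + \lambda_2}$ .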

Application Examples

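Example 4.1.9 (Post Office): Two clerks are busy, with service times $\exp(\lambda_1)$ and $\exp(\lambda_2)$ . You are served as soon as one becomes available. Your total time is $T = \min(R_1, R_2) + S$ , where $R_i$ is the remaining service time at clerk $i$ and $S$ is your own service time. By memorylessness, $R_i \sim \exp(\lambda_i)$ , so $E[\min(R_1, R_2)] = \frac{1}{\lambda_1 + \lambda_2}$ by Proposition 4.1.7. Clerk $i$ becomes free first with probability $\frac{\lambda_i}{\lambda_1 + \lambda_2}$ (Proposition 4.1.8), and your service there takes $\frac{1}{\lambda_i}$ on average, so $E[S] = \frac{\lambda_1}{\lambda_1 + \lambda_2} \cdot \frac{1}{\lambda_1} + \frac{\lambda_2}{\lambda_1 + \lambda_2} \cdot \frac{1}{\lambda_2} = \frac{2}{\lambda_1 + \lambda_2}$ . Expected time: $E[T] = E[\min(R_1, R_2)] + E[S] = \frac{1}{\lambda_1 + \lambda_2} + \frac{2}{\lambda_1 + \lambda_2} = \frac{3}{\lambda_1 + \lambda_2}$ .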
Example 4.1.10 (Competing Tasks): Check-up (mean 20 mins, $\lambda_1 = 1/20$ ) and Consultation (mean 30 mins, $\lambda_2 = 1/30$ ) start at the same time.
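- Probability check-up finishes first: $P(R_1 < R_2) = \frac{1/20}{1/20 + 1/30} = \frac{3}{5}$ .
- Expected time until both finish ( $T = \max(R_1, R_2)$ ): once the first task finishes, memorylessness means the other still needs its full expected time, so $E[T] = E[\min(R_1, R_2)] + P(R_1 < R_2)E[R_2] + P(R_2 < R_1)E[R_1]$ . With $E[\min(R_1, R_2)] = \frac{1}{1/20 + 1/30} = 12$ , this gives $E[T] = 12 + \frac{3}{5}(30) + \frac{2}{5}(20) = 38 \text{ minutes}$ .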
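
The arithmetic in Example 4.1.10 can be reproduced exactly with rational numbers. A minimal sketch (the function name is illustrative); it also cross-checks the result against the identity $E[\max(R_1, R_2)] = E[R_1] + E[R_2] - E[\min(R_1, R_2)]$ , which follows from $\max + \min = R_1 + R_2$ .

```python
from fractions import Fraction

def expected_max_two_exponentials(rate1, rate2):
    """E[max(R1, R2)] for independent exponentials, via the decomposition in the example."""
    total = rate1 + rate2
    e_min = 1 / total            # E[min(R1, R2)]
    p1_first = rate1 / total     # P(R1 < R2)
    p2_first = rate2 / total     # P(R2 < R1)
    # After the first task ends, the other one restarts afresh by memorylessness.
    return e_min + p1_first * (1 / rate2) + p2_first * (1 / rate1)

rate_checkup = Fraction(1, 20)   # mean 20 minutes
rate_consult = Fraction(1, 30)   # mean 30 minutes

e_max = expected_max_two_exponentials(rate_checkup, rate_consult)
cross_check = 1 / rate_checkup + 1 / rate_consult - 1 / (rate_checkup + rate_consult)
print(e_max, cross_check)
```

Both expressions evaluate to the same rational number, so the two routes to $E[T]$ agree.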

4.2 The Poisson Distribution

The Poisson distribution models the number of events in a fixed interval of time or space occurring independently at a constant average rate. It is often described as the "law of rare events."

Definition 4.2.1

A random variable $X$ is Poisson distributed, written $X \sim \text{Poi}(\lambda)$ , if:

$P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}$

for all $k \in \mathbb{Z}_{\geq 0}$ . The parameter $\lambda$ represents the expected number of occurrences.

Proposition 4.2.2: Moments of Poisson

For $X \sim \text{Poi}(\lambda)$ with $\lambda > 0$ :

(a) Factorial Moments: $E[X(X - 1) \dots (X - k + 1)] = \lambda^k$ . Specifically, $E[X] = \lambda$ .
(b) Variance: $\text{Var}(X) = \lambda$ .

Proof of (b): Since $E[X(X-1)] = \lambda^2$ , we have: $\text{Var}(X) = E[X(X-1)] + E[X] - (E[X])^2 = \lambda^2 + \lambda - \lambda^2 = \lambda$ .
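
Proposition 4.2.2 can be checked numerically by summing against the probability mass function. A minimal sketch, assuming a hypothetical value `lam = 3.5` and truncating the infinite sums at `k_max = 100` , where the remaining tail is negligible:

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poi(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def poisson_moments(lam, k_max=100):
    """Truncated sums for E[X], E[X(X-1)] and the variance derived from them."""
    ks = range(k_max + 1)
    mean = sum(k * poisson_pmf(k, lam) for k in ks)
    second_factorial = sum(k * (k - 1) * poisson_pmf(k, lam) for k in ks)
    variance = second_factorial + mean - mean ** 2
    return mean, second_factorial, variance

lam = 3.5  # hypothetical parameter
mean, second_factorial, variance = poisson_moments(lam)
print(f"E[X]      = {mean:.6f}  (lam   = {lam})")
print(f"E[X(X-1)] = {second_factorial:.6f}  (lam^2 = {lam ** 2})")
print(f"Var(X)    = {variance:.6f}  (lam   = {lam})")
```

The variance line uses exactly the identity from the proof of (b).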

Proposition 4.2.3: Sum of Independent Poisson

If $X \sim \text{Poi}(\lambda)$ and $Y \sim \text{Poi}(\rho)$ are independent, then $X + Y \sim \text{Poi}(\lambda + \rho)$ .
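
Proposition 4.2.3 can be verified numerically by convolving the two probability mass functions and comparing with $\text{Poi}(\lambda + \rho)$ . A sketch with hypothetical parameters `lam = 1.5` and `rho = 2.5` :

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poi(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def convolved_pmf(n, lam, rho):
    """P(X + Y = n) for independent X ~ Poi(lam), Y ~ Poi(rho), by direct convolution."""
    return sum(poisson_pmf(j, lam) * poisson_pmf(n - j, rho) for j in range(n + 1))

lam, rho = 1.5, 2.5  # hypothetical parameters
max_gap = max(
    abs(convolved_pmf(n, lam, rho) - poisson_pmf(n, lam + rho)) for n in range(30)
)
print(f"largest discrepancy over n = 0..29: {max_gap:.2e}")
```

The discrepancy is at the level of floating-point rounding, consistent with the binomial-theorem argument that turns the convolution sum into $\frac{e^{-(\lambda+\rho)} (\lambda+\rho)^n}{n!}$ .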

Proposition 4.2.4: Poisson as a Limit of Binomial

Let $X_n$ be binomially distributed with parameters $n$ and $p_n = \lambda/n$ . Then for every fixed $k \in \mathbb{Z}_{\geq 0}$ :

$\lim_{n \rightarrow \infty} P(X_n = k) = \frac{e^{-\lambda} \lambda^k}{k!}$

This reveals that the Poisson distribution is appropriate for modeling many independent trials with small success probabilities ( $p$ ) but a fixed expected number of successes ( $np = \lambda$ ).

Example 4.2.5 (Approximation): Take $n = 1000$ and $p_n = 0.002$ , so $\lambda = 2$ . Comparing $X_n \sim \text{Bin}(1000, 0.002)$ with $Y \sim \text{Poi}(2)$ , the probabilities $P(X_n = k)$ and $P(Y = k)$ are nearly identical: the absolute differences are of order $10^{-4}$ or smaller.
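
The comparison in Example 4.2.5 can be reproduced with a few lines of code. A minimal sketch; the range `k = 0..10` is an arbitrary choice that covers almost all of the probability mass.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(Y = k) for Y ~ Poi(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

n, p = 1000, 0.002
lam = n * p  # expected number of successes, equal to 2

gaps = []
for k in range(11):
    b, q = binomial_pmf(k, n, p), poisson_pmf(k, lam)
    gaps.append(abs(b - q))
    print(f"k={k:2d}  Bin={b:.6f}  Poi={q:.6f}  |diff|={abs(b - q):.2e}")

max_gap = max(gaps)
```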

4.3 Poisson Process

A counting process $(N(t) : t \geq 0)$ models the arrival of events over time.

Definition 4.3.2: Counting Process

A stochastic process is a counting process if:

(a) $N(0) = 0$
(b) $N(t) \geq 0$ and $N(t)$ is integer-valued for all $t \geq 0$
(c) $N(s) \leq N(t)$ whenever $s \leq t$

Increment Properties

Independent Increments: The numbers of events in disjoint time intervals are independent random variables (e.g., the goals scored in the first half of a match and those scored in the second half).
Stationary Increments: The distribution of the number of events in an interval depends only on the length of the interval, not on its position (e.g., the number of arrivals between 12 and 1 PM has the same distribution as the number between 3 and 4 PM).

Definition 4.3.4: Poisson Process

A counting process $N$ is a Poisson process with intensity $\lambda > 0$ if:

(a) $N(0) = 0$
(b) $N$ has independent and stationary increments.
(c) For every $t \geq 0$ , $N(t) \sim \text{Poi}(\lambda t)$ , specifically: $P(N(t) = k) = \frac{e^{-\lambda t} (\lambda t)^k}{k!}$ .

The parameter $\lambda$ represents the expected number of arrivals per unit time ( $E[N(t)] = \lambda t$ ).

Arrival and Waiting Times

Definition 4.3.5:
- $T_k = \inf\{t \geq 0 : N(t) = k\}$ is the $k$ -th arrival time.
- $\tau_k = T_k - T_{k-1}$ is the $k$ -th waiting time ( $T_0 := 0$ ).
Theorem 4.3.6:
- (a) Waiting times $\tau_1, \tau_2, \dots$ are i.i.d. $\exp(\lambda)$ .
- (b) Arrival time $T_k$ follows a Gamma distribution with parameters $(k, \lambda)$ , with density: $f_{T_k}(\beta) = \frac{\lambda^k}{(k-1)!} \beta^{k-1} e^{-\lambda \beta}$ .
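
Theorem 4.3.6(a) suggests a direct way to simulate a Poisson process: add up i.i.d. $\exp(\lambda)$ waiting times to obtain the arrival times. The sketch below does this and checks that the simulated $N(t)$ has mean and variance close to $\lambda t$ , as expected for $\text{Poi}(\lambda t)$ . The intensity `3.0` and horizon `2.0` are hypothetical, and the function names are illustrative.

```python
import random

def arrival_times(lam, horizon, rng):
    """Arrival times T_1 < T_2 < ... in [0, horizon], built from i.i.d. exp(lam) waiting times."""
    times = []
    t = rng.expovariate(lam)          # tau_1
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(lam)     # add the next waiting time tau_k
    return times

def simulate_counts(lam, horizon, n_paths=20_000, seed=2):
    """Sample N(horizon) on independent paths."""
    rng = random.Random(seed)
    return [len(arrival_times(lam, horizon, rng)) for _ in range(n_paths)]

lam, horizon = 3.0, 2.0  # hypothetical intensity and time horizon
counts = simulate_counts(lam, horizon)
mean_count = sum(counts) / len(counts)
var_count = sum((c - mean_count) ** 2 for c in counts) / (len(counts) - 1)
print(f"mean of N(t): {mean_count:.3f}, variance: {var_count:.3f}, lam * t = {lam * horizon}")
```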

Theorem 4.3.9: Equivalent Characterizations

A counting process $N$ with $N(0) = 0$ is a Poisson process with intensity $\lambda$ if and only if either of the following equivalent conditions holds:

It has independent and stationary increments, and as $h \rightarrow 0$ , $P(N(h) = 1) = \lambda h + o(h)$ and $P(N(h) \geq 2) = o(h)$ .
Its waiting times $\tau_1, \tau_2, \dots$ are i.i.d. $\exp(\lambda)$ .

Note on Little-o Notation (Definition 4.3.7): $f = o(g)$ as $\beta \rightarrow 0$ if $\lim_{\beta \rightarrow 0} \frac{f(\beta)}{g(\beta)} = 0$ .

Example: $e^h = 1 + h + o(h)$ .
This allows defining the Poisson process by its infinitesimal behavior over very short intervals.
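
The infinitesimal conditions can be checked against the exact Poisson probabilities: if $N(h) \sim \text{Poi}(\lambda h)$ , then $(P(N(h) = 1) - \lambda h)/h$ and $P(N(h) \geq 2)/h$ should both tend to $0$ as $h \rightarrow 0$ . A minimal sketch with a hypothetical intensity `lam = 2.0` :

```python
import math

def prob_one_arrival(lam, h):
    """P(N(h) = 1) = lam * h * exp(-lam * h) for intensity lam."""
    return lam * h * math.exp(-lam * h)

def prob_two_or_more(lam, h):
    """P(N(h) >= 2) = 1 - P(N(h) = 0) - P(N(h) = 1)."""
    # -expm1(-x) equals 1 - exp(-x) but avoids cancellation for tiny x.
    return -math.expm1(-lam * h) - prob_one_arrival(lam, h)

lam = 2.0  # hypothetical intensity chosen for illustration
rows = []
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    remainder_one = (prob_one_arrival(lam, h) - lam * h) / h
    remainder_two = prob_two_or_more(lam, h) / h
    rows.append((h, remainder_one, remainder_two))
    print(f"h={h:g}  (P(N(h)=1) - lam*h)/h = {remainder_one:.6f}  P(N(h)>=2)/h = {remainder_two:.6f}")
```

Both columns shrink roughly in proportion to $h$ , which is what the $o(h)$ terms in Theorem 4.3.9 require.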

4.4 Compound Poisson Processes

A compound Poisson process attaches a random quantity (jump size) to each arrival of a Poisson process.

Definition 4.4.2

Let $N(t)$ be a Poisson process with intensity $\lambda$ and $(Y_k)_{k=1}^\infty$ be an i.i.d. sequence of random variables independent of $N$ . The compound Poisson process $S(t)$ is:

$S(t) = \sum_{k=1}^{N(t)} Y_k$

(If $N(t)=0$ , then $S(t)=0$ ).

Theorem 4.4.4: Wald's Identity

If $E[Y_1^2] < \infty$ , then for all $t \geq 0$ :

$E[S(t)] = \lambda t E[Y_1]$
$\text{Var}(S(t)) = \lambda t E[Y_1^2]$

Example 4.4.5 (Marine Ecologist): Crabs emerge at rate $\lambda = 3$ per hour. Weights $Y_k$ have mean 4 lbs and standard deviation 2 lbs. Over 2 hours:

$E[S(2)] = (3)(2)(4) = 24 \text{ lbs}$ .
$\text{Var}(S(2)) = (3)(2) E[Y_1^2] = (6)( \text{Var}(Y_1) + E[Y_1]^2 ) = 6(2^2 + 4^2) = 6(20) = 120 \text{ lbs}^2$ .
Standard deviation: $\sqrt{120} \approx 10.95 \text{ lbs}$ .
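
The crab example can be simulated to see Theorem 4.4.4 at work. The notes only specify the mean and standard deviation of the weights, so the sketch below assumes, purely for illustration, Gamma-distributed weights with shape 4 and scale 1 (mean 4, standard deviation 2). Any other jump distribution with the same first two moments should give the same mean and variance of $S(2)$ .

```python
import random

def draw_weight(rng):
    """Assumed weight distribution: Gamma(shape=4, scale=1), mean 4 and std 2."""
    return rng.gammavariate(4.0, 1.0)

def simulate_compound_poisson(lam, t, draw_jump, rng):
    """One sample of S(t): add an i.i.d. jump for every arrival in [0, t]."""
    total = 0.0
    clock = rng.expovariate(lam)      # first arrival time
    while clock <= t:
        total += draw_jump(rng)
        clock += rng.expovariate(lam)
    return total

rng = random.Random(3)
n_paths = 50_000
samples = [simulate_compound_poisson(3.0, 2.0, draw_weight, rng) for _ in range(n_paths)]
mean_s = sum(samples) / n_paths
var_s = sum((s - mean_s) ** 2 for s in samples) / (n_paths - 1)
print(f"mean of S(2): {mean_s:.2f}  (formula: 24)")
print(f"variance of S(2): {var_s:.2f}  (formula: 120)")
```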

4.5 Combining and Thinning

Theorem 4.5.1: Thinning (Splitting)

If each arrival of a Poisson process with rate $\lambda$ is independently classified as type $j \in \{1, \dots, M\}$ with probability $p_j$ (where $p_1 + \dots + p_M = 1$ ), then the processes $N_j(t)$ counting type- $j$ events are independent Poisson processes with intensities $p_j \lambda$ .

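Example 4.5.2 (Fisherman): Fish bite at rate 2 per hour; 40% are salmon and 60% are trout. The salmon process has rate $(0.4)(2) = 0.8$ per hour and the trout process has rate $(0.6)(2) = 1.2$ per hour. Over 2.5 hours the two counts are therefore independent with $N_s(2.5) \sim \text{Poi}(2)$ and $N_t(2.5) \sim \text{Poi}(3)$ . Probability of exactly 1 salmon and 2 trout in 2.5 hours: $P(N_s(2.5)=1) \cdot P(N_t(2.5)=2) = \frac{e^{-2} 2^1}{1!} \cdot \frac{e^{-3} 3^2}{2!} = 9 e^{-5} \approx 0.061$ .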
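
The fisherman calculation can be reproduced directly from the thinned rates. A minimal sketch (variable names are illustrative):

```python
import math

def poisson_pmf(k, mean):
    """P(N = k) for N ~ Poi(mean)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

rate, hours = 2.0, 2.5           # bites per hour, fishing time
p_salmon, p_trout = 0.4, 0.6     # classification probabilities

mean_salmon = p_salmon * rate * hours   # Poisson mean of the salmon count
mean_trout = p_trout * rate * hours     # Poisson mean of the trout count

# Independence of the thinned processes lets the probabilities multiply.
prob = poisson_pmf(1, mean_salmon) * poisson_pmf(2, mean_trout)
print(f"P(1 salmon and 2 trout) = {prob:.6f}")
print(f"9 * exp(-5)             = {9 * math.exp(-5):.6f}")
```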

Theorem 4.5.3: Superposition

The sum of $m$ independent Poisson processes with intensities $\lambda_1, \dots, \lambda_m$ is a Poisson process with intensity $\lambda = \sum_{k=1}^m \lambda_k$ .

Example 4.5.4 (Bus Stop): Buses on routes A and B arrive according to independent Poisson processes with intensities $\lambda_A$ and $\lambda_B$ .

Overall arrival rate: $\lambda_A + \lambda_B$ .
Expected wait for any bus: $\frac{1}{\lambda_A + \lambda_B}$ .
Probability next bus is A: $\frac{\lambda_A}{\lambda_A + \lambda_B}$ .
Expected A buses before the first B bus: Let $p = \lambda_A/(\lambda_A + \lambda_B)$ . The number of A labels before the first B is geometric with parameter $(1-p)$ , yielding mean $\frac{p}{1-p} = \frac{\lambda_A}{\lambda_B}$ .
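
The last claim of the bus stop example can be checked by simulating the two routes directly. The sketch below assumes hypothetical rates of 3 route-A buses and 2 route-B buses per hour; it draws the first B arrival, counts the A arrivals that precede it, and compares the average count with $\lambda_A / \lambda_B$ .

```python
import random

def a_buses_before_first_b(rate_a, rate_b, rng):
    """Count route-A arrivals that occur strictly before the first route-B arrival."""
    first_b = rng.expovariate(rate_b)
    count = 0
    t = rng.expovariate(rate_a)       # first A arrival
    while t < first_b:
        count += 1
        t += rng.expovariate(rate_a)  # next A arrival
    return count

rate_a, rate_b = 3.0, 2.0  # hypothetical intensities (buses per hour)
rng = random.Random(4)
n_trials = 100_000
counts = [a_buses_before_first_b(rate_a, rate_b, rng) for _ in range(n_trials)]
mean_count = sum(counts) / n_trials
share_zero = counts.count(0) / n_trials   # the next bus is B with probability 1 - p
print(f"mean A buses before first B: {mean_count:.3f}  (formula: {rate_a / rate_b:.3f})")
print(f"P(next bus is B): {share_zero:.3f}  (formula: {rate_b / (rate_a + rate_b):.3f})")
```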