Study Notes on Random Variables and Their Distributions

Chapter 4: Distributions of Random Variables

Author: Leo Zexian Wang

Random Variables

Definition: A random variable (r.v.) is a numeric quantity that takes different values with specified probabilities.
Types of Random Variables:
- Discrete Random Variable: Takes values from a countable set.
- The set can be finite (like the number of students in a classroom) or countably infinite (such as the number of trials until the first success).
- Continuous Random Variable: Takes values from a continuous range (e.g., any value in an interval).

Discrete Random Variables

Probability Mass Function (pmf): Denoted as $f(x) = P(X = x)$ , assigns a probability to each possible value $x$ in support $X$ .
- The sum of probabilities over the support must equal 1: $\sum_{x \in X} f(x) = 1$.
- Example:
- Let $X$ be the discrete r.v. denoting the number of heads in two successive tosses of a fair coin.
- The sample space: $\Omega = \{TT, TH, HT, HH\}$
  - $P(X=0)=P(\{TT\})=\frac14$
  - $P(X=1)=P(\{TH, HT\})=\frac12$
  - $P(X=2)=P(\{HH\})=\frac14$
- Support: $X=\{0, 1, 2\}$.
- Cumulative Distribution Function (cdf): Denoted as $F(x) = P(X \leq x)$, describes the probability that $X$ is less than or equal to $x$. $F(x)$ is non-decreasing and ranges from 0 to 1.
- Example cdf:
- $F(x) = \begin{cases} 0 & x < 0 \\ 0.25 & 0 \leq x < 1 \\ 0.75 & 1 \leq x < 2 \\ 1 & x \geq 2 \end{cases}$
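The pmf and cdf of the coin example can be reproduced numerically. The following is a minimal sketch in R, assuming the two tosses are independent so that the number of heads is Binomial(2, 0.5); the variable names are illustrative only.

```r
# Support of X: number of heads in two tosses of a fair coin
support <- 0:2

# pmf: P(X = x) for each x in the support
pmf <- dbinom(support, size = 2, prob = 0.5)

# cdf at the support points: running total of the pmf
cdf <- cumsum(pmf)

print(data.frame(x = support, pmf = pmf, cdf = cdf))

# The probabilities over the whole support should sum to 1
total <- sum(pmf)
```

The cdf is a step function, so its value between support points equals the value at the nearest support point to the left.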

Expected Value and Variance of Discrete Random Variables

Expected Value $E(X)$ :
- $E(X) = \sum_{x \in X} x P(X = x)$
- Example: For the coin toss, $E(X) = 0 \times \frac{1}{4} + 1 \times \frac{1}{2} + 2 \times \frac{1}{4} = 1$
Variance $Var(X)$ :
- $Var(X) = E((X - \mu)^2) = \sum_{x \in X} (x - \mu)^2P(X = x)$, where $\mu = E(X)$
- This can also be expressed as:
  - $Var(X) = E(X^2) - (E(X))^2$
- Example from previous scenario:
- $E(X^2) = 0^2 \times \frac{1}{4} + 1^2 \times \frac{1}{2} + 2^2 \times \frac{1}{4} = \frac{3}{2}$
- Thus, $Var(X) = \frac{3}{2} - 1^2 = \frac{1}{2}$
Standard Deviation $SD(X)$ :
- Given by $SD(X) = \sqrt{Var(X)}$; for the coin example, $SD(X) = \sqrt{\frac{1}{2}} \approx 0.7071$
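The arithmetic for the coin example can be checked directly from the pmf. This is a small sketch in R; the vectors x and p simply restate the support and probabilities from the example, and the variable names are my own.

```r
x <- 0:2                  # support of X
p <- c(1/4, 1/2, 1/4)     # pmf from the coin example

mu      <- sum(x * p)             # E(X)
ex2     <- sum(x^2 * p)           # E(X^2)
var_x   <- ex2 - mu^2             # shortcut formula E(X^2) - (E(X))^2
var_def <- sum((x - mu)^2 * p)    # definition E((X - mu)^2)
sd_x    <- sqrt(var_x)            # standard deviation

c(mean = mu, variance = var_x, sd = sd_x)
```

Both variance formulas give the same number, which is a useful sanity check when working by hand.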

Continuous Random Variables

Probability Density Function (pdf): Denoted as $f(x)$ ; the area under the curve between any two points $a$ and $b$ equals the probability that $X$ falls between them:
- $P(a \leq X \leq b) = \int_a^b f(x) \,dx$
- Total area under the curve must equal 1: $P(\Omega) = \int_{x \in X} f(x) \,dx = 1$
Example:
- Let $X$ be the time it takes for a bus to arrive, following a uniform distribution:
- Support: $X = [10, 15]$
  - $f(x) = \begin{cases} \frac{1}{5} & 10 \leq x \leq 15 \\ 0 & \text{elsewhere} \end{cases}$
- CDF:
  - $F(x) = \int_{10}^x f(y) \, dy$
  - $F(x) = \begin{cases} 0 & x < 10 \\ \frac{x - 10}{5} & 10 \leq x \leq 15 \\ 1 & x > 15 \end{cases}$
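R's built-in uniform functions evaluate the same pdf and cdf. The sketch below uses dunif and punif with the bus example's support [10, 15]; the evaluation points (9, 12, 16, and the interval from 11 to 13) are arbitrary choices for illustration.

```r
# Density: 1/5 inside [10, 15], 0 outside
dens <- dunif(c(9, 12, 16), min = 10, max = 15)

# cdf at x = 12: (12 - 10) / 5
F12 <- punif(12, min = 10, max = 15)

# P(11 <= X <= 13) as a difference of two cdf values
p_11_13 <- punif(13, min = 10, max = 15) - punif(11, min = 10, max = 15)

dens
F12
p_11_13
```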

Expected Value and Variance of Continuous Random Variables

Expected Value $E(X)$ :
- $E(X) = \int_{x \in X} x f(x) \,dx$
- Substitute values for our example:
- $E(X) = \int_{10}^{15} x \cdot \frac{1}{5} \,dx = \frac{1}{5} \left[\frac{x^2}{2}\right]_{10}^{15} = 12.5$
Variance $Var(X)$ :
- $Var(X) = E(X^2) - (E(X))^2$
- Example calculation:
- $E(X^2) = \int_{10}^{15} x^2 \cdot \frac{1}{5} \,dx = \frac{1}{5} \left[\frac{x^3}{3}\right]_{10}^{15} = \frac{475}{3} \approx 158.33$
- Variance is thus calculated as: $Var(X) = \frac{475}{3} - (12.5)^2 = \frac{25}{12}$
- Standard deviation: $SD(X) = \sqrt{Var(X)} = \frac{5}{2\sqrt{3}} \approx 1.4434$
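The integrals for the bus example can also be evaluated numerically as a check on the hand calculation. This is a sketch using R's integrate function; the helper f is a name introduced here for the uniform density on [10, 15].

```r
# Density of the bus waiting time (uniform on [10, 15])
f <- function(x) dunif(x, min = 10, max = 15)

# E(X) and E(X^2) by numerical integration over the support
ex  <- integrate(function(x) x   * f(x), lower = 10, upper = 15)$value
ex2 <- integrate(function(x) x^2 * f(x), lower = 10, upper = 15)$value

var_x <- ex2 - ex^2    # shortcut formula
sd_x  <- sqrt(var_x)

c(mean = ex, variance = var_x, sd = sd_x)
```

The variance should agree with $\frac{25}{12}$ up to the numerical tolerance of the integration.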

Common Distributions of Discrete Random Variables

Discrete Uniform Distribution:
- Each outcome is equally likely.
Binomial Distribution:
- Models the number of successes in $n$ independent trials, each with the same probability of success (e.g., the number of heads in repeated coin flips).
- Parameters: number of trials $n$ and probability of success $p$ .
Geometric Distribution:
- Models the number of trials until the first success occurs.
Poisson Distribution:
- Models the number of events occurring in a fixed interval of time or space, given a known average rate of occurrence.

Common Distributions of Continuous Random Variables

Continuous Uniform Distribution:
- All intervals of the same length within the support are equally probable.
Normal Distribution:
- Bell-shaped curve symmetric about the mean.
- Standard normal distribution: $N(0, 1)$ has mean 0 and variance 1.
Student’s t Distribution:
- Similar to normal distribution but with heavier tails.
Chi-square Distribution:
- Represents the distribution of a sum of squared independent standard normal variables.
Exponential Distribution:
- Models time until an event occurs (like waiting for a bus).

Normal Distribution

Definition: A continuous distribution characterized by its mean $\mu$ and standard deviation $\sigma$.
Properties:
- Mean = Median = Mode.
- The density $f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$ approaches but never touches the x-axis.
- Changing $\mu$ shifts the curve left or right, and changing $\sigma$ affects the spread.
Standardization: For any normal random variable $X$ , transform to a standard normal variable $Z$ using:
- $Z = \frac{X - \mu}{\sigma}$
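Standardization does not change probabilities: the area to the left of $x$ under the original curve equals the area to the left of $z$ under the standard normal curve. A minimal sketch in R, where mu, sigma, and x are hypothetical values chosen only for illustration:

```r
# Hypothetical parameters and observation (not from the notes)
mu    <- 100
sigma <- 15
x     <- 120

# Standardize
z <- (x - mu) / sigma

# P(X < x) computed directly, and P(Z < z) on the standard normal scale
p_direct   <- pnorm(x, mean = mu, sd = sigma)
p_standard <- pnorm(z)

all.equal(p_direct, p_standard)
```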

Calculating Probabilities with Standardization

Example: Absorption rate of cones in the eye follows a normal distribution.
Given a mean of 535 nm and a standard deviation of 65 nm, calculate the proportion absorbing wavelengths between 550 nm and 575 nm.
- $P(550 < X < 575) = P(X < 575) - P(X < 550)$
- Compute z-scores with $Z = \frac{X - \mu}{\sigma}$: $\frac{575 - 535}{65} \approx 0.62$ and $\frac{550 - 535}{65} \approx 0.23$
- Result: $P(Z < 0.62) - P(Z < 0.23) = 0.7324 - 0.5910 = 0.1414$
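The same calculation can be done with pnorm instead of a table. The sketch below computes the answer twice: once with z-scores rounded to two decimals, as a z-table requires, and once with unrounded z-scores. Variable names are illustrative.

```r
mu    <- 535   # mean wavelength (nm)
sigma <- 65    # standard deviation (nm)

# Unrounded z-scores for the two endpoints
z_low  <- (550 - mu) / sigma
z_high <- (575 - mu) / sigma

# Table-style answer: z-scores rounded to two decimals
p_table <- pnorm(round(z_high, 2)) - pnorm(round(z_low, 2))

# Answer without rounding the z-scores
p_exact <- pnorm(z_high) - pnorm(z_low)

c(table_style = p_table, unrounded = p_exact)
```

The two results differ slightly because rounding the z-scores shifts both endpoints a little; either is acceptable at the precision of a printed table.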

Z-Table and Finding Probabilities

Z-Table values represent cumulative probabilities for corresponding z-scores.
- Example: To find $P(Z < 1.26)$, look up the area to the left of $z = 1.26$ in the table: 0.8962.
- To find $P(Z > 1.26)$, take the complement: $P(Z > 1.26) = 1 - P(Z < 1.26) = 0.1038$.
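In R, pnorm plays the role of the z-table. A short sketch of the lookup above; the lower.tail argument switches between the left and right tail.

```r
# Area to the left of z = 1.26
p_left <- pnorm(1.26)

# Area to the right of z = 1.26, requested directly
p_right <- pnorm(1.26, lower.tail = FALSE)

round(c(left = p_left, right = p_right), 4)
```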

Percentiles and Their Calculation

Finding Percentiles:
- To find the cutoff height for the top 10% (e.g., for female heights with a mean of 64 inches and an SD of 2.5 inches):
- Determine the z-score for a cumulative probability of 0.90: $z = 1.28$
- Height: $x = z\sigma + \mu = 1.28(2.5) + 64 = 67.2$ inches
Example for Hummingbirds:
- Given a weight distribution with a mean of 13 g and an SD of 3.4 g, find the weight that is less than that of 65% of all hummingbirds (the 35th percentile). Solve with z-score substitution:
- $z = -0.39$ yields $x = z\sigma + \mu = -0.39(3.4) + 13 = 11.674$ g
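Percentiles are the inverse of the cdf, which R exposes as qnorm. The sketch below redoes both percentile examples; because qnorm does not round the z-score to two decimals, the results differ slightly from the table-based answers.

```r
# 90th percentile of heights: mean 64 in, SD 2.5 in
z90    <- qnorm(0.90)
height <- qnorm(0.90, mean = 64, sd = 2.5)

# 35th percentile of hummingbird weights: mean 13 g, SD 3.4 g
z35    <- qnorm(0.35)
weight <- qnorm(0.35, mean = 13, sd = 3.4)

round(c(z90 = z90, height = height, z35 = z35, weight = weight), 3)
```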

The Empirical Rule

Definitions:
- Approximately 68% of values lie within one standard deviation of the mean.
- Approximately 95% of values lie within two standard deviations.
- Approximately 99.7% of values lie within three standard deviations.
Application to ITBS scores:
- Questions about the proportion of scores within a given range can be answered using the Empirical Rule.
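The percentages in the Empirical Rule are rounded values of normal probabilities, which can be confirmed with pnorm. A minimal sketch, working on the standard normal scale so that k counts standard deviations from the mean:

```r
# Number of standard deviations from the mean
k <- 1:3

# P(-k < Z < k) for k = 1, 2, 3
within_k <- pnorm(k) - pnorm(-k)

round(within_k, 4)
```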

Binomial Distribution Assumptions

Parameterization: Binomial(n, p) where $n$ is the number of independent trials, and $p$ is the probability of success.
Example: If each egg in a dozen independently has a 5% chance of being broken, the number of broken eggs $X$ follows Binomial(12, 0.05):
- $P(X = x) = {n\choose x} p^x (1-p)^{n-x}$
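The binomial pmf is available in R as dbinom. The sketch below assumes one reading of the egg example: each of the 12 eggs is broken independently with probability 0.05. That interpretation, and the specific questions asked (no broken eggs, exactly two broken eggs), are assumptions made for illustration.

```r
n <- 12     # eggs in a dozen
p <- 0.05   # assumed chance that any one egg is broken

# P(X = 0): no broken eggs
p_none <- dbinom(0, size = n, prob = p)

# P(X >= 1): at least one broken egg, via the complement
p_at_least_one <- 1 - p_none

# P(X = 2), from dbinom and from the formula written out
p_two        <- dbinom(2, size = n, prob = p)
p_two_manual <- choose(n, 2) * p^2 * (1 - p)^(n - 2)

round(c(none = p_none, at_least_one = p_at_least_one, two = p_two), 4)
```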

Geometric and Poisson Distributions

Geometric Distribution: Models trials until the first success. Repeats count as trials.
Poisson Distribution: Models the number of occurrences in a fixed time/space.
Typical Poisson examples involve rates (e.g., phone calls per hour) and counts (e.g., bacteria in a sample).
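Both distributions have built-in R functions. One caveat: R's dgeom counts the number of failures before the first success, so the probability that the first success occurs on trial k is dgeom(k - 1, prob). The success probability (0.2) and the call rate (4 per hour) below are hypothetical values chosen for illustration.

```r
# Geometric: probability the first success is on trial 3,
# with a hypothetical success probability of 0.2
p_success <- 0.2
p_third   <- dgeom(3 - 1, prob = p_success)   # 2 failures, then a success

# Poisson: hypothetical average of 4 phone calls per hour
rate        <- 4
p_two_calls <- dpois(2, lambda = rate)        # exactly 2 calls in an hour
p_at_most_2 <- ppois(2, lambda = rate)        # 2 or fewer calls in an hour

round(c(third_trial = p_third, two_calls = p_two_calls, at_most_2 = p_at_most_2), 4)
```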

Memoryless Property

The memoryless property holds when future probabilities do not depend on past events: $P(X > s + t \mid X > s) = P(X > t)$. Both the exponential and geometric distributions satisfy it.
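The property can be checked numerically by comparing a conditional tail probability with an unconditional one. This is a sketch with hypothetical parameter values; for the geometric case it uses R's convention that the variable counts failures before the first success, so "at least m failures" is written as "more than m - 1".

```r
# Exponential: hypothetical rate and waiting times
rate <- 0.5
s <- 2
u <- 3
lhs_exp <- pexp(s + u, rate, lower.tail = FALSE) /
           pexp(s, rate, lower.tail = FALSE)        # P(X > s + u | X > s)
rhs_exp <- pexp(u, rate, lower.tail = FALSE)        # P(X > u)

# Geometric: hypothetical success probability and failure counts
p <- 0.3
m <- 4
n <- 2
lhs_geom <- pgeom(m + n - 1, p, lower.tail = FALSE) /
            pgeom(m - 1, p, lower.tail = FALSE)     # P(Y >= m + n | Y >= m)
rhs_geom <- pgeom(n - 1, p, lower.tail = FALSE)     # P(Y >= n)

c(all.equal(lhs_exp, rhs_exp), all.equal(lhs_geom, rhs_geom))
```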

Student’s t Distribution

Definition: Characterized by degrees of freedom, $\nu$ .
- As $\nu$ decreases, the tails become heavier; as $\nu \to \infty$, the distribution approaches the standard normal distribution.
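The effect of the degrees of freedom on the tails can be seen by comparing upper-tail probabilities. In this sketch the cutoff of 2 and the particular degrees of freedom are arbitrary choices for illustration.

```r
# Degrees of freedom to compare
df_values <- c(2, 10, 30, 1000)

# P(T > 2) for each choice of degrees of freedom
tail_t <- pt(2, df = df_values, lower.tail = FALSE)

# P(Z > 2) for the standard normal
tail_norm <- pnorm(2, lower.tail = FALSE)

round(c(setNames(tail_t, paste0("df=", df_values)), normal = tail_norm), 4)
```

The tail probability shrinks toward the normal value as the degrees of freedom grow, and it stays above the normal value for every finite choice shown.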

Functions and R Commands

R provides functions for generating random variables and for calculating means, probabilities, and densities of distributions. Examples include:
- Generate normal random variables: rnorm(n = 10, mean = 0, sd = 1).
- Find probabilities with functions like pnorm, and densities with dnorm.
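R names its distribution functions with a prefix: r for random draws, p for cumulative probabilities, d for densities, and q for quantiles. A short sketch for the normal distribution follows; the seed and the evaluation points are arbitrary choices made here so the example is reproducible.

```r
set.seed(1)   # arbitrary seed, only for reproducibility

# r: ten random draws from N(0, 1), and their sample mean
draws       <- rnorm(n = 10, mean = 0, sd = 1)
sample_mean <- mean(draws)

# p: cumulative probability P(Z < 1.96)
p_val <- pnorm(1.96)

# d: height of the density curve at 0, which is 1 / sqrt(2 * pi)
d_val <- dnorm(0)

# q: the value with 97.5% of the area to its left (inverse of pnorm)
q_val <- qnorm(0.975)

c(sample_mean = sample_mean, p = p_val, d = d_val, q = q_val)
```

The same four prefixes work for other distributions, for example dbinom, ppois, qt, and runif.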

Bivariate Distributions

Multi-variable distributions are described by joint probability mass and density functions, from which marginal and conditional probabilities can be calculated.
- Covariance and correlation measure the strength and direction of the linear relationship between random variables.
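A small joint pmf makes these ideas concrete. The table below is entirely hypothetical (two binary variables with made-up probabilities); the sketch computes the marginals, one conditional distribution, and then covariance and correlation using the standard identities $Cov(X, Y) = E(XY) - E(X)E(Y)$ and $Corr(X, Y) = Cov(X, Y) / (SD(X)\,SD(Y))$.

```r
# Hypothetical joint pmf: rows are values of X, columns are values of Y
x_vals <- c(0, 1)
y_vals <- c(0, 1)
joint <- matrix(c(0.4, 0.1,
                  0.1, 0.4),
                nrow = 2, byrow = TRUE,
                dimnames = list(X = c("0", "1"), Y = c("0", "1")))

# Marginal pmfs
p_x <- rowSums(joint)
p_y <- colSums(joint)

# Conditional pmf of Y given X = 1: joint row divided by P(X = 1)
cond_y_given_x1 <- joint["1", ] / p_x[["1"]]

# Expectations
e_x  <- sum(x_vals * p_x)
e_y  <- sum(y_vals * p_y)
e_xy <- sum(outer(x_vals, y_vals) * joint)

# Covariance and correlation
cov_xy <- e_xy - e_x * e_y
sd_x   <- sqrt(sum(x_vals^2 * p_x) - e_x^2)
sd_y   <- sqrt(sum(y_vals^2 * p_y) - e_y^2)
cor_xy <- cov_xy / (sd_x * sd_y)

c(cov = cov_xy, cor = cor_xy)
```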

Bivariate Normal Distribution

The joint distribution of two continuous variables, characterized by their two means, two variances, and the correlation between them.
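One way to get a feel for these five parameters is to simulate. The sketch below builds a correlated normal pair from two independent standard normal variables, a standard construction that avoids extra packages; all parameter values, the seed, and the sample size are hypothetical choices made for illustration.

```r
set.seed(42)   # arbitrary seed, only for reproducibility
n <- 1e5       # number of simulated pairs

# Hypothetical parameters of the bivariate normal
mu_x <- 0
mu_y <- 0
sd_x <- 1
sd_y <- 2
rho  <- 0.6

# Two independent standard normal samples
z1 <- rnorm(n)
z2 <- rnorm(n)

# Mix them so that the pair (x, y) has correlation rho
x <- mu_x + sd_x * z1
y <- mu_y + sd_y * (rho * z1 + sqrt(1 - rho^2) * z2)

# Sample estimates should land close to the chosen parameters
c(mean_x = mean(x), mean_y = mean(y), sd_x = sd(x), sd_y = sd(y), cor = cor(x, y))
```

Because the values are simulated, the estimates match the parameters only approximately, with the gap shrinking as n grows.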