Comprehensive Study Notes: Probability and the Sampling Distribution of the Sample Mean

Variability and Randomness

Statistics is fundamentally the study of variability.
Uncertainty is managed by investigating random behavior.
Random behavior is characterized by two distinct patterns relative to time:
- Short-run: Outcomes are unpredictable and appear haphazard.
- Long-run: Outcomes exhibit a regular, predictable distribution.
A phenomenon is defined as random if individual outcomes are uncertain, but a stable distribution of outcomes emerges over a large number of repetitions.
A random experiment is defined as any process or activity involving uncertainty that results in two or more possible outcomes.
In everyday language, "randomness" is often equated with chaos or haphazard events, largely because we rarely observe a phenomenon enough times to perceive the long-run pattern.

Understanding Probability

The foundation of probability lies in the fact that regular patterns emerge only after many repeated trials (e.g., rolling dice, tossing coins, or lottery outcomes).
The Coin Toss Experiment:
- Assuming a fair coin, the likelihood of observing a Head is equal to observing a Tail (50% chance each).
- In a sequence like $H, T, T, H, T, H$ , the observed proportions of Heads are $1.0, 0.5, 0.33, 0.5, 0.4, 0.5$ .
- Proportions vary significantly in the early stages, but in the long-run, the proportion of Heads will consistently stay very close to $0.5$ .
- Empirical Threshold: If a fair coin is tossed $10,000$ times, it is almost certain that one will observe between $4,800$ and $5,200$ Heads.
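The running proportions in the six-toss sequence, and the reason the 4,800 to 5,200 band is so safe, can be checked with a short sketch. It assumes the standard result that the number of Heads in $n$ fair tosses has mean $n/2$ and standard deviation $\sqrt{n}/2$; that formula is not stated in these notes, and the function name running_proportions is illustrative only.

```python
import math

def running_proportions(tosses):
    """Proportion of Heads observed after each toss in the sequence."""
    heads = 0
    proportions = []
    for i, toss in enumerate(tosses, start=1):
        if toss == "H":
            heads += 1
        proportions.append(heads / i)
    return proportions

# The six-toss sequence from the notes: H, T, T, H, T, H
props = running_proportions("HTTHTH")
rounded = [round(p, 2) for p in props]  # [1.0, 0.5, 0.33, 0.5, 0.4, 0.5]

# Assumed (standard) result: for n fair tosses the count of Heads has
# mean n/2 and standard deviation sqrt(n * 0.5 * 0.5).
n = 10_000
mean_heads = n * 0.5
sd_heads = math.sqrt(n * 0.5 * 0.5)

# How far, in standard deviations, is 5,200 from the expected count?
band_in_sds = (5200 - mean_heads) / sd_heads
```

Here sd_heads is 50, so the band of 4,800 to 5,200 extends four standard deviations on each side of 5,000, which is why a count outside it is extremely unlikely.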
Definition of Probability: The probability of any outcome of a random phenomenon is the proportion of times that specific outcome would occur in an infinitely long series of trials.
Probability Theory: This branch of mathematics describes random behavior using mathematical models. Because we cannot perform an experiment infinite times, we use models to describe what would happen theoretically.

Proportions vs. Probability

Proportion: A value that is known or has been observed. It is spoken of in the present tense.
Probability: A theoretical value representing the proportion after an infinitely long series of trials. It relates to future events.

Probability Models and Sample Spaces

A probability model consists of two components:
1. A list of all possible outcomes.
2. A probability assigned to each outcome.
The Sample Space ( $S$ ): The set of all possible outcomes for a random phenomenon.
- Simple Examples:
  - Tossing a coin once: $S = \{H, T\}$ .
  - Tossing a coin three times: $S = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}$ .
- Complex Examples:
  - Lotto 6/49: Choosing six numbers from 49 leads to nearly $14,000,000$ possible combinations.
  - Sports (Valour FC soccer): Considering the next two games ( $W$ = Win, $T$ = Tie, $L$ = Loss), the order matters. $S = \{WW, WT, WL, TW, TT, TL, LW, LT, LL\}$ . Note that Winning first then Losing ( $WL$ ) is distinct from Losing first then Winning ( $LW$ ).
  - Rolling Two Dice: The sample space contains 36 outcomes ( $11, 12, \dots, 66$ ). If the variable of interest is the sum of the two dice, then $S = \{2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12\}$ .
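As a rough check, the sample spaces listed above can be enumerated by machine. This is a minimal sketch using Python's standard library; the variable names are illustrative and not part of the course material.

```python
from itertools import product
from math import comb

# Three coin tosses: every ordered sequence of H and T of length 3.
coin_three = ["".join(t) for t in product("HT", repeat=3)]

# Two soccer games: ordered pairs of W, T, L (so WL and LW are distinct).
soccer_two = ["".join(g) for g in product("WTL", repeat=2)]

# Two dice: 36 ordered pairs, and the smaller sample space of their sums.
dice_two = list(product(range(1, 7), repeat=2))
dice_sums = sorted({a + b for a, b in dice_two})

# Lotto 6/49: unordered choices of 6 numbers from 49.
lotto_combinations = comb(49, 6)
```

The counts come out as 8, 9, 36 and 13,983,816 respectively, the last being the "nearly 14,000,000" quoted for Lotto 6/49.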

Rules and Probability Distributions

For a sample space $S = \{O_1, O_2, \dots, O_n\}$ , let the probability of individual outcome $O_i$ be denoted as $p_i$ .
Fundamental Conditions:
1. Each individual probability must be between 0 and 1: $0 \le p_i \le 1$ for all $i = 1, 2, \dots, n$ .
2. The sum of all probabilities must equal exactly 1: $p_1 + p_2 + \dots + p_n = 1$ .
Events:
- An event is a subset of outcomes from the sample space.
- Example (Rolling two dice): If event $A$ is "At Least One 4", then $A = \{14, 24, 34, 41, 42, 43, 44, 45, 46, 54, 64\}$ . If event $B$ is "Sum is 9", then $B = \{36, 45, 54, 63\}$ .
Probability of Events: Calculated by adding the probabilities of all individual outcomes contained within that event.
- Example: $P(\text{Sum of dice} > 8) = P(X=9) + P(X=10) + P(X=11) + P(X=12) = \frac{4}{36} + \frac{3}{36} + \frac{2}{36} + \frac{1}{36} = \frac{10}{36} \approx 0.2778$ .
Complements ( $A^c$ ): The event containing all outcomes in the sample space not found in $A$ .
- Rule: $P(A^c) = 1 - P(A)$ .
- Examples: $P(\text{Win}) = 1 - P(\text{Tie or Lose})$ ; $P(\text{Sum } \ge 5) = 1 - P(\text{Sum } \le 4)$ .
Probability Distribution: A table or rule that provides all possible values of a variable and the specific probability for each value.
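The event and complement rules above can be verified by brute force for the two-dice example. The sketch below assumes fair dice, so that each of the 36 outcomes has probability $\frac{1}{36}$; the helper name prob is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))
p_outcome = Fraction(1, len(outcomes))  # assumption: fair dice, 1/36 each

# Probability distribution of the sum: each value paired with its probability.
sum_counts = Counter(a + b for a, b in outcomes)
sum_dist = {s: c * p_outcome for s, c in sorted(sum_counts.items())}

def prob(event):
    """Add the probabilities of all outcomes (a, b) for which event is true."""
    return sum(p_outcome for a, b in outcomes if event(a, b))

p_sum_gt_8 = prob(lambda a, b: a + b > 8)               # 10/36
p_at_least_one_4 = prob(lambda a, b: a == 4 or b == 4)  # 11/36
p_sum_ge_5 = 1 - prob(lambda a, b: a + b <= 4)          # complement rule
```

Using exact fractions keeps the two fundamental conditions easy to confirm: every value in sum_dist lies between 0 and 1, and the values add to exactly 1.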

Random Variables and Calculations

A random variable ( $X$ ) provides a numerical description of the outcome of a statistical experiment.
Case Study: NHL Atlantic Division Winner:
- Teams: Montreal, Ottawa, Toronto, and others.
- Probabilities: Montreal ( $k$ ), Ottawa ( $0.05$ ), Toronto ( $4k$ ), Team 4 ( $2k$ ), Team 5 ( $0.02$ ), Team 6 ( $0.04$ ), Team 7 ( $0.29$ ), Team 8 ( $3k$ ).
- Calculation for $k$ : $k + 0.05 + 4k + 2k + 0.02 + 0.04 + 0.29 + 3k = 1 \Rightarrow 10k + 0.40 = 1 \Rightarrow 10k = 0.60 \Rightarrow k = 0.06$ .
- Probability a Canadian team wins: $P(\text{Montreal}) + P(\text{Ottawa}) + P(\text{Toronto}) = 0.06 + 0.05 + 4(0.06) = 0.35$ .
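The calculation for $k$ rests on the condition that all probabilities sum to 1. A small sketch of the same arithmetic, using exact fractions to avoid rounding noise (the dictionary layout is only one possible way to organize the numbers):

```python
from fractions import Fraction

# Multiples of k and known probabilities, as listed in the case study.
k_multiples = {"Montreal": 1, "Toronto": 4, "Team 4": 2, "Team 8": 3}
known = {
    "Ottawa": Fraction("0.05"),
    "Team 5": Fraction("0.02"),
    "Team 6": Fraction("0.04"),
    "Team 7": Fraction("0.29"),
}

# Total probability is 1, so (sum of multiples) * k = 1 - (sum of known values).
k = (1 - sum(known.values())) / sum(k_multiples.values())

probs = {team: m * k for team, m in k_multiples.items()}
probs.update(known)

p_canadian = probs["Montreal"] + probs["Ottawa"] + probs["Toronto"]
```

This gives k equal to 3/50 (that is, 0.06) and p_canadian equal to 7/20 (that is, 0.35), and the completed table satisfies both fundamental conditions.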
Case Study: NHL Pacific Division (Complementary Logic):
- Probabilities provided: Calgary ( $0.08$ ), Edmonton ( $0.27$ ), Vancouver ( $0.09$ ). The probabilities for the remaining (American) teams are not given.
- Probability an American team wins: $1 - P(\text{Canadian team wins}) = 1 - (0.08 + 0.27 + 0.09) = 1 - 0.44 = 0.56$ .

Continuous Random Variables

While discrete random variables (like dice sums or coin counts) take only certain values, continuous random variables can take any value in an interval.
Sample Space Example: Time for a light bulb to burn out, $S = \{\text{all values of } x \text{ such that } x \ge 0\}$ .
Probability Assignment: Because there are infinitely many outcomes, probabilities are assigned to intervals of values rather than individual points.
Density Curves: The area under a density curve represents the probability of observing an outcome in that interval.
Normal Probability Distribution: Denoted as $X \sim N(\mu, \sigma)$ , where probabilities correspond to the area under the Normal curve.
Example (Pulse Rates): Adult females have pulse rates with $\mu = 74\,\text{bpm}$ and $\sigma = 12\,\text{bpm}$ .
- $P(X > 57) = P\left(Z > \frac{57 - 74}{12}\right) = P(Z > -1.42) = 1 - P(Z < -1.42) = 1 - 0.0778 = 0.9222$ .
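The same area can be computed without a Z table. The sketch below uses Python's statistics.NormalDist as a stand-in for a Normal table; because it does not round $z$ to two decimals, its answer differs from the table value in the third decimal place.

```python
from statistics import NormalDist

# Pulse rates: X ~ N(74, 12), second parameter is the standard deviation.
pulse = NormalDist(mu=74, sigma=12)

z = (57 - 74) / 12              # standardized value, about -1.42
p_above_57 = 1 - pulse.cdf(57)  # area to the right of 57 under the curve
```

The result is about 0.922, in line with the table-based 0.9222.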

The Sampling Distribution of the Sample Mean ($\bar{X}$)

Instead of observing single individuals, researchers often take a Random Sample of size $n$ and calculate the sample mean ( $\bar{x}$ ).
The Sampling Distribution of a Statistic is the distribution of values taken by that statistic in all possible samples of the same size from the same population.
Conceptual Experiment:
- Repeatedly take samples of size $n$ from a population with mean $\mu$ and standard deviation $\sigma$ .
- Calculate $\bar{x}$ for each sample.
- Plot the histogram of $\bar{x}$ values.
Key Characteristics of the Distribution of $\bar{X}$ :
1. The mean of the sampling distribution is equal to the population mean ( $\mu_{\bar{X}} = \mu$ ).
2. The standard deviation of the sampling distribution (the standard error) is smaller than the population standard deviation whenever $n > 1$ : $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$ .
3. Averages are consistently less variable than individual observations.
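The conceptual experiment can be imitated by simulation. This is only a sketch: the population values ($\mu = 50$, $\sigma = 10$), the sample size, the number of repetitions, and the seed are all hypothetical choices made for illustration, not values from these notes.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed so the run is repeatable

# Hypothetical population and design (illustrative values only).
mu, sigma, n, reps = 50.0, 10.0, 25, 5000

# Repeatedly draw a sample of size n and record its mean.
sample_means = [
    statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)  # should be close to mu
sd_of_means = statistics.stdev(sample_means)    # should be close to sigma / sqrt(n)
theoretical_se = sigma / n ** 0.5               # 10 / 5 = 2
```

With these settings the simulated sample means should centre near 50 with a spread near 2, far tighter than the population spread of 10.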

The Central Limit Theorem (CLT)

The Theorem: When taking a Simple Random Sample (SRS) of size $n$ from any population with mean $\mu$ and standard deviation $\sigma$ , the sampling distribution of $\bar{X}$ is approximately Normal if $n$ is sufficiently large.
Notation: $\bar{X} \approx N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$ .
Significance: The original population distribution does not need to be symmetric or normal. As $n$ increases, the skewness of the original distribution is overcome.
Sample Size Guidelines:
- For symmetric distributions, $\bar{X}$ is approximately Normal even for small $n$ .
- For strongly skewed distributions, a higher $n$ is required.
- Course Rule: It is safe to apply the CLT when $n \ge 30$ .
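One way to see skewness being "overcome" is to simulate sample means from a strongly right-skewed population and measure their skewness at a small and a large $n$. The sketch assumes an exponential population with mean 1 (a choice made here purely for illustration) and uses a simple moment-based skewness measure; the function names are hypothetical.

```python
import random
import statistics

def skewness(values):
    """Moment-based skewness: average cubed standardized deviation."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

def sample_mean_skewness(n, reps, rng):
    """Skewness of the means of `reps` samples of size n from an exponential population."""
    means = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]
    return skewness(means)

rng = random.Random(7)  # fixed seed so the run is repeatable
skew_small_n = sample_mean_skewness(n=2, reps=4000, rng=rng)
skew_large_n = sample_mean_skewness(n=30, reps=4000, rng=rng)
```

A symmetric distribution has skewness near 0, so skew_large_n sitting much closer to 0 than skew_small_n is consistent with the course rule that $n \ge 30$ is usually enough.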

Practical Examples and R Code

Male Heights Case Study: Population $X \sim N(178, 6)$ .
- Individual Probability: $P(X > 180) = P\left(Z > \frac{180 - 178}{6}\right) = P(Z > 0.33) = 0.3707$ .
- Sample Probability ( $n=10$ ): $P(\bar{X} > 180) = P\left(Z > \frac{180 - 178}{\frac{6}{\sqrt{10}}}\right) = P(Z > 1.05) \approx 0.1469$ .
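- R Code for Sample Mean: pnorm(180, 178, 6/sqrt(10), lower.tail = FALSE) yields 0.1459203. This differs slightly from the table value 0.1469 because the table calculation rounds $z$ to 1.05, whereas R uses the unrounded $z \approx 1.0541$ .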
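For readers without R, a rough Python counterpart of the pnorm call is sketched below using statistics.NormalDist. It is offered as a cross-check, not as the course's method; 1 - cdf(x) plays the role of lower.tail = FALSE.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 178, 6, 10

individual = NormalDist(mu, sigma)             # one randomly chosen male
sample_mean = NormalDist(mu, sigma / sqrt(n))  # mean height of 10 males

p_individual = 1 - individual.cdf(180)    # P(X > 180)
p_sample_mean = 1 - sample_mean.cdf(180)  # P(X-bar > 180)
```

The unrounded results are about 0.369 for an individual and about 0.1459 for the sample mean, showing how much less likely an average of ten heights is to exceed 180 than a single height.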
Light Bulb Lifetimes Case Study:
- Population is right-skewed with $\mu = 400\,hr$ and $\sigma = 250\,hr$ .
- For $n = 40$ bulbs, calculate the probability the mean lifetime exceeds $450\,hr$ .
- Since $n \ge 30$ , use CLT: $P(\bar{X} > 450) \approx P\left(Z > \frac{450 - 400}{\frac{250}{\sqrt{40}}}\right) = P(Z > 1.26) = 0.1038$ .
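The light bulb calculation can be reproduced in two ways: with $z$ rounded to two decimals, as a Z table requires, and with $z$ left unrounded. The sketch below does both, so the size of the rounding effect is visible; either way the answer is only an approximation, because the CLT itself is approximate for a skewed population.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 400, 250, 40

standard_error = sigma / sqrt(n)          # 250 / sqrt(40)
z_exact = (450 - mu) / standard_error     # unrounded z
z_table = round(z_exact, 2)               # z as looked up in a table

std_normal = NormalDist()                 # standard Normal, N(0, 1)
p_table = 1 - std_normal.cdf(z_table)     # matches the table-based answer
p_exact = 1 - std_normal.cdf(z_exact)     # slightly smaller, about 0.103
```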
Metal Bolts Case Study ( $\mu = 1.25\,cm$ , $\sigma = 0.05\,cm$ ):
- Probability that an SRS of $n = 100$ has a mean diameter between $1.24\,cm$ and $1.26\,cm$ .
- $P(1.24 < \bar{X} < 1.26) \approx P\left(\frac{1.24 - 1.25}{\frac{0.05}{\sqrt{100}}} < Z < \frac{1.26 - 1.25}{\frac{0.05}{\sqrt{100}}}\right) = P(-2.00 < Z < 2.00) = 0.9544$ .
- Constraint: If we only select $n = 5$ bolts, and the underlying distribution of $X$ is unknown, we cannot calculate the probability because the sample size is too small for the CLT.
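For the $n = 100$ case, an interval probability is the difference of two cumulative areas. A minimal sketch, again using statistics.NormalDist as a stand-in for the Z table:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 1.25, 0.05, 100

# Approximate sampling distribution of the mean diameter (CLT, since n >= 30).
x_bar = NormalDist(mu, sigma / sqrt(n))

# P(1.24 < X-bar < 1.26) = area below 1.26 minus area below 1.24.
p_between = x_bar.cdf(1.26) - x_bar.cdf(1.24)
```

The result is about 0.9545, which agrees with the table-based 0.9544 up to table rounding. No comparable calculation is attempted for $n = 5$, since the Normal model is not justified there.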

Summary Classification for $\bar{X}$

Scenario 1: Population is Normal:
- $X \sim N(\mu, \sigma)$ .
- Result: $\bar{X}$ is exactly Normal for any sample size $n$ .
Scenario 2: Population is Not Normal / Unknown:
- If $n \ge 30$ : $\bar{X}$ is approximately Normal by the CLT.
- If $n < 30$ : $\bar{X}$ cannot be assumed to be Normal, so the Normal-based probability calculations in these notes cannot be applied.
Universal Truth: For any population distribution with mean $\mu$ and standard deviation $\sigma$ , the sample mean $\bar{X}$ has mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{n}}$ .
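The classification above can be summarized as a small decision helper. This is a sketch of the course rule only: the function name sampling_distribution and its arguments are hypothetical, and the $n \ge 30$ cutoff is the guideline from these notes, not a mathematical guarantee.

```python
from math import sqrt
from statistics import NormalDist
from typing import Optional

def sampling_distribution(mu: float, sigma: float, n: int,
                          population_is_normal: bool) -> Optional[NormalDist]:
    """Return a Normal model for the sample mean, or None if neither scenario applies."""
    # Scenario 1: Normal population, so X-bar is exactly Normal for any n.
    # Scenario 2: any population with n >= 30, so X-bar is approximately Normal (CLT).
    if population_is_normal or n >= 30:
        return NormalDist(mu, sigma / sqrt(n))
    # Small sample from a non-Normal or unknown population: no Normal model.
    return None

bolts_100 = sampling_distribution(1.25, 0.05, 100, population_is_normal=False)
bolts_5 = sampling_distribution(1.25, 0.05, 5, population_is_normal=False)
heights_10 = sampling_distribution(178, 6, 10, population_is_normal=True)
```

Here bolts_5 is None, mirroring the bolt example where $n = 5$ is too small, while bolts_100 and heights_10 each return a Normal model with standard deviation $\frac{\sigma}{\sqrt{n}}$.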