STAT 1000 - Probability & Sampling Distribution of the Sample Mean

STAT 1000 - Basic Statistical Analysis I

Unit 05 - Probability & Sampling Distribution of the Sample Mean

Outline

Randomness, definition of probability
Sample space
Basic probability rules
Probability distributions
Sampling distribution of a sample mean
Central Limit Theorem

Variability and Randomness

Statistics engages with the concept of variability.
Key Idea:
- Random behavior is unpredictable in the short run.
- However, it exhibits a predictable distribution in the long run.

Probability

Conceptual Scenario:
- When rolling a die, tossing a coin, or buying a lottery ticket with chosen numbers, individual outcomes are uncertain.
- Yet these outcomes can be described by a regular pattern that emerges over numerous trials, laying the foundation for probability studies.

Probability and Fair Coin

When tossing a fair coin:
- The likelihood of a Head (H) and a Tail (T) is equal.
- Probability (P) of either outcome:
  P(H) = P(T) = 0.5
- Example Sequence: H T T H T H
- Observed proportion of Heads after each toss: 1.0, 0.5, 0.33, 0.5, 0.4, 0.5
- The proportion fluctuates early on but settles closer to 0.5 in the long run.
- Example Prediction: In 10,000 tosses of a fair coin, the number of Heads will likely fall between 4,900 and 5,100.
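
The running proportions for the example sequence can be reproduced with a short Python sketch. The variable names are illustrative. The 4,900 to 5,100 range is interpreted here as two standard deviations on either side of the expected count, which is an assumption about how that range was obtained.

```python
from math import sqrt

# Example sequence from the notes
tosses = "HTTHTH"

# Proportion of Heads after each toss
heads = 0
running = []
for i, toss in enumerate(tosses, start=1):
    heads += (toss == "H")
    running.append(round(heads / i, 2))

print(running)  # [1.0, 0.5, 0.33, 0.5, 0.4, 0.5]

# Expected count and standard deviation of the number of Heads in n tosses
n, p = 10_000, 0.5
mean = n * p
sd = sqrt(n * p * (1 - p))

# Assumption: "likely" means within two standard deviations of the mean
low, high = mean - 2 * sd, mean + 2 * sd
print(low, high)  # 4900.0 5100.0
```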

Randomness Defined

A phenomenon is termed random if individual outcomes are uncertain, yet there is a consistent distribution of outcomes across many repetitions.
Definition of a Random Experiment:
- Any process with uncertainty that has two or more possible outcomes.

Difference Between Proportions and Probability

Proportion: A known or observed value.
Probability: A theoretical value, the proportion of times an outcome would occur over an infinitely long series of trials.
Usage in Language:
- Proportions describe what has already been observed; probabilities describe what may happen in future trials.

Probability Theory

Defined as the mathematical branch that describes random behavior.
Note: Probability is not directly observable; mathematical models are essential for analysis.

Probability Model

Definition of a Probability Model:
- A framework involving two components:
  - A list of possible outcomes
  - A probability for each outcome

Sample Space

Sample Space (S): The set of all possible outcomes for a random phenomenon.
- Example 1: Coin toss, S = {H, T}.
- Example 2: Lotto 6/49 (choose 6 numbers from 49), S contains 13,983,816 combinations, nearly 14 million.
- Example 3: Coin tossed three times, S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.
- Example 4: Quality control with three fuses: S = {NNN, DNN, NDN, NND, DDN, DND, NDD, DDD}.
- In Examples 3 and 4, each item has two possible results, so the number of outcomes is 2^n, where n is the number of coins or fuses examined (here 2^3 = 8).
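
As a minimal sketch, the sample spaces in Examples 2 and 3 can be enumerated or counted with Python's standard library. The variable names are illustrative and not part of the course material.

```python
from itertools import product
from math import comb

# Example 3: every sequence of H/T over three tosses
three_tosses = ["".join(outcome) for outcome in product("HT", repeat=3)]
print(three_tosses)
# ['HHH', 'HHT', 'HTH', 'HTT', 'THH', 'THT', 'TTH', 'TTT']
print(len(three_tosses) == 2 ** 3)  # True

# Example 2: number of ways to choose 6 numbers from 49 (order ignored)
lotto_outcomes = comb(49, 6)
print(lotto_outcomes)  # 13983816
```

The same `product` call with the letters "ND" would list the fuse outcomes of Example 4, in a different order from the one printed in the notes.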

Probabilities of Outcomes

For outcomes O_1, O_2, …, O_n within sample space S:
- Denote the probability of outcome O_i by p_i.
- Conditions for probabilities:
  - 0 \leq p_i \leq 1 for each i = 1, 2, …, n.
  - p_1 + p_2 + … + p_n = 1
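
The two conditions can be checked mechanically. The sketch below uses a hypothetical helper, `is_valid_model`, and exact fractions so that the sum-to-1 test is not affected by floating-point rounding. The two example models are illustrative.

```python
from fractions import Fraction


def is_valid_model(probs):
    """Return True if every probability is in [0, 1] and they sum to 1."""
    values = list(probs.values())
    return all(0 <= p <= 1 for p in values) and sum(values) == 1


# A fair six-sided die: each face has probability 1/6
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

# An invalid assignment: the probabilities sum to 7/6
bad_model = {"H": Fraction(1, 2), "T": Fraction(2, 3)}

print(is_valid_model(fair_die))   # True
print(is_valid_model(bad_model))  # False
```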

Events

Definition: An event is any subset of outcomes from the sample space.
- Example: Flipping a coin three times and getting exactly two Tails, A = {HTT, THT, TTH}.
- Example: Rolling two dice and getting a sum of 9, B = {(3,6), (4,5), (5,4), (6,3)}.

Complements of Events

Complement of event A (denoted A^c): Set of all outcomes not contained in A.
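- Example 1: Flipping a coin, {H}^c = {T}.
- Example 2: Flipping three coins where A = {exactly two tails}, A^c = {HHH, HHT, HTH, THH, TTT}.
Relation:
- An event and its complement together cover the whole sample space, so P(A) + P(A^c) = 1, which gives P(A^c) = 1 - P(A)
- Example Usage: P(Bombers win) = 1 - P(Bombers lose), assuming a win and a loss are the only possible results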

Random Variables

Definition: A random variable (RV) is a numerical outcome of a random phenomenon.
Types of Random Variables:
- Discrete: Countable outcomes (e.g., number of children in a family).
- Continuous: Uncountable outcomes within an interval (e.g., weight, distance).

Probability Distributions

A probability distribution assigns a probability to each possible value of a random variable.
Example Distribution: Rolling a fair six-sided die, let X be the face showing. Then P(X = x) = 1/6 for each x = 1, 2, …, 6.
Case Study: Dice painted in multiple colors with given probabilities for rolling a color.

Probability of Events

The probability of an event is the sum of the probabilities of the outcomes it contains.
Example: Flipping three coins and getting exactly two tails, A = {HTT, THT, TTH}:
- Calculation (for fair coins, each of the 8 outcomes has probability 1/8):
  P(A) = P(HTT) + P(THT) + P(TTH) = 1/8 + 1/8 + 1/8 = 3/8 = 0.375
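
The summation can be sketched in Python, assuming fair coins so that all eight outcomes are equally likely. The same sketch also applies the complement rule P(A^c) = 1 - P(A). Names such as `p_outcome` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space for three tosses
sample_space = ["".join(o) for o in product("HT", repeat=3)]

# Assumption: fair coins, so each outcome has probability 1/8
p_outcome = Fraction(1, len(sample_space))

# Event A: exactly two tails
A = [o for o in sample_space if o.count("T") == 2]

# P(A) is the sum of the probabilities of the outcomes in A
p_A = sum(p_outcome for _ in A)
p_A_complement = 1 - p_A

print(A)                    # ['HTT', 'THT', 'TTH']
print(p_A, p_A_complement)  # 3/8 5/8
```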

Probability Distributions of Sums

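Example: When rolling two dice, let X be the sum of the two faces:
- To find P(X > 8), add the probabilities of the sums greater than 8, where each probability is the number of the 36 equally likely outcomes giving that sum, divided by 36:
  P(X > 8) = P(X = 9) + P(X = 10) + P(X = 11) + P(X = 12)
  = \frac{4}{36} + \frac{3}{36} + \frac{2}{36} + \frac{1}{36} = \frac{10}{36} = 0.2778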
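
A minimal sketch that builds the distribution of the sum by enumerating all 36 outcomes of two fair dice, then adds up the probabilities for sums above 8. The variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely outcomes give each sum
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Probability distribution of the sum X
dist = {total: Fraction(count, 36) for total, count in sorted(counts.items())}

# P(X > 8) = P(X = 9) + P(X = 10) + P(X = 11) + P(X = 12)
p_gt_8 = sum(p for total, p in dist.items() if total > 8)

print(p_gt_8)                   # 5/18 (the reduced form of 10/36)
print(round(float(p_gt_8), 4))  # 0.2778
```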

NHL Example

Each team in the Atlantic Division is assigned a probability of winning.
- The probabilities across all teams must sum to 1.
- To find the probability of a Canadian team winning:
  P(Canadian team wins) = P(Montreal wins) + P(Ottawa wins) + P(Toronto wins) = k + 0.02 + 4k = 5k + 0.02
- Solving for k from the sum-to-1 condition yields k = 0.07, the value consistent with the stated result.
- Hence,
  P(Canadian team wins) = 5(0.07) + 0.02 = 0.37

American Team Winning Probability

For the Pacific Division teams, the probability that an American team wins follows from the complement rule, since every team is either Canadian or American:
- Calculate as follows:
  P(American team wins) = 1 - P(Canadian team wins)

Probabilities for Continuous Variables

For a continuous random variable, probabilities are assigned to intervals of values rather than to single points.
The probability that the variable falls in an interval is the area under its density curve over that interval.

Normal Distribution Example

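Example scenario: Pulse rates of adult females follow a normal distribution with mean 74 bpm and standard deviation 12 bpm.
- The probability that a randomly selected adult female has a pulse rate above a given threshold is the area under this normal curve to the right of the threshold.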
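
As a hedged illustration, the sketch below computes one such probability with Python's `statistics.NormalDist`. The threshold of 80 bpm is a hypothetical value chosen for the example; the notes do not state which thresholds were used.

```python
from statistics import NormalDist

# Pulse-rate model from the notes: mean 74 bpm, standard deviation 12 bpm
pulse = NormalDist(mu=74, sigma=12)

# Hypothetical threshold (not from the notes)
threshold = 80

# Standardize: z = (x - mean) / standard deviation
z = (threshold - pulse.mean) / pulse.stdev

# P(X > threshold) is the area to the right of the threshold
p_above = 1 - pulse.cdf(threshold)

print(z)                  # 0.5
print(round(p_above, 4))  # 0.3085
```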

Distribution of the Sample Mean

This section examines how the mean of a random sample behaves from sample to sample.
The process involves taking many samples of size n from the population, calculating the mean of each sample, and plotting those means to observe their distribution.
- Example: Heights of adult Canadian males are normally distributed as X ~ N(178, 6), that is, with mean 178 and standard deviation 6.
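
The repeated-sampling process can be simulated. This is a sketch under stated assumptions: the sample size `n = 25`, the number of repeated samples, and the random seed are hypothetical choices, not values from the notes.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

mu, sigma = 178, 6   # population mean and standard deviation from the notes
n = 25               # hypothetical sample size
num_samples = 2000   # hypothetical number of repeated samples

# Draw many samples of size n and record the mean of each one
sample_means = [
    statistics.mean([random.gauss(mu, sigma) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# The sample means centre near 178 and vary much less than individual heights
print(round(mean_of_means, 2), round(sd_of_means, 2))
```

A histogram of `sample_means` would show the distribution the notes describe: centred at the population mean, with a spread close to 6 / sqrt(25) = 1.2.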

Sampling Distribution of Sample Mean

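For the mean \bar{X} of a random sample of size n:
- Essential rule: if X is a random variable with mean μ and standard deviation σ, then \bar{X} has mean μ and standard deviation \frac{\sigma}{\sqrt{n}}. If X is normally distributed, then
  \bar{X} \sim N(\mu, \frac{\sigma}{\sqrt{n}})
Conditions:
- If X is not normally distributed, the distribution of \bar{X} becomes approximately normal as the sample size increases (the Central Limit Theorem).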
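
To see how the standard deviation of the sample mean shrinks with n, the sketch below evaluates σ/√n for a few sample sizes. It reuses σ = 6 from the heights example; the sample sizes are hypothetical choices.

```python
from math import sqrt

sigma = 6  # population standard deviation from the heights example

# Standard deviation of the sample mean for several hypothetical sample sizes
sd_of_mean = {n: sigma / sqrt(n) for n in (1, 4, 9, 36)}

print(sd_of_mean)  # {1: 6.0, 4: 3.0, 9: 2.0, 36: 1.0}
```

Quadrupling the sample size halves the standard deviation of the sample mean.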

Central Limit Theorem

A fundamental theorem of statistics, stating:
- For large sample sizes (n), the sampling distribution of the sample mean is approximately normal, regardless of the shape of the population distribution.
Implication of the sample size: The larger the sample, the closer the sampling distribution of the mean is to a normal distribution.
- A common rule of thumb treats n ≥ 30 as large enough for the normal approximation to be used.
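
The theorem can be illustrated by simulation. The sketch below assumes an exponential population with mean 1 and standard deviation 1 as a stand-in for a right-skewed distribution; the population, the sample size `n = 50`, the number of samples, and the seed are all hypothetical choices.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

n = 50              # hypothetical sample size (above the n >= 30 rule of thumb)
num_samples = 5000  # hypothetical number of repeated samples

# Each sample mean comes from n draws of a right-skewed exponential population
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

# Theory: the sample mean has mean 1 and standard deviation 1 / sqrt(n)
theory_sd = 1 / n ** 0.5

# For a normal distribution, about 95% of values lie within 2 standard deviations
within_2sd = sum(abs(m - 1) < 2 * theory_sd for m in sample_means) / num_samples

print(round(statistics.mean(sample_means), 3))
print(round(statistics.stdev(sample_means), 3), round(theory_sd, 3))
print(round(within_2sd, 3))
```

Although individual draws are strongly skewed, the share of sample means within two standard deviations of the population mean should come out close to the 95% expected under a normal curve.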

Example - Light Bulb Lifetimes

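Light bulb lifetimes follow a right-skewed distribution with mean 400 hours and standard deviation 250 hours.
- Although the population distribution is not normal, the CLT implies that for a large sample of n bulbs the sample mean lifetime is approximately normal with mean 400 hours and standard deviation \frac{250}{\sqrt{n}} hours, so probabilities about the sample mean can be found from the normal distribution.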
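
A worked sketch of such a calculation follows. The sample size of 100 bulbs and the 450-hour threshold are hypothetical values chosen for illustration; the notes do not give the figures used in the original example.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 400, 250  # population mean and standard deviation from the notes
n = 100               # hypothetical sample size, large enough for the CLT
threshold = 450       # hypothetical threshold for the sample mean, in hours

# By the CLT, the sample mean is approximately N(mu, sigma / sqrt(n))
sd_of_mean = sigma / sqrt(n)
sample_mean_dist = NormalDist(mu=mu, sigma=sd_of_mean)

# Approximate probability that the mean lifetime of the sample exceeds 450 hours
p_above = 1 - sample_mean_dist.cdf(threshold)

print(sd_of_mean)         # 25.0
print(round(p_above, 4))  # 0.0228
```

The threshold sits two standard deviations of the sample mean above 400, so the result matches the familiar upper-tail area beyond z = 2.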

Summary of Sampling Distribution

When X follows a normal distribution,
- The sampling distribution of the sample mean is exactly normal for any sample size n.
For non-normal distributions, the sampling distribution of the sample mean is approximately normal as long as n is sufficiently large.