STAT 1000 - Probability & Sampling Distribution of the Sample Mean

STAT 1000 - Basic Statistics Analysis I

Unit 05 - Probability & Sampling Distribution of the Sample Mean

Outline
  • Randomness, definition of probability

  • Sample space

  • Basic probability rules

  • Probability distributions

  • Sampling distribution of a sample mean

  • Central Limit Theorem


Variability and Randomness
  • Statistics is the study of variability.

  • Key Idea:

    • Random behavior is unpredictable in the short run.

    • However, it exhibits a predictable distribution in the long run.


Probability
  • Conceptual Scenario:

    • When rolling a die, tossing a coin, or buying a lottery ticket with chosen numbers, individual outcomes are uncertain.

    • Yet these outcomes can be described by a regular pattern that emerges over numerous trials, laying the foundation for probability studies.


Probability and Fair Coin
  • When tossing a fair coin:

    • The likelihood of a Head (H) and a Tail (T) is equal.

    • Probability (P) of either outcome:
      P(H) = P(T) = 0.5

    • Example Sequence: H T T H T H

    • Observed proportion of Heads after each toss: 1.0, 0.5, 0.33, 0.5, 0.4, 0.5

    • The proportion fluctuates early on but settles closer to 0.5 in the long run.

    • Example Prediction: In 10,000 tosses of a fair coin, the number of Heads will likely fall between 4,900 and 5,100.
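
  The running proportions above can be reproduced with a few lines of Python. The sketch below also shows one way to arrive at the 4,900 to 5,100 range: it assumes a binomial model for the number of Heads (a model these notes do not introduce) and takes two standard deviations either side of the expected count, which covers roughly 95% of outcomes. The function name `running_proportions` is illustrative only.

```python
from math import sqrt

def running_proportions(tosses):
    """Return the proportion of Heads observed after each toss."""
    heads = 0
    proportions = []
    for i, toss in enumerate(tosses, start=1):
        if toss == "H":
            heads += 1
        proportions.append(round(heads / i, 2))
    return proportions

props = running_proportions("HTTHTH")
print(props)  # [1.0, 0.5, 0.33, 0.5, 0.4, 0.5]

# Assumption: the Head count in n fair tosses follows a binomial model,
# with mean n*p and standard deviation sqrt(n*p*(1-p)).
n, p = 10_000, 0.5
mean_heads = n * p
sd_heads = sqrt(n * p * (1 - p))
interval = (mean_heads - 2 * sd_heads, mean_heads + 2 * sd_heads)
print(interval)  # (4900.0, 5100.0)
```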


Randomness Defined
  • A phenomenon is termed random if individual outcomes are uncertain, yet there is a consistent distribution of outcomes across many repetitions.

  • Definition of a Random Experiment:

    • Any process with uncertainty that has two or more possible outcomes.


Difference Between Proportions and Probability
  • Proportion: A known or observed value.

  • Probability: A theoretical value, namely the proportion that would be observed over an infinitely long series of trials.

  • Usage in Language:

    • Proportions describe what has already been observed; probabilities describe what may happen in future trials.


Probability Theory
  • Defined as the mathematical branch that describes random behavior.

  • Note: Probability is not directly observable; mathematical models are essential for analysis.


Probability Model
  • Definition of a Probability Model:

    • A framework involving two components:

    • A list of possible outcomes

    • A probability for each outcome


Sample Space
  • Sample Space (S): The set of all possible outcomes for a random phenomenon.

    • Example 1: Coin toss, S = {H, T}.

    • Example 2: Lotto 6/49 (6 numbers chosen from 49), S has nearly 14 million combinations (13,983,816).

    • Example 3: Coin tossed three times, S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.

    • Example 4: Quality control with three fuses: S = {NNN, DNN, NDN, NND, DDN, DND, NDD, DDD}.

    • In Examples 3 and 4 the number of outcomes is 2^n, where n is the number of coins tossed or fuses examined (here 2^3 = 8).
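
  As a quick check on these counts, the sample spaces can be enumerated directly. This is a minimal sketch using Python's standard library; the variable names are illustrative.

```python
from itertools import product
from math import comb

# Example 3: every sequence of H/T over three tosses.
three_tosses = ["".join(outcome) for outcome in product("HT", repeat=3)]
print(three_tosses)
# ['HHH', 'HHT', 'HTH', 'HTT', 'THH', 'THT', 'TTH', 'TTT']

# Example 4: every sequence of N (not defective) / D (defective) over three fuses.
fuses = ["".join(outcome) for outcome in product("ND", repeat=3)]
print(len(fuses))  # 8

# Example 2: number of ways to choose 6 numbers from 49, order ignored.
lotto = comb(49, 6)
print(lotto)  # 13983816
```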


Probabilities of Outcomes
  • For outcomes O_1, O_2, …, O_n within sample space S:

    • Denote the probability of outcome O_i as p_i.

    • Conditions for probabilities:

    • 0 \leq p_i \leq 1 for each i = 1, 2, …, n.

    • p_1 + p_2 + … + p_n = 1
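
  The two conditions translate directly into a small check. The sketch below is illustrative only: the helper `is_valid_model` and the dictionary layout (outcome mapped to probability) are assumptions, and exact fractions are used so that the sum can be compared to 1 without rounding error.

```python
from fractions import Fraction

def is_valid_model(model):
    """Check both conditions: each p_i lies in [0, 1] and the p_i sum to 1."""
    probs = list(model.values())
    return all(0 <= p <= 1 for p in probs) and sum(probs) == 1

# A fair six-sided die: six outcomes, each with probability 1/6.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
print(is_valid_model(fair_die))  # True

# Not a valid model: the probabilities sum to 11/10.
broken_coin = {"H": Fraction(1, 2), "T": Fraction(3, 5)}
print(is_valid_model(broken_coin))  # False
```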


Events
  • Definition: An event is any subset of outcomes from the sample space.

    • Example: Flipping a coin three times and getting exactly two Tails, A = {HTT, THT, TTH}.

    • Example: Rolling two dice and getting a sum of 9, B = {(3,6), (4,5), (5,4), (6,3)}.
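
  Because an event is just a subset of the sample space, both examples can be built by filtering the full list of outcomes. A minimal sketch, with illustrative variable names:

```python
from itertools import product

# Sample space for three coin tosses, and the event "exactly two Tails".
coin_space = {"".join(outcome) for outcome in product("HT", repeat=3)}
A = {outcome for outcome in coin_space if outcome.count("T") == 2}
print(sorted(A))  # ['HTT', 'THT', 'TTH']

# Sample space for two dice as (first die, second die), and the event "sum is 9".
dice_space = set(product(range(1, 7), repeat=2))
B = {(first, second) for (first, second) in dice_space if first + second == 9}
print(sorted(B))  # [(3, 6), (4, 5), (5, 4), (6, 3)]
```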


Complements of Events
  • Complement of event A (denoted A^c): Set of all outcomes not contained in A.

    • Example 1: Flipping a coin, {H}^c = {T}.

    • Example 2: Flipping three coins where A = {exactly two tails}, A^c = {HHH, HHT, HTH, THH, TTT}.

  • Relation:

    • Because A and A^c together contain every outcome exactly once, P(A) + P(A^c) = 1, so P(A^c) = 1 - P(A)

    • Example Usage: P(Bombers win) = 1 - P(Bombers lose), provided the game cannot end in a tie
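
  The complement rule can be verified on the three-coin example with set difference. This sketch assumes a fair coin, so that all eight outcomes are equally likely; the helper `prob` is illustrative.

```python
from fractions import Fraction
from itertools import product

S = {"".join(outcome) for outcome in product("HT", repeat=3)}
A = {"HTT", "THT", "TTH"}   # exactly two Tails
A_c = S - A                 # complement: every outcome not in A

def prob(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event), len(S))

print(sorted(A_c))        # ['HHH', 'HHT', 'HTH', 'THH', 'TTT']
print(prob(A), prob(A_c)) # 3/8 5/8
print(prob(A_c) == 1 - prob(A))  # True
```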


Random Variables
  • Definition: A random variable (RV) is a numerical outcome of a random phenomenon.

  • Types of Random Variables:

    • Discrete: Countable outcomes (e.g., number of children in a family).

    • Continuous: Uncountable outcomes within an interval (e.g., weight, distance).


Probability Distributions
  • A probability distribution assigns a probability to each value (or interval of values) that a random variable can take.

  • Example Distribution: Roll a fair six-sided die and let X be the face showing; each value 1, 2, …, 6 has probability 1/6.

  • Case Study: Dice painted in multiple colors with given probabilities for rolling a color.


Probability of Events
  • The probability of an event is the sum of the probabilities of the outcomes it contains.

  • Example: Flipping three coins and getting exactly two tails, A = {HTT, THT, TTH}:

    • Calculation (for a fair coin, each of the 8 outcomes has probability 1/8):
      P(A) = P(HTT) + P(THT) + P(TTH) = 1/8 + 1/8 + 1/8 = 3/8


Probability Distributions of Sums
  • Example: Roll two dice and let X be the sum of the two faces:

    • To find P(X > 8), add the probabilities of the sums greater than 8:
      P(X > 8) = P(X = 9) + P(X = 10) + P(X = 11) + P(X = 12)
      = \frac{4}{36} + \frac{3}{36} + \frac{2}{36} + \frac{1}{36} = \frac{10}{36} \approx 0.2778
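
  The distribution of the sum can be tabulated by counting how many of the 36 equally likely (first die, second die) pairs give each total. A minimal sketch, assuming two fair dice:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely pairs produce each sum.
counts = Counter(first + second for first, second in product(range(1, 7), repeat=2))

# Probability distribution of X = sum of the two faces.
dist = {total: Fraction(count, 36) for total, count in counts.items()}

# P(X > 8) is the sum of the probabilities for totals 9, 10, 11 and 12.
p_gt_8 = sum(p for total, p in dist.items() if total > 8)
print(p_gt_8)                   # 5/18, the reduced form of 10/36
print(round(float(p_gt_8), 4))  # 0.2778
```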


NHL Example
  • Each Atlantic Division team is assigned a probability of winning.

    • These probabilities must sum to 1.

    • To find the probability of a Canadian team winning:
      P(Canadian team wins) = P(Montreal wins) + P(Ottawa wins) + P(Toronto wins) = k + 0.02 + 4k = 5k + 0.02

    • Solving for k from the sum-to-1 condition and substituting gives:
      P(Canadian team wins) = 0.37
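
  The full table of team probabilities is not reproduced in these notes, so k cannot be recomputed from the sum-to-1 condition here. As a consistency check, the sketch below works backward from the stated result 0.37 instead. Reading Montreal as k and Toronto as 4k follows the order of the terms in the expression above and is an assumption.

```python
from fractions import Fraction

p_canadian = Fraction(37, 100)  # stated result: P(Canadian team wins) = 0.37
p_ottawa = Fraction(2, 100)     # stated term: P(Ottawa wins) = 0.02

# P(Canadian team wins) = k + 0.02 + 4k = 5k + 0.02,
# so k = (P(Canadian team wins) - 0.02) / 5.
k = (p_canadian - p_ottawa) / 5

# Assumed mapping, taken from the order of the terms: Montreal = k, Toronto = 4k.
p_montreal, p_toronto = k, 4 * k

print(float(k))                                  # 0.07
print(float(p_montreal + p_ottawa + p_toronto))  # 0.37
```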


American Team Winning Probability
  • For the Pacific Division teams, the probability that an American team wins follows from the complement rule:

    • Every team is either American or Canadian, so:
      P(American team wins) = 1 - P(Canadian team wins)


Probabilities for Continuous Variables
  • A continuous random variable can take any value in an interval, so probabilities are assigned to intervals of values rather than to single points.

  • The probability that the variable falls in a given interval is the area under its density curve over that interval.


Normal Distribution Example
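  • Example scenario: Pulse rates of adult females are normally distributed with mean 74 bpm and standard deviation 12 bpm.

    • The probability that a randomly selected woman has a pulse rate above a given threshold is the area under the normal curve to the right of that threshold.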
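
  The notes do not state which threshold is used, so the sketch below picks a hypothetical one, 86 bpm (one standard deviation above the mean), purely for illustration. It uses `statistics.NormalDist` from the Python standard library to compute the area to the right of the threshold.

```python
from statistics import NormalDist

# Pulse rate model from the example: mean 74 bpm, standard deviation 12 bpm.
pulse = NormalDist(mu=74, sigma=12)

threshold = 86  # hypothetical threshold, not taken from the notes

# Area under the density curve to the right of the threshold.
p_above = 1 - pulse.cdf(threshold)
print(round(p_above, 4))  # 0.1587
```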


Distribution of the Sample Mean
  • We now study how the mean of a random sample from a population behaves.

  • Process: take many samples of size n, calculate the mean of each sample, and plot the means to see their distribution.

    • Example: Heights of adult Canadian males are normally distributed, X ~ N(178, 6), that is, mean 178 and standard deviation 6.


Sampling Distribution of Sample Mean
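  • For the mean \bar{X} of a random sample of size n:

    • Essential rule: if X is a random variable with mean μ and standard deviation σ, then \bar{X} has mean μ and standard deviation σ/\sqrt{n}; when X is normal, or n is large,
      \bar{X} \sim N(\mu, \frac{\sigma}{\sqrt{n}})

  • Conditions:

    • If X is normal, \bar{X} is exactly normal for any n; otherwise \bar{X} becomes approximately normal as the sample size increases (the Central Limit Theorem).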
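
  The rule can be checked by simulation, following the take-samples-and-average process described earlier. The sketch below uses the heights example, N(178, 6); the sample size of 25, the 2,000 repetitions, and the seed are hypothetical choices made only for illustration.

```python
import random
from math import sqrt
from statistics import fmean, stdev

MU, SIGMA = 178, 6   # population mean and standard deviation from the example
N = 25               # hypothetical sample size
REPS = 2_000         # hypothetical number of repeated samples

rng = random.Random(1)  # fixed seed so the run is reproducible

# Draw REPS samples of size N and record the mean of each one.
sample_means = [
    fmean([rng.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(REPS)
]

theoretical_sd = SIGMA / sqrt(N)  # sigma / sqrt(n)
print(theoretical_sd)             # 1.2

# The simulated mean and standard deviation of the sample means
# should land close to MU and theoretical_sd respectively.
print(round(fmean(sample_means), 2), round(stdev(sample_means), 2))
```

  With these settings the sample means cluster around 178 with a spread near 1.2, much narrower than the spread of 6 for individual heights.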


Central Limit Theorem
  • Fundamental statistical theorem stating:

    • For large sample sizes (n), the sampling distribution of the sample mean is approximately normal, regardless of the shape of the population distribution.

  • Implication of the sample size: The larger the sample, the closer the sampling distribution is to a normal distribution.

    • n ≥ 30 is a common rule of thumb for treating the sample mean as approximately normal.
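
  One way to see the theorem at work is to start from a strongly right-skewed population and watch the skewness of the sample mean shrink as n grows. The sketch below is an illustration under assumptions: the exponential population, the 5,000 repetitions, the seed, and the helper names are all chosen here and do not come from the notes.

```python
import random
from statistics import fmean, pstdev

def skewness(values):
    """Average cubed z-score: near 0 for symmetric data, positive for right skew."""
    m = fmean(values)
    s = pstdev(values)
    return fmean([((v - m) / s) ** 3 for v in values])

def simulate_means(n, reps, rng):
    """Means of `reps` samples of size n from an exponential population (mean 1)."""
    return [fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

rng = random.Random(0)  # fixed seed so the run is reproducible

means_1 = simulate_means(1, 5_000, rng)    # n = 1: just the skewed population itself
means_30 = simulate_means(30, 5_000, rng)  # n = 30: the rule-of-thumb sample size

skew_1 = skewness(means_1)
skew_30 = skewness(means_30)
print(round(skew_1, 2), round(skew_30, 2))  # skewness drops sharply as n grows
```

  The sample means for n = 30 are still slightly right-skewed, which is why n ≥ 30 is a rule of thumb rather than a guarantee.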


Example - Light Bulb Lifetimes
  • Light bulb lifetimes follow a right-skewed distribution with mean 400 hours and standard deviation 250 hours.

    • Although the underlying distribution is not normal, the CLT lets us use the normal distribution to evaluate probabilities about the mean lifetime of a large sample of bulbs.
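
  The notes do not give the sample size or the probability being asked for, so the numbers below are hypothetical: a sample of 100 bulbs and the chance that their mean lifetime exceeds 450 hours. The sketch applies the CLT approximation, treating the sample mean as normal with mean μ and standard deviation σ/√n.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA = 400, 250  # population mean and standard deviation in hours, from the example
n = 100               # hypothetical sample size, assumed large enough for the CLT
threshold = 450       # hypothetical cutoff for the sample mean, in hours

# CLT approximation: sample mean ~ N(mu, sigma / sqrt(n)) = N(400, 25).
sampling_dist = NormalDist(mu=MU, sigma=SIGMA / sqrt(n))

p_above = 1 - sampling_dist.cdf(threshold)
print(round(p_above, 4))  # 0.0228
```

  The same calculation applied to a single bulb would not be justified, because individual lifetimes are right-skewed rather than normal.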


Summary of Sampling Distribution
  • When X follows a normal distribution,

    • The sampling distribution of \bar{X} is exactly normal for any sample size.

  • When X is not normal, the sampling distribution of \bar{X} is approximately normal as long as n is sufficiently large.