Chapter 9: Inferring Population Means
The primary objectives for this chapter include understanding the t-model, constructing and interpreting confidence intervals for the mean, and performing hypothesis tests for the mean.
The Central Limit Theorem for Sample Means
Definition: If certain conditions are met, the Central Limit Theorem (CLT) assures us that the distribution of sample means follows an approximately Normal distribution no matter what the shape of the population distribution.
Conditions for CLT: Before applying the CLT to data, check three conditions: 1. Random Sample and Independence: Each observation must be collected randomly from the population, and the observations must be independent of one another. 2. Large Sample: Either the population distribution itself is Normal, or the sample size is large (a common rule of thumb treats 25 to 30 or more observations as sufficient). 3. Big Population: If the sample is collected without replacement, the population must be at least 10 times larger than the sample size ($N \ge 10n$).
Properties of the Sampling Distribution: If the three conditions are met, a random sample of size $n$ drawn from a population with mean $\mu$ and standard deviation $\sigma$ results in a sampling distribution of $\bar{x}$ with: - Mean: $\mu_{\bar{x}} = \mu$ - Standard Deviation ($\sigma_{\bar{x}}$): $\sigma / \sqrt{n}$ - Shape: Approximately Normal. The larger the sample size, the closer the distribution becomes to a Normal distribution. - Note on Population Normality: If the population is Normal to begin with, then the sampling distribution is exactly a Normal distribution, regardless of the sample size.
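The mean and standard deviation properties above can be checked with a small simulation. The sketch below is illustrative only: the Exponential population (mean 1, standard deviation 1), the sample size of 40, and the function name `simulate_sample_means` are assumptions chosen for the demonstration, not values from these notes.

```python
import math
import random
import statistics

def simulate_sample_means(n, trials, rate=1.0, seed=0):
    """Draw `trials` samples of size n from an Exponential(rate) population
    and return the list of their sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(rate) for _ in range(n)])
        for _ in range(trials)
    ]

# Assumed population: Exponential(rate=1), so mu = 1 and sigma = 1.
# It is strongly right-skewed, which makes the CLT effect visible.
mu, sigma = 1.0, 1.0
n = 40
means = simulate_sample_means(n, trials=20000)

print("mean of sample means:", round(statistics.fmean(means), 3))  # near mu
print("SD of sample means:  ", round(statistics.stdev(means), 3))  # near sigma / sqrt(n)
print("sigma / sqrt(n):     ", round(sigma / math.sqrt(n), 3))
```

Even though the population is far from Normal, the simulated sample means center on the population mean and their spread tracks $\sigma / \sqrt{n}$.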
Named Example: Weight of Angus Cows
Context: The weight of Angus cows is distributed with a population mean and a population standard deviation .
CLT Application: For a random sample of Angus cows: - The sample means will average . - The standard deviation of the sample means () is calculated as:
- The CLT states the sampling distribution will be approximately Normal: .
Distribution Estimates: For the means of all possible random samples: - will fall between and . - will fall between and . - will fall between and .
The Student’s t-Distribution
The Challenge: While the CLT is powerful, in practice we almost never know the population standard deviation ($\sigma$).
Transition from s to t: Using the sample standard deviation ($s$) to estimate $\sigma$ gives a workable standard error ($SE = s / \sqrt{n}$), but pairing this estimate with the Normal model introduces error, because $s$ itself varies from sample to sample.
Origins: William Gosset developed new models, one for each sample size ($n$), which provide better accuracy when $\sigma$ is unknown. These are known as the Student’s t-distributions.
Characteristics of the t-Distribution: - It is symmetric and bell-shaped. - It has "thicker tails" than the Normal distribution. - Its specific shape depends on the degrees of freedom (df). - If $n$ is small, the tails are thick; as $n$ increases, the tails become thinner and the distribution approaches the Normal distribution.
Degrees of Freedom: For every sample size $n$, there is a different t-distribution. The degrees of freedom are calculated as: - $df = n - 1$ - This represents the number of independent quantities left after the parameters have been estimated (e.g., once the sample mean is fixed, only $n - 1$ of the deviations from it are free to vary).
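The "thicker tails" claim can be made concrete by comparing upper-tail areas beyond the same cutoff. This is a minimal sketch using only the Python standard library: the helper names `t_pdf` and `t_upper_tail`, the cutoff of 2, and the chosen df values are assumptions for illustration. In practice a library routine such as SciPy's `scipy.stats.t` would usually replace the hand-rolled integration.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t), using Simpson's rule for the area between 0 and t."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

cutoff = 2.0
for df in (2, 5, 30, 100):
    print(f"df = {df:3d}: P(T > {cutoff}) = {t_upper_tail(cutoff, df):.4f}")
print(f"Normal  : P(Z > {cutoff}) = {1 - NormalDist().cdf(cutoff):.4f}")
```

The tail area shrinks toward the Normal value as df grows, which is the convergence described above.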
Answering Questions about Population Means
There are two primary approaches for answering questions about a population mean: 1. Confidence Intervals: Used for estimating the value of a parameter. 2. Hypothesis Tests: Used for deciding whether a parameter’s value matches a specific claim.
These methods are modifications of those used for population proportions, adapted for population means.
Confidence Intervals for a Population Mean
Conditions Check: 1. Random, independent sample. 2. Large sample (the sample size is large, or the population is Normally distributed). 3. Big population (if sampling without replacement, the population must be at least 10 times the sample size).
The Standardized Sample Mean: When conditions are met, the standardized sample mean follows the t-model with $n - 1$ degrees of freedom: $t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$
Standard Error (SE): We estimate the standard deviation of the sampling distribution using: $SE = \dfrac{s}{\sqrt{n}}$
One-Sample t-Interval Formula: $\bar{x} \pm t^* \cdot SE$, where $SE = s / \sqrt{n}$ - The critical value $t^*$ depends on the desired confidence level and the degrees of freedom ($df = n - 1$). - Models with few degrees of freedom have a larger standard deviation than the Normal model, resulting in wider confidence intervals.
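The interval formula translates directly into a few lines of code. The sketch below is illustrative: the function name `t_interval` and the data (a sample of 25 with mean 50 and standard deviation 10) are hypothetical, and the critical value 2.064 is the standard 95% table entry for 24 degrees of freedom, supplied by hand rather than computed.

```python
import math

def t_interval(xbar, s, n, t_star):
    """Return (lower, upper) for the one-sample t-interval xbar ± t* · s/√n."""
    se = s / math.sqrt(n)   # estimated standard error of the sample mean
    margin = t_star * se    # margin of error
    return xbar - margin, xbar + margin

# Hypothetical data: n = 25, so df = 24; t* = 2.064 is the 95% table value.
lower, upper = t_interval(xbar=50.0, s=10.0, n=25, t_star=2.064)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")  # 95% CI: (45.872, 54.128)
```

Passing the critical value as an argument keeps the function independent of where that value comes from, whether a printed table or software.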
Critical Values and Table Usage
Finding t*: Critical values are found using a t-Table in the row for degrees of freedom and the column for the desired confidence level.
Example: For a sample size of at a confidence level, . Looking at the table, the critical value is .
Missing Degrees of Freedom: If the specific number of degrees of freedom is not listed in the table, use the next smaller number available in the table.
Examples for Critical Value find: - Finding the critical value of for a confidence interval with : (based on provided table segments). - Finding the critical value of for a confidence interval with : Since is not in the listed table, one would use the next smaller listed value, such as . Based on the table provided, the critical value for at is .
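The "next smaller degrees of freedom" rule can be sketched as a lookup. The table below is only a partial 95% column using standard published values, and the name `lookup_t_star` is hypothetical; it is not the table these notes refer to.

```python
# Partial 95% confidence column of a t-table (df -> t*).
# Only these rows count as "listed" in this illustration.
T_TABLE_95 = {20: 2.086, 25: 2.060, 30: 2.042, 40: 2.021, 50: 2.009, 60: 2.000}

def lookup_t_star(df, table=T_TABLE_95):
    """Return (row_used, t_star), falling back to the next smaller listed df."""
    listed = [row for row in table if row <= df]
    if not listed:
        raise ValueError(f"no table row at or below df = {df}")
    row = max(listed)
    return row, table[row]

print(lookup_t_star(35))  # (30, 2.042)
print(lookup_t_star(40))  # (40, 2.021)
```

Rounding df down gives a slightly larger critical value, so the resulting interval is a little wider than necessary rather than too narrow.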
Summary Comparison: z vs. t
t-Distribution Characteristics: - Unimodal and symmetric about its mean. - Long tails compared to the Normal distribution. - Converges to the Normal model for large sample sizes ().
When to Use: - If $\sigma$ is known, use the Normal model ($z$). - If $\sigma$ is unknown and estimated using $s$, use the t-model.
Named Example: College Student Sleep
Objective: Build a Confidence Interval for the mean amount of sleep college students get per night based on a random sample of students.
Given Data: , , .
Condition Check: Sample is random and independent; population is large; is sufficient.
Parameters: - - - Critical value for confidence is .
Calculation: - Margin of Error () = - Interval: - CI:
Conclusion: We are confident that the true population mean number of hours college students sleep is between and hours.
Technical Note on Interpretation: It is correct to say " of all possible samples will produce intervals that actually do contain the true mean sleep," but the "I am confident" phrasing is more personal and less technical for general readers.
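The "all possible samples" reading of a confidence level can be illustrated by simulation. This is a sketch under stated assumptions: the Normal population with mean 7 and standard deviation 1.5, the sample size of 25, and the function name `coverage` are all hypothetical choices, not data from the sleep example. The critical value 2.064 is the standard 95% table entry for 24 degrees of freedom.

```python
import math
import random
import statistics

def coverage(mu, sigma, n, t_star, trials, seed=1):
    """Fraction of simulated t-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        margin = t_star * statistics.stdev(sample) / math.sqrt(n)
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials

# Hypothetical population; n = 25 gives df = 24 and a 95% t* of 2.064.
rate = coverage(mu=7.0, sigma=1.5, n=25, t_star=2.064, trials=3000)
print(f"share of intervals containing mu: {rate:.3f}")  # expected to be near 0.95
```

Any single interval either contains the true mean or it does not; the confidence level describes how often the procedure succeeds across repeated samples.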
Named Example: Highway Speeds
Scenario: A random sample of cars has a mean speed of with a standard deviation of .
Task: Find the confidence interval.
Condition Verification: Random/independent sample; population is large ( cars); sample size is at least .
Calculation (using technology): - Mean: - : - : - : - : - Confidence Interval:
Interpretation: We are confident the mean speed of all cars is between and .
Plausibility Check: Is it plausible the mean speed is ? No, because is not contained within our confidence interval.
Named Example: Movie Watching Habits
Scenario: Random sample of students; movies, .
Task: Construct a confidence interval.
Results: - : - : - : .
Comparative Logic: A confidence interval for the same data would be wider than the interval because the higher confidence requirement requires a larger multiplier.
Hypothesis Testing for the Mean
Four-Step Process: 1. Hypothesize: State the null ($H_0$) and alternative ($H_a$) hypotheses about the population parameter. 2. Prepare: Choose a significance level ($\alpha$), select the test statistic, and check conditions/assumptions. 3. Compute to Compare: Calculate the test statistic and the resulting p-value. 4. Interpret: Decide whether to reject the null hypothesis and state the conclusion in context.
Test Statistic for One-Sample t-Test: - $t = \dfrac{\bar{x} - \mu_0}{SE}$, where $SE = \dfrac{s}{\sqrt{n}}$ - If conditions hold, this follows a t-distribution with $df = n - 1$.
One- and Two-sided Alternative Hypotheses: The choice of $H_a$ (either directed, e.g., $\mu > \mu_0$, or undirected, e.g., $\mu \ne \mu_0$) determines how the p-value is calculated (one tail vs. two tails).
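The test statistic and the one-tail versus two-tail p-value can be sketched as follows. Everything here is illustrative: the function names, the `alternative` labels, and the data (a sample of 25 with mean 52 and standard deviation 5, tested against a null value of 50) are assumptions, and the tail area is computed by simple numerical integration rather than a statistics library.

```python
import math

def t_statistic(xbar, mu0, s, n):
    """One-sample t statistic: (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / math.sqrt(n))

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_coef) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t), using Simpson's rule for the area between 0 and t."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def p_value(t, df, alternative):
    """p-value for H_a: mu > mu0 ('greater'), mu < mu0 ('less'), or mu != mu0 ('two-sided')."""
    if alternative == "greater":
        return t_upper_tail(t, df)
    if alternative == "less":
        return t_upper_tail(-t, df)  # P(T < t) equals P(T > -t) by symmetry
    if alternative == "two-sided":
        return 2 * t_upper_tail(abs(t), df)
    raise ValueError(f"unknown alternative: {alternative!r}")

# Hypothetical data: n = 25, sample mean 52, s = 5, null value mu0 = 50.
t = t_statistic(xbar=52.0, mu0=50.0, s=5.0, n=25)
df = 25 - 1
print("t =", t)
print("one-sided p:", round(p_value(t, df, "greater"), 4))
print("two-sided p:", round(p_value(t, df, "two-sided"), 4))
```

For the same data the two-sided p-value is twice the one-sided value, so a result can be significant under a directed alternative and not under an undirected one.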
Named Example: Nursing Staff Experience
Situation: In , mean experience was . A survey of nurses shows and . Have years of experience increased?
Significance Level:
Hypotheses: - $H_0: \mu = 14.3$ - $H_a: \mu > 14.3$
Calculation: - - -
Result: p-value = .
Conclusion: Since the p-value is less than the significance level, we reject $H_0$. The evidence suggests that mean experience among nursing staff has increased.
Named Example: Hockey Attendance
Situation: average attendance was . A sample of games in shows and . Has attendance changed?
Hypotheses: - -
Calculation: - - - p-value $< 0.0001$
Conclusion: With a p-value less than the significance level, we reject $H_0$. Mean attendance has changed since .
Named Example: Weight Loss Study
Scenario: subjects on a low-fat diet for . loss, . Is mean weight loss greater than ?
Hypotheses: - $H_0: \mu = 0$ - $H_a: \mu > 0$
Calculation: - - .
Finding P-value via Table: - is not in the table; use (next smallest). - Look across the row for values bracketing . These are (upper-tail prob ) and (upper-tail prob ). - The p-value is therefore between and .
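The bracketing step can be sketched as a scan along one table row. The row below holds standard published values for 20 degrees of freedom and is only an illustration; it is not the row or the test statistic from the weight loss example, and `bracket_p_value` is a hypothetical helper name.

```python
# One row of a t-table for df = 20: (upper-tail probability, critical value),
# ordered from the smallest critical value to the largest.
ROW_DF_20 = [(0.10, 1.325), (0.05, 1.725), (0.025, 2.086), (0.01, 2.528), (0.005, 2.845)]

def bracket_p_value(t, row):
    """Return (low, high) bounds on the one-sided p-value for a statistic t >= 0."""
    low, high = 0.0, 0.5  # for t >= 0 the upper-tail area is at most 0.5
    for prob, crit in row:
        if t >= crit:
            high = prob   # t is at least this critical value, so p <= prob
        else:
            low = prob    # t is below this critical value, so p > prob
            break
    return low, high

print(bracket_p_value(2.3, ROW_DF_20))  # (0.01, 0.025)
print(bracket_p_value(3.0, ROW_DF_20))  # (0.0, 0.005)
```

A table can only bound the p-value between two column headings; an exact value requires technology.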
Final Result: Accurate p-value using technology is . Reject $H_0$ at . Mean weight loss is significantly greater than 0.