Introduction to Hypothesis Testing: Single Sample T-Test and Error Types
Transition from Previous Topics
We have previously discussed descriptive statistics and inferential statistics, specifically confidence intervals. Now, we are moving to the first hypothesis test: a t-test for a single sample, which is the simplest of many to come. By the end of the week, hypothesis testing (and this specific test) will be connected to confidence intervals, demonstrating they are the same method.
Definition of Hypothesis Testing
Hypothesis testing is a statistical method for deciding whether sample data provide enough evidence to support a claim about a population.
Example: Machine Defect Rate
Claim: A machine's historical defect rate is in .
New Data: A new machine produces a sample with an average defect rate of out of .
Question: Is evidence that the defect rate has decreased, or is it merely random sampling error and not significantly different from ?
Two Complementary Hypotheses
Null Hypothesis ($H_0$)
Represents the initial value of the population mean based on previous experience or conventional wisdom (the status quo).
Assumed to be true unless strong evidence points to the contrary.
In the defect rate example: The current defect rate is equal to the historical defect rate of .
Court Case Analogy: The person is innocent. This is the operating principle of the legal system, assuming innocence until proven guilty.
Alternative Hypothesis ($H_1$)
The opposite of the null hypothesis and what we are trying to substantiate.
In the defect rate example: The current defect rate is not equal to the historical defect rate of .
Court Case Analogy: The person is guilty.
Language of Hypothesis Testing
When evaluating evidence, we either reject the null hypothesis (if there's enough evidence against it) or fail to reject the null hypothesis (if there isn't enough evidence to reject it).
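As a minimal sketch of this reject / fail-to-reject language, the Python snippet below computes a one-sample t statistic and compares it with a two-sided critical value. The sample values, the hypothesized mean `mu_0`, and the helper name `one_sample_t` are hypothetical and chosen only for illustration. The cutoff 2.262 is the standard two-sided 5% critical value of the t distribution with 9 degrees of freedom.

```python
import math
import statistics

def one_sample_t(sample, mu_0):
    """Return the one-sample t statistic for H0: population mean == mu_0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)        # sample standard deviation (n - 1 denominator)
    standard_error = sd / math.sqrt(n)
    return (mean - mu_0) / standard_error

# Hypothetical data: 10 measurements tested against a hypothesized mean of 5.0.
sample = [4.1, 4.8, 5.2, 3.9, 4.5, 4.7, 4.2, 5.0, 4.4, 4.6]
mu_0 = 5.0
t_critical = 2.262   # two-sided, alpha = 0.05, df = n - 1 = 9

t_stat = one_sample_t(sample, mu_0)
# We never "accept" H0: the only two outcomes are reject or fail to reject.
decision = "reject H0" if abs(t_stat) > t_critical else "fail to reject H0"
print(f"t = {t_stat:.3f}, decision: {decision}")
```

With these made-up numbers the sample mean lies far enough below `mu_0`, relative to its standard error, that the null hypothesis is rejected. A sample mean closer to 5.0 would lead to "fail to reject H0" instead.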
Detailed Court Case Analogy
Scenario: Deciding if a person is innocent or guilty based on presented evidence, which is often inconclusive.
Two States of the World (Truth)
The person is truly guilty.
The person is truly innocent.
Two Decision Outcomes (Our Action)
Convict the person.
Acquit the person.
Hypotheses in the Analogy
$H_0$: The person is innocent (our default assumption).
$H_1$: The person is guilty (what we seek to substantiate).
Hypothetical Guilt Scale
A rating from (completely innocent) to (completely guilty).
Sampling Distribution of Mean Guilt Ratings
Imagine many juries (each with members) evaluating the exact same case.
Each jury provides an average guilt likelihood.
The distribution of these jury averages is a sampling distribution of the mean guilt ratings.
Case 1: Person is Actually Innocent ($H_0$ is true):
We would expect most juries to give low average guilt ratings.
The sampling distribution would be shifted toward lower scores, though some juries might, by chance, give higher ratings.
Case 2: Person is Actually Guilty ($H_0$ is false):
We would expect most juries to give high average guilt ratings.
The sampling distribution would be shifted toward higher scores, though some juries might, by chance, give lower ratings.
The Decision Problem
The sampling distributions for innocent and guilty people will overlap, meaning some innocent people might look guilty, and vice versa. A decision criterion (threshold) is needed to determine conviction or acquittal.
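The overlap can be illustrated with a small simulation. The sketch below is not part of the original notes: the 0 to 10 rating scale, the jury size of 12, the true guilt levels of 4.0 and 6.0, the juror spread, and the conviction threshold of 5.0 are all assumed values chosen only to make the two sampling distributions overlap.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

JURY_SIZE = 12      # assumed jury size (hypothetical)
N_JURIES = 10_000   # simulated juries per state of the world
THRESHOLD = 5.0     # assumed decision criterion: convict if the jury mean exceeds it

def jury_mean(true_level, spread=2.0):
    """Average guilt rating of one jury; ratings are clipped to an assumed 0-10 scale."""
    ratings = [min(10.0, max(0.0, random.gauss(true_level, spread)))
               for _ in range(JURY_SIZE)]
    return statistics.mean(ratings)

# Sampling distributions of the jury mean under each state of the world.
innocent_means = [jury_mean(true_level=4.0) for _ in range(N_JURIES)]
guilty_means = [jury_mean(true_level=6.0) for _ in range(N_JURIES)]

# Overlap: some innocent cases exceed the threshold, some guilty cases fall below it.
wrongful_convictions = sum(m > THRESHOLD for m in innocent_means) / N_JURIES
wrongful_acquittals = sum(m <= THRESHOLD for m in guilty_means) / N_JURIES

print(f"innocent but convicted: {wrongful_convictions:.3f}")
print(f"guilty but acquitted:   {wrongful_acquittals:.3f}")
```

Both proportions come out small but nonzero under these assumptions, which is the point of the decision problem: no threshold placed inside the overlap eliminates both kinds of mistake.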
Four Possible Outcomes and Errors in Hypothesis Testing
Table Summary
| State of the World | Decision: Retain $H_0$ (Acquit) | Decision: Reject $H_0$ (Convict) |
|---|---|---|
| Person is Innocent | Correct Decision: Innocent & Acquit | Type I Error ($\alpha$): Innocent & Convict |
| Person is Guilty | Type II Error ($\beta$): Guilty & Acquit | Correct Decision: Guilty & Convict (Power $1 - \beta$) |
Type I Error ($\alpha$)
Occurs when we reject the null hypothesis ($H_0$) when it is actually true.
In court case: Convicting an innocent person.
Consequence: An innocent person is wrongly imprisoned.
Type II Error ($\beta$)
Occurs when we fail to reject the null hypothesis ($H_0$) when it is actually false.
In court case: Acquitting a guilty person.
Consequence: A guilty person walks free.
Correct Decisions
True Negative: Retaining $H_0$ when $H_0$ is true (Acquitting an innocent person).
True Positive (Power of the Test): Rejecting $H_0$ when $H_0$ is false (Convicting a guilty person). The probability of this outcome is denoted $1 - \beta$. The power of the test is the probability of correctly rejecting a false null hypothesis.
Graphical Representation of Errors and Trade-offs
Two overlapping sampling distributions are plotted on a single graph. One distribution represents the null hypothesis ($H_0$) and the other represents the alternative hypothesis ($H_1$). The decision criterion (critical value) is a point on the x-axis that separates the rejection region from the non-rejection region.
The area under the $H_0$ distribution that falls into the rejection region represents the Type I Error ($\alpha$).
The area under the $H_1$ distribution that falls into the non-rejection region represents the Type II Error ($\beta$).
The area under the $H_1$ distribution that falls into the rejection region represents the Power ($1 - \beta$) of the test.
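These three areas, and the trade-off between them, can be computed directly. The sketch below assumes both sampling distributions are normal with hypothetical means (4.0 and 6.0) and a shared hypothetical standard error (0.6), and that the rejection region lies above the critical value, as in the court analogy where high guilt ratings lead to conviction. The function name `error_rates` is illustrative only.

```python
from statistics import NormalDist

# Assumed sampling distributions of the sample mean (hypothetical numbers).
null_dist = NormalDist(mu=4.0, sigma=0.6)   # distribution if H0 is true
alt_dist = NormalDist(mu=6.0, sigma=0.6)    # distribution if H0 is false

def error_rates(critical_value):
    """Return (alpha, beta, power) for a rule that rejects H0 above critical_value."""
    alpha = 1.0 - null_dist.cdf(critical_value)  # H0 area in the rejection region
    beta = alt_dist.cdf(critical_value)          # H1 area in the non-rejection region
    power = 1.0 - beta                           # H1 area in the rejection region
    return alpha, beta, power

# Moving the criterion to the right lowers alpha but raises beta.
for c in (4.5, 5.0, 5.5):
    alpha, beta, power = error_rates(c)
    print(f"criterion={c:.1f}  alpha={alpha:.3f}  beta={beta:.3f}  power={power:.3f}")
```

Under these assumptions a stricter criterion (further right) convicts fewer innocent people at the cost of acquitting more guilty ones, so for fixed distributions the two error rates cannot both be lowered by moving the threshold alone.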