Hypothesis Testing
Why Hypothesis Testing?
Confidence intervals use data to estimate a range likely to capture an unknown parameter.
Hypothesis testing uses data to assess how plausible a hypothesis about an unknown parameter is.
Statistical Inference: using data analysis to infer properties of a population.
Hypothesis testing is used to test claims about a population.
Random samples lead to sampling variation.
Hypothesis testing is a statistical procedure using data from random samples to test a claim and quantify its likelihood.
Learning Objectives
Define null and alternative hypotheses mathematically.
Obtain the null distribution and explain its representation.
Explain the significance level and its relation to hypothesis testing.
Apply the appropriate formula for testing different parameters and scenarios.
Choose the appropriate distribution (Z, t, or neither) for testing the population mean.
Calculate and make decisions using the p-value, rejection region, and confidence interval.
Discuss how the significance level affects the p-value, rejection region, and confidence interval.
Explain Type I and Type II errors in simple terms.
Calculate the probability of Type I and Type II errors.
Discuss the effect of reducing the probability of one error on the other.
Discuss the effect of sample size on the probability of Type I and Type II errors.
Hypothesis Testing: Example
Determine if a brain supplement improves CE 93 test grades based on a trial.
Historically, the average grade for CE 93 tests is 75%.
A trial with 12 students taking the supplement resulted in an average grade of 81%.
Criminal Justice System Analogy
Hypothesis testing is analogous to the legal system.
Null Hypothesis (H0):
Initial assumption (safe, minimal risk).
Conclusion if evidence is ambiguous.
Alternative Hypothesis (H1):
The hypothesis being tested.
Conclusion only if there is conclusive evidence.
Null and Alternative Hypothesis: Justice System
Null Hypothesis (H0): Presumed innocent until proven guilty.
Alternative Hypothesis (H1): What you are trying to prove.
Decisions:
Declare the defendant not guilty.
Declare the defendant guilty.
Hypothesis Testing Procedure
Hypotheses
Test Statistic
Null Distribution
Decision Criteria
Step 1: Hypotheses
Define null (H0) and alternative (H1) hypotheses.
Hypotheses are statements about an unknown population parameter θ.
Null Hypothesis (H0):
Always involves an equal sign (=, ≤, ≥).
Fail to reject unless there is strong evidence against it.
Alternative Hypothesis (H1):
Does not involve an equal sign (≠, <, >).
Reject the null only if there is strong evidence supporting H1.
Step 1: Examples
H0: G = 75%; H1: G > 75% (G = average grade with supplement)
H0: μ = $2000; H1: μ < $2000 (μ = average rent in Austin)
H0: p = 0.5; H1: p ≠ 0.5 (p = proportion of tails)
Step 2: Test Statistic
Define an appropriate test statistic to decide whether to reject the null hypothesis.
Step 2: Examples
Sample mean of 12 students (grades follow normal distribution, unknown).
Sample mean of 42 rents in Austin (rent distribution, unknown).
Proportion of tails from 100 coin flips.
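The test statistics listed above can be computed directly from raw sample data. A minimal Python sketch follows. The grade list and the tails count are made-up placeholder data (the source gives no raw observations); the grades are chosen only so that they average to the reported 81%.

```python
from statistics import mean

# Hypothetical grades for the 12 trial students (not from the source);
# chosen only so that they average to the reported 81%.
grades = [78, 85, 90, 72, 88, 81, 79, 84, 76, 83, 80, 76]

# Hypothetical coin-flip outcome: 100 flips, 56 tails (illustrative only).
n_flips = 100
n_tails = 56

sample_mean_grade = mean(grades)        # test statistic for a population mean
sample_prop_tails = n_tails / n_flips   # test statistic for a proportion

print(f"sample mean grade: {sample_mean_grade:.1f}%")
print(f"sample proportion of tails: {sample_prop_tails:.2f}")
```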
Step 3: Null Distribution
Determine the distribution of the test statistic under the null hypothesis.
Step 3: Example
Average grade after supplement: H0: G = 75%, H1: G > 75%.
Grades follow a normal distribution with .
Null distribution of the test statistic.
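If grades are normal with standard deviation σ, the sample mean of n grades under H0 is normal with mean 75 and standard deviation σ/√n. The sketch below builds that null distribution in Python. The value σ = 10 is an assumption for illustration only, since the value is not recoverable from the text here.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 75.0     # hypothesized mean under H0 (from the example)
n = 12         # sample size (from the example)
sigma = 10.0   # ASSUMED population standard deviation (placeholder value)

# Null distribution of the sample mean: Normal(mu0, sigma / sqrt(n))
null_dist = NormalDist(mu=mu0, sigma=sigma / sqrt(n))

# Probability, under H0, of a sample mean at least as large as the observed 81%
x_bar = 81.0
tail_prob = 1 - null_dist.cdf(x_bar)

print(f"standard error of the sample mean: {null_dist.stdev:.3f}")
print(f"P(sample mean >= {x_bar} | H0) = {tail_prob:.4f}")
```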
Step 4: Decision Criteria
Define conditions for rejecting or failing to reject the null hypothesis.
To reject H0, the evidence should have a very low probability of occurring if H0 were true.
Significance Level
To reject H0, the evidence needs a very low probability if H0 were true.
Threshold probability (α) below which we reject H0 (e.g., 5%).
If α = 5% and the sample evidence has a probability of 3% if H0 were true, reject H0.
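The decision rule above amounts to one comparison against α. A minimal sketch: the function name `decide` is hypothetical, and the stricter 1% level in the second call is an added illustration, not a value from the slides.

```python
def decide(prob_under_h0: float, alpha: float = 0.05) -> str:
    """Return the test decision given the probability of the evidence under H0."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    return "reject H0" if prob_under_h0 <= alpha else "fail to reject H0"

print(decide(0.03, alpha=0.05))  # evidence with 3% probability at the 5% level
print(decide(0.03, alpha=0.01))  # same evidence at an assumed stricter 1% level
```

The same evidence can lead to different decisions at different significance levels, which is why α should be fixed before looking at the data.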
Method 1: Confidence Intervals
Calculate confidence intervals based on the test statistic.
If the significance level is α, the associated confidence interval has confidence level 1 - α (e.g., α = 0.05 gives a 95% interval).
If the null hypothesized value is within the confidence interval, fail to reject H0. Otherwise, reject H0.
Confidence Intervals: Types of Tests
Two-sided test: H0: θ = θ0, H1: θ ≠ θ0 (θ0 = hypothesized value of the parameter θ)
One-sided test: H0: θ ≤ θ0, H1: θ > θ0
One-sided test: H0: θ ≥ θ0, H1: θ < θ0
Confidence Intervals: Population Mean
If testing for the population mean using a large sample with an unknown σ, use the appropriate confidence interval formula based on whether the test is one-sided or two-sided.
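A sketch of the standard large-sample z intervals, with the sample standard deviation s standing in for the unknown σ. The function names and the `alternative` labels ("two-sided", "greater", "less") are naming choices made here, and the rent figures in the usage lines are hypothetical.

```python
from math import sqrt, inf
from statistics import NormalDist

def mean_confidence_interval(x_bar, s, n, alpha=0.05, alternative="two-sided"):
    """Large-sample z confidence interval for the population mean.

    alternative: "two-sided" (H1: mu != mu0), "greater" (H1: mu > mu0),
    or "less" (H1: mu < mu0).
    """
    se = s / sqrt(n)                    # estimated standard error
    std_normal = NormalDist()
    if alternative == "two-sided":
        half_width = std_normal.inv_cdf(1 - alpha / 2) * se
        return (x_bar - half_width, x_bar + half_width)
    if alternative == "greater":        # lower confidence bound only
        return (x_bar - std_normal.inv_cdf(1 - alpha) * se, inf)
    if alternative == "less":           # upper confidence bound only
        return (-inf, x_bar + std_normal.inv_cdf(1 - alpha) * se)
    raise ValueError(f"unknown alternative: {alternative!r}")

def reject_h0(mu0, interval):
    """Reject H0 when the hypothesized value falls outside the interval."""
    low, high = interval
    return not (low <= mu0 <= high)

# Hypothetical rent sample: n = 42, mean $1900, standard deviation $350
ci = mean_confidence_interval(1900, 350, 42, alpha=0.05, alternative="less")
print(ci, reject_h0(2000, ci))
```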
Confidence Interval Example
Testing average grade after supplement with .
Historically, the average grade is 75% with . Sample of 12 students averaged 81%.
Method 2: Define Rejection Regions
Define rejection regions for the test statistic.
If the observed test statistic falls within the rejection region, reject H0.
Rejection Region vs. Confidence Interval
The confidence interval is centered about the observed sample mean x̄, whereas the rejection region is based on the null distribution centered about μ0.
Rejection Region: Types of Tests
Two-sided test: H0: θ = θ0, H1: θ ≠ θ0
One-sided test: H0: θ ≤ θ0, H1: θ > θ0
One-sided test: H0: θ ≥ θ0, H1: θ < θ0
Rejection Region: Critical z-scores
If testing for the population mean using the z-distribution, determine the critical z-scores for the rejection regions based on whether the test is one-sided or two-sided.
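Critical z-scores come from the inverse CDF of the standard normal distribution. A minimal sketch using Python's `statistics.NormalDist`: the function name and the `alternative` labels are naming choices made here.

```python
from statistics import NormalDist

def critical_z(alpha=0.05, alternative="two-sided"):
    """Return (lower, upper) critical z-scores; None means no cutoff on that side."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "two-sided":   # H1: mu != mu0, alpha split across both tails
        c = std_normal.inv_cdf(1 - alpha / 2)
        return (-c, c)
    if alternative == "greater":     # H1: mu > mu0, all of alpha in the upper tail
        return (None, std_normal.inv_cdf(1 - alpha))
    if alternative == "less":        # H1: mu < mu0, all of alpha in the lower tail
        return (std_normal.inv_cdf(alpha), None)
    raise ValueError(f"unknown alternative: {alternative!r}")

for alt in ("two-sided", "greater", "less"):
    print(alt, critical_z(0.05, alt))
```

Note that the one-sided cutoff is closer to zero than the two-sided one at the same α, because the whole rejection probability sits in a single tail.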
Rejection Region Example
Testing average grade after supplement with .
Historically, the average grade is 75% with . A sample of 12 students averaged 81%.
Rejection Region Summary
The rejection region is selected such that its probability under the null distribution equals α.
If the observed test statistic falls within the rejection region, reject H0.
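As an illustration, the rejection region for the supplement example can be expressed on the scale of the sample mean. This is a sketch under assumed values: σ = 10 and α = 0.05 are placeholders chosen here, not figures from the slides.

```python
from math import sqrt
from statistics import NormalDist

mu0, n, x_bar = 75.0, 12, 81.0   # from the supplement example
sigma = 10.0                     # ASSUMED population standard deviation
alpha = 0.05                     # ASSUMED significance level

null_dist = NormalDist(mu0, sigma / sqrt(n))

# One-sided test (H1: G > 75%): reject for sample means above this cutoff
cutoff = null_dist.inv_cdf(1 - alpha)

# The region's probability under the null distribution equals alpha
prob_region = 1 - null_dist.cdf(cutoff)
in_region = x_bar > cutoff

print(f"reject H0 if sample mean > {cutoff:.2f}")
print(f"P(region | H0) = {prob_region:.3f}, observed mean in region: {in_region}")
```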
Method 3: Calculate p-value
Compute the p-value of the test statistic.
The p-value is the probability of obtaining a test statistic at least as extreme as the result actually observed, assuming H0 to be true (i.e., under the null distribution).
If the p-value ≤ α, reject H0.
p-value: Types of Tests
Two-sided test: H0: θ = θ0, H1: θ ≠ θ0
One-sided test: H0: θ ≤ θ0, H1: θ > θ0
One-sided test: H0: θ ≥ θ0, H1: θ < θ0
p-value for Population Mean
If testing for the population mean using the z-distribution, determine the p-value based on whether the test is one-sided or two-sided.
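For a z-test, "at least as extreme" translates into an upper-tail, lower-tail, or doubled-tail probability depending on the alternative. A minimal sketch; the function name and the `alternative` labels are naming choices made here.

```python
from statistics import NormalDist

def z_p_value(z_obs, alternative="two-sided"):
    """p-value of an observed z-score under the standard normal null distribution."""
    std_normal = NormalDist()
    if alternative == "greater":      # H1: mu > mu0, upper tail
        return 1 - std_normal.cdf(z_obs)
    if alternative == "less":         # H1: mu < mu0, lower tail
        return std_normal.cdf(z_obs)
    if alternative == "two-sided":    # H1: mu != mu0, both tails
        return 2 * (1 - std_normal.cdf(abs(z_obs)))
    raise ValueError(f"unknown alternative: {alternative!r}")

print(z_p_value(2.0, "greater"))
print(z_p_value(2.0, "two-sided"))
```

For the same observed z-score, the two-sided p-value is twice the one-sided p-value in the direction of the observation.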
p-value Example
Testing average grade after supplement.
Historically, the average grade is 75% with . Sample of 12 students averaged 81%.
p-value Summary
The smaller the p-value, the stronger the evidence against H0.
If p-value ≤ α: result is statistically significant at the α level → Reject H0
Otherwise, if p-value > α: result is not statistically significant at the α level → Fail to reject H0
Decision Criteria Summary
| Method | Reject H0 | Fail to reject H0 |
|---|---|---|
| p-value | Reject if p-value ≤ α | Fail to reject if p-value > α |
| Rejection region | Reject if test statistic within region | Fail to reject if test statistic not in region |
| Confidence interval | Reject if null hypothesized value not within CI | Fail to reject if null hypothesized value within CI |
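The three criteria are equivalent ways of making the same decision. The sketch below checks this for a one-sided z-test on the supplement example. The value σ = 10 and both significance levels are assumptions chosen for illustration, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def three_decisions(x_bar, mu0, sigma, n, alpha):
    """One-sided z-test (H1: mu > mu0) decided three ways; True means reject H0."""
    se = sigma / sqrt(n)
    std_normal = NormalDist()
    z_obs = (x_bar - mu0) / se
    z_crit = std_normal.inv_cdf(1 - alpha)

    by_p_value = (1 - std_normal.cdf(z_obs)) <= alpha   # Method 3
    by_region = z_obs > z_crit                          # Method 2
    lower_bound = x_bar - z_crit * se                   # CI is (lower_bound, +inf)
    by_interval = mu0 < lower_bound                     # Method 1
    return by_p_value, by_region, by_interval

# ASSUMED sigma = 10; x_bar = 81, mu0 = 75, n = 12 come from the example
print(three_decisions(81, 75, 10, 12, alpha=0.05))
print(three_decisions(81, 75, 10, 12, alpha=0.01))
```

Under these assumed values all three methods reject H0 at α = 0.05 and all three fail to reject at α = 0.01.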
Population Mean
Large-Sample Test
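To test the hypothesis H0: μ = μ0 with a large sample (n > 30):
Test statistic: sample mean x̄, or equivalently z = (x̄ - μ0) / (s / √n), where s is the sample standard deviation
Null distribution: Sample mean ~ Normal(μ0, s²/n), i.e., mean μ0 and standard deviation s/√n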
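A sketch of the large-sample test computed from raw data, using the standardized statistic z = (x̄ - μ0)/(s/√n). The function name is hypothetical and the rent values are synthetic numbers generated only to make the example run; they are not real Austin rents.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def large_sample_z_test(sample, mu0, alternative="two-sided"):
    """Return (z statistic, p-value) for H0: mu = mu0 using a large sample."""
    n = len(sample)
    if n <= 30:
        raise ValueError("the large-sample test assumes n > 30")
    z_obs = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
    std_normal = NormalDist()
    if alternative == "greater":
        p = 1 - std_normal.cdf(z_obs)
    elif alternative == "less":
        p = std_normal.cdf(z_obs)
    elif alternative == "two-sided":
        p = 2 * (1 - std_normal.cdf(abs(z_obs)))
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z_obs, p

# Synthetic sample of 42 rents (illustrative only): $1600, $1615, ..., $2215
rents = [1600 + 15 * i for i in range(42)]
z_obs, p_value = large_sample_z_test(rents, mu0=2000, alternative="less")
print(f"z = {z_obs:.3f}, p-value = {p_value:.5f}")
```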
Small-Sample Test
Use when the sample is small and the population standard deviation (σ) is unknown.
Test statistic: t = (x̄ - μ0) / (s / √n), where s is the sample standard deviation
Null distribution: t-distribution with (n-1) degrees of freedom.
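A sketch of the small-sample test in standard-library Python. Because the standard library has no t-distribution, the tail probability is obtained here by numerically integrating the t density with Simpson's rule; that is a choice made for this example, and a statistics package such as SciPy would normally be used instead. The grades are hypothetical data averaging 81%.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t_obs, df, steps=2000):
    """P(T > t_obs), integrating the density on [0, |t_obs|] (steps must be even)."""
    a = abs(t_obs)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # P(0 < T < |t_obs|) by Simpson's rule
    return 0.5 - area if t_obs >= 0 else 0.5 + area

def small_sample_t_test(sample, mu0):
    """One-sided test of H0: mu = mu0 against H1: mu > mu0; returns (t, p-value)."""
    n = len(sample)
    t_obs = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
    return t_obs, t_upper_tail(t_obs, df=n - 1)

# Hypothetical grades for the 12 trial students (average 81%)
grades = [78, 85, 90, 72, 88, 81, 79, 84, 76, 83, 80, 76]
t_obs, p_value = small_sample_t_test(grades, mu0=75)
print(f"t = {t_obs:.3f} with {len(grades) - 1} degrees of freedom, p-value = {p_value:.4f}")
```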
Hypothesis Testing Errors
There are two possibilities (truth):
The supplement does not improve the average grade
The supplement improves the average grade
Based on your trial, you will make one of two decisions:
Don't develop the supplement because you don't believe it improves grades
Bring the supplement to the market because you believe it improves grades
However, there is a chance that the decision could be the wrong one
With 2 possibilities (truth) and 2 decisions, we have 4 possible outcomes
Hypothesis Testing Errors
| Decision | Ineffective | Effective |
|---|---|---|
| Don't Sell | ✓ | ✗ |
| Sell | ✗ | ✓ |
Type I and Type II errors
| Decision | Truth: H0 true | Truth: H0 false |
|---|---|---|
| Fail to Reject H0 | Correct (true negative) | Type II Error (β) |
| Reject H0 | Type I Error (α) | Correct (true positive) |
Type I Error
Rejecting the null hypothesis when it is true (False Positive).
α is the probability of making a Type I error.
Type II Error
Failing to reject the null hypothesis when it is false (False Negative).
β is the probability of making a Type II error.
Reducing Error
For a fixed sample size, decreasing the probability of Type I error (α) increases the probability of Type II error (β), and vice versa.
The only way to decrease both simultaneously is to increase the sample size.
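Both effects can be checked numerically for a one-sided z-test. In the sketch below, α is fixed by the choice of cutoff, and β is the probability that the sample mean falls below that cutoff when the true mean is higher than μ0. The true mean of 80% and σ = 10 are assumptions for illustration; neither comes from the slides.

```python
from math import sqrt
from statistics import NormalDist

def type_2_probability(alpha, n, mu0=75.0, mu_true=80.0, sigma=10.0):
    """beta for a one-sided z-test (H1: mu > mu0) when the true mean is mu_true."""
    se = sigma / sqrt(n)
    cutoff = NormalDist(mu0, se).inv_cdf(1 - alpha)   # reject H0 if x_bar > cutoff
    return NormalDist(mu_true, se).cdf(cutoff)        # P(fail to reject | mu_true)

# Fixed n = 12: lowering alpha raises beta
for alpha in (0.10, 0.05, 0.01):
    print(f"n = 12, alpha = {alpha:.2f}: beta = {type_2_probability(alpha, 12):.3f}")

# Fixed alpha = 0.05: raising n lowers beta
for n in (12, 24, 48):
    print(f"alpha = 0.05, n = {n}: beta = {type_2_probability(0.05, n):.3f}")
```

β depends on how far the true mean is from μ0, so a different assumed `mu_true` gives different numbers but the same direction of change.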