Hypothesis Testing

Why Hypothesis Testing?

Confidence intervals use data to estimate a range likely to capture an unknown parameter.
Hypothesis testing uses data to assess how plausible a hypothesis about an unknown parameter is.
Statistical Inference: using data analysis to infer properties of a population.
Hypothesis testing is used to test claims about a population.
Random samples lead to sampling variation.
Hypothesis testing is a statistical procedure that uses data from random samples to test a claim and to quantify how strong the evidence against it is.

Learning Objectives

Define null and alternative hypotheses mathematically.
Obtain the null distribution and explain its representation.
Explain the significance level and its relation to hypothesis testing.
Apply the appropriate formula for testing different parameters and scenarios.
Choose the appropriate distribution (Z, t, or neither) for testing the population mean.
Calculate and make decisions using 𝑝-value, rejection region, and confidence interval.
Discuss how significance level affects 𝑝-value, rejection region, and confidence interval.
Explain Type I and Type II errors in simple terms.
Calculate the probability of Type I and Type II errors.
Discuss the effect of reducing the probability of one error on the other.
Discuss the effect of sample size on the probability of Type I and Type II errors.

Hypothesis Testing: Example

Determine if a brain supplement improves CE 93 test grades based on a trial.
Historically, the average grade for CE 93 tests is 75%.
A trial with 12 students taking the supplement resulted in an average grade of 81%.

Criminal Justice System Analogy

Hypothesis testing is analogous to the legal system.
Null Hypothesis (𝐻0):
- Initial assumption (safe, minimal risk).
- Conclusion if evidence is ambiguous.
Alternative Hypothesis (𝐻1):
- The hypothesis being tested.
- Conclusion only if there is conclusive evidence.

Null and Alternative Hypothesis: Justice System

Null Hypothesis $H_0$ : The defendant is presumed innocent until proven guilty.
Alternative Hypothesis $H_1$ : The defendant is guilty (what the prosecution is trying to prove).
Decisions:
1. Declare the defendant not guilty.
2. Declare the defendant guilty.

Hypothesis Testing Procedure

Hypotheses
Test Statistic
Null Distribution
Decision Criteria

Step 1: Hypotheses

Define the null ( $H_0$ ) and alternative ( $H_1$ ) hypotheses.
Hypotheses are statements about an unknown population parameter $𝜃$ .
Null Hypothesis $H_0$ :
- Always involves an equal sign (=, ≤, ≥).
- Fail to reject unless strong evidence against it.
Alternative Hypothesis $H_1$ :
- Does not involve an equal sign (≠, <, >).
- Reject the null only if there is strong evidence supporting $H_1$ .

Step 1: Examples

$H_0: G = 75\%$ ; $H_1: G > 75\%$ (G = average grade with supplement)
$H_0: μ = \$2000$ ; $H_1: μ < \$2000$ (μ = average rent in Austin)
$H_0: p = 0.5$ ; $H_1: p ≠ 0.5$ (p = proportion of tails)

Step 2: Test Statistic

Define an appropriate test statistic to decide whether to reject the null hypothesis.

Step 2: Examples

Sample mean of the grades of 12 students (grades follow a normal distribution, $σ$ unknown).
Sample mean of 42 rents in Austin (rent distribution unknown, $σ$ unknown).
Proportion of tails from 100 coin flips.

Step 3: Null Distribution

Determine the distribution of the test statistic under the null hypothesis.

Step 3: Example

Average grade after supplement: $H_0: G = 75\%$ , $H_1: G > 75\%$ .
Grades follow a normal distribution with $σ^2 = 100$ .
Null distribution of the test statistic: under $H_0$ , the sample mean of $n = 12$ grades follows Normal( $75$ , $100/12$ ).
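A minimal sketch of this null distribution, assuming the test statistic is the sample mean of the 12 trial students and that $σ^2 = 100$ is treated as known. It uses `NormalDist` from Python's standard `statistics` module; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 75.0        # hypothesized mean grade under H0 (percent)
sigma = sqrt(100)  # population standard deviation, from sigma^2 = 100
n = 12             # number of students in the trial

# Under H0 the sample mean is Normal(mu_0, sigma^2 / n).
standard_error = sigma / sqrt(n)
null_dist = NormalDist(mu=mu_0, sigma=standard_error)

print(f"standard error = {standard_error:.3f}")
print(f"null variance  = {null_dist.variance:.3f}")
```

The standard error is about 2.89 grade points, so under $H_0$ a sample mean of 81% sits roughly two standard errors above 75%.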

Step 4: Decision Criteria

Define the conditions for rejecting or failing to reject the null hypothesis.
To reject $H_0$ , the evidence should have a very low probability of occurring if $H_0$ were true.

Significance Level

To reject $H_0$ , the evidence needs to have a very low probability of occurring if $H_0$ were true.
The significance level $α$ is the threshold probability below which we reject $H_0$ (e.g., 5%).
If $α = 5\%$ and the sample evidence has a probability of 3% of occurring if $H_0$ were true, reject $H_0$ .

Method 1: Confidence Intervals

Calculate confidence intervals based on the test statistic.
If the significance level is $α$ , the associated confidence level is $100(1 − α)\%$ .
If the hypothesized value $𝑎$ in $H_0: 𝜃 = 𝑎$ lies within the confidence interval, fail to reject $H_0$ . Otherwise, reject $H_0$ .

Confidence Intervals: Types of Tests

Two-sided test: $H_0: 𝜃 = 𝑎$ , $H_1: 𝜃 ≠ 𝑎$
One-sided test: $H_0: 𝜃 = 𝑎$ , $H_1: 𝜃 > 𝑎$
One-sided test: $H_0: 𝜃 = 𝑎$ , $H_1: 𝜃 < 𝑎$

Confidence Intervals: Population Mean

If testing for the population mean using a large sample with an unknown $σ$ , use appropriate confidence interval formulas based on whether the test is one-sided or two-sided.
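For reference, the standard large-sample intervals can be written as follows. This is a sketch under the usual assumptions: $\bar{x}$ is the sample mean, $s$ is the sample standard deviation standing in for the unknown $σ$ , and $z_{α}$ is the standard normal value with upper-tail probability $α$ . The symbols $s$ and $z_{α}$ are notation introduced here.

```latex
% Two-sided test, H_1: \mu \neq a
\bar{x} - z_{\alpha/2}\,\frac{s}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{x} + z_{\alpha/2}\,\frac{s}{\sqrt{n}}

% One-sided test, H_1: \mu > a  (lower confidence bound)
\mu \;\ge\; \bar{x} - z_{\alpha}\,\frac{s}{\sqrt{n}}

% One-sided test, H_1: \mu < a  (upper confidence bound)
\mu \;\le\; \bar{x} + z_{\alpha}\,\frac{s}{\sqrt{n}}
```

In each case, reject $H_0$ when the hypothesized value $a$ falls outside the interval.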

Confidence Interval Example

Testing average grade after supplement with $α = 0.05$ .
Historically, the average grade is 75% with $σ^2 = 100$ . A sample of 12 students averaged 81%.
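The example can be worked through as below. This is a sketch, assuming $σ = 10$ is known, so the z-distribution applies even with 12 students. It also assumes the one-sided alternative $H_1: G > 75\%$ , which calls for a one-sided lower confidence bound. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_0, sigma, n, x_bar, alpha = 75.0, 10.0, 12, 81.0, 0.05

standard_error = sigma / sqrt(n)

# H1: G > 75 is one-sided, so use a one-sided lower confidence bound.
z_alpha = NormalDist().inv_cdf(1 - alpha)
lower_bound = x_bar - z_alpha * standard_error

# Reject H0 when the hypothesized value falls outside [lower_bound, infinity).
reject_h0 = mu_0 < lower_bound

print(f"95% lower confidence bound: {lower_bound:.2f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

The lower bound comes out near 76.25%, which is above 75%, so $H_0$ is rejected at the 5% level.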

Method 2: Define Rejection Regions

Define rejection regions for the test statistic.
If the observed test statistic falls within the rejection region, reject $H_0$ .

Rejection Region vs. Confidence Interval

The confidence interval is centered on the observed sample mean $\bar{x}$ , whereas the rejection region is based on the null distribution, which is centered on the value hypothesized under $H_0$ .

Rejection Region: Types of Tests

Two-sided test: $H_0: 𝜃 = 𝑎$ , $H_1: 𝜃 ≠ 𝑎$
One-sided test: $H_0: 𝜃 = 𝑎$ , $H_1: 𝜃 > 𝑎$
One-sided test: $H_0: 𝜃 = 𝑎$ , $H_1: 𝜃 < 𝑎$

Rejection Region: Critical z-scores

If testing for the population mean $μ$ using the z-distribution, determine the critical z-scores for the rejection regions based on whether the test is one-sided or two-sided.
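One way to obtain the critical z-scores is from the inverse CDF of the standard normal distribution. In the sketch below, the function name `critical_z` and the labels `"two-sided"`, `"greater"` and `"less"` are illustrative choices, not notation from these notes.

```python
from statistics import NormalDist

def critical_z(alpha, alternative):
    """Return the critical z-score(s) for a test at significance level alpha.

    alternative: "two-sided" (H1: mu != a), "greater" (H1: mu > a),
    or "less" (H1: mu < a).
    """
    std_normal = NormalDist()
    if alternative == "two-sided":
        z = std_normal.inv_cdf(1 - alpha / 2)
        return (-z, z)                        # reject if z_obs < -z or z_obs > z
    if alternative == "greater":
        return std_normal.inv_cdf(1 - alpha)  # reject if z_obs > this value
    if alternative == "less":
        return std_normal.inv_cdf(alpha)      # reject if z_obs < this value
    raise ValueError(f"unknown alternative: {alternative!r}")

for alt in ("two-sided", "greater", "less"):
    print(alt, critical_z(0.05, alt))
```

At $α = 0.05$ this gives roughly ±1.96 for the two-sided test and ±1.645 for the one-sided tests. The two-sided test splits $α$ equally between the two tails.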

Rejection Region Example

Testing average grade after supplement with $α = 0.05$ .
Historically, the average grade is 75% with $σ^2 = 100$ . A sample of 12 students averaged 81%.
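A sketch of this example, assuming $σ = 10$ is known and the alternative is $H_1: G > 75\%$ , so the rejection region is the upper tail. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_0, sigma, n, x_bar, alpha = 75.0, 10.0, 12, 81.0, 0.05

standard_error = sigma / sqrt(n)

# Standardize the observed sample mean under H0.
z_obs = (x_bar - mu_0) / standard_error

# Upper-tail rejection region for H1: G > 75.
z_crit = NormalDist().inv_cdf(1 - alpha)

# The same threshold expressed on the sample-mean scale.
x_bar_crit = mu_0 + z_crit * standard_error

reject_h0 = z_obs > z_crit
print(f"z_obs = {z_obs:.3f}, z_crit = {z_crit:.3f}, x_bar_crit = {x_bar_crit:.2f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

The observed z-score is about 2.08, which exceeds the critical value of about 1.645. Equivalently, the sample mean of 81% exceeds the threshold of about 79.75%. Either way, $H_0$ is rejected.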

Rejection Region Summary

Rejection region is selected such that its probability under the null distribution equals $α$ .
If the observed test statistic falls within the rejection region, reject $H_0$ .

Method 3: Calculate p-value

Compute the 𝑝-value of the test statistic.
- The probability of obtaining a test statistic at least as extreme as the result actually observed, assuming $H_0$ to be true (i.e., under the null distribution).
If the 𝑝-value $≤ α$ , reject $H_0$ .

p-value: Types of Tests

Two-sided test: $H_0: 𝜃 = 𝑎$ , $H_1: 𝜃 ≠ 𝑎$
One-sided test: $H_0: 𝜃 = 𝑎$ , $H_1: 𝜃 > 𝑎$
One-sided test: $H_0: 𝜃 = 𝑎$ , $H_1: 𝜃 < 𝑎$

p-value for Population Mean

If testing for the population mean $μ$ using the z-distribution, determine the p-value based on whether the test is one-sided or two-sided.
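The three cases can be sketched as a single function of the observed z-score. The function name `z_test_p_value` and the alternative labels are illustrative, not notation from these notes.

```python
from statistics import NormalDist

def z_test_p_value(z_obs, alternative):
    """p-value of an observed z-score under the standard normal null distribution."""
    std_normal = NormalDist()
    if alternative == "greater":      # H1: mu > a, upper tail
        return 1 - std_normal.cdf(z_obs)
    if alternative == "less":         # H1: mu < a, lower tail
        return std_normal.cdf(z_obs)
    if alternative == "two-sided":    # H1: mu != a, both tails
        return 2 * (1 - std_normal.cdf(abs(z_obs)))
    raise ValueError(f"unknown alternative: {alternative!r}")

print(z_test_p_value(1.96, "two-sided"))
print(z_test_p_value(1.96, "greater"))
```

For the same observed z-score, the two-sided 𝑝-value is twice the smaller of the two one-sided 𝑝-values, because "at least as extreme" then counts both tails.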

p-value Example

Testing average grade after supplement.
Historically, the average grade is 75% with $σ^2 = 100$ . A sample of 12 students averaged 81%.
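A sketch of the calculation, assuming $σ = 10$ is known and the alternative is $H_1: G > 75\%$ , so the 𝑝-value is an upper-tail probability. The comparison against $α = 0.05$ is an assumption carried over from the earlier examples.

```python
from math import sqrt
from statistics import NormalDist

mu_0, sigma, n, x_bar = 75.0, 10.0, 12, 81.0
alpha = 0.05   # assumed significance level, as in the earlier examples

z_obs = (x_bar - mu_0) / (sigma / sqrt(n))

# Probability of a z-score at least this large if H0 were true.
p_value = 1 - NormalDist().cdf(z_obs)

print(f"z_obs = {z_obs:.3f}, p-value = {p_value:.4f}")
print("reject H0" if p_value <= alpha else "fail to reject H0")
```

The 𝑝-value is about 0.019. It is below 0.05 but above 0.01, so the conclusion depends on the significance level chosen.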

p-value Summary

The smaller the 𝑝-value, the stronger the evidence is against $H_0$ .
If 𝑝-value ≤ 𝛼: result is statistically significant at the $100𝛼\%$ level → Reject $H_0$
Otherwise, if 𝑝-value > 𝛼: result is not statistically significant at the $100𝛼\%$ level → Fail to reject $H_0$

Decision Criteria Summary

Method	𝛼 = 0.05	𝛼 = 0.01
𝑝-value	Reject $H_0$ if 𝑝-value < 𝛼	Fail to reject if 𝑝-value > 𝛼
Rejection region	Reject $H_0$ if test statistic within region	Fail to reject if test statistic not in region
Confidence interval	Reject $H_0$ if value not within CI	Fail to reject if value within CI
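The three methods always lead to the same decision at a given $α$ . The sketch below checks this for the supplement example at both significance levels in the table, assuming $σ = 10$ is known and $H_1: G > 75\%$ . Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_0, sigma, n, x_bar = 75.0, 10.0, 12, 81.0

std_normal = NormalDist()
standard_error = sigma / sqrt(n)
z_obs = (x_bar - mu_0) / standard_error
p_value = 1 - std_normal.cdf(z_obs)        # upper tail, H1: G > 75

decisions = {}
for alpha in (0.05, 0.01):
    z_crit = std_normal.inv_cdf(1 - alpha)
    reject_by_p_value = p_value <= alpha
    reject_by_region = z_obs > z_crit
    # One-sided lower confidence bound; reject if mu_0 lies below it.
    reject_by_interval = mu_0 < x_bar - z_crit * standard_error
    decisions[alpha] = (reject_by_p_value, reject_by_region, reject_by_interval)
    print(alpha, decisions[alpha])
```

With these numbers all three methods reject $H_0$ at $α = 0.05$ and all three fail to reject at $α = 0.01$ .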

Population Mean

Large-Sample Test

To test the hypothesis $H_0: μ = μ_0$ with a large sample (n > 30):
1. Test statistic:
 $z = \frac{\bar{x} - μ_0}{σ / \sqrt{n}}$ (if $σ$ is unknown, use the sample standard deviation $s$ in its place)
2. Null distribution: Sample Mean ~ Normal( $μ_0$ , $σ^2/n$ )
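A sketch of the large-sample test for the rent example ( $H_1: μ < \$2000$ ), with the sample standard deviation standing in for the unknown $σ$ . The rents below are simulated, not real Austin data. The generating mean of 1900 and standard deviation of 300 are invented for illustration, and the function name `large_sample_z_test` is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def large_sample_z_test(sample, mu_0):
    """Return (z, lower-tail p-value) for H0: mu = mu_0 against H1: mu < mu_0."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)                  # sample standard deviation stands in for sigma
    z = (x_bar - mu_0) / (s / sqrt(n))
    p_value = NormalDist().cdf(z)      # lower tail because H1: mu < mu_0
    return z, p_value

# Simulated rents (NOT real data): 42 draws from a made-up Normal(1900, 300).
rng = random.Random(0)
rents = [rng.gauss(1900, 300) for _ in range(42)]

z, p_value = large_sample_z_test(rents, mu_0=2000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

With more than 30 observations, the central limit theorem justifies the normal null distribution even though the shape of the rent distribution is unknown.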

Small-Sample Test

Use when the sample is small, the population is approximately normal, and the population standard deviation (σ) is unknown.
1. Test statistic:
  $t = \frac{\bar{x} - μ_0}{s / \sqrt{n}}$
2. Null distribution: t-distribution with (n-1) degrees of freedom.
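A sketch of the small-sample test for $H_0: μ = 75$ against $H_1: μ > 75$ at $α = 0.05$ . The 12 grades are hypothetical, invented so that they average 81. The critical value 1.796 is the standard t-table entry for an upper-tail probability of 0.05 with 11 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical grades for 12 students (invented for illustration; they average 81).
grades = [72, 85, 90, 78, 81, 69, 88, 93, 76, 84, 80, 76]
mu_0 = 75.0

n = len(grades)
x_bar = mean(grades)
s = stdev(grades)                      # sample standard deviation (n - 1 denominator)
t_obs = (x_bar - mu_0) / (s / sqrt(n))

# Upper 5% point of the t-distribution with n - 1 = 11 degrees of freedom (t-table).
t_crit = 1.796

reject_h0 = t_obs > t_crit
print(f"x_bar = {x_bar}, s = {s:.3f}, t = {t_obs:.3f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

For this made-up sample the t statistic is about 2.85, so $H_0$ would be rejected. The t critical value (1.796) is larger than the corresponding z value (about 1.645), which reflects the extra uncertainty from estimating $σ$ with $s$ .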

Hypothesis Testing Errors

There are two possibilities (truth):
- The supplement does not improve the average grade
- The supplement improves the average grade
Based on your trial, you will make one of two decisions:
- Don’t develop the supplement because you don’t believe it improves grades
- Bring the supplement to the market because you believe it improves grades
However, there is a chance that the decision could be the wrong one
- With 2 possibilities (truth) and 2 decisions, we have 4 possible outcomes

Hypothesis Testing Errors

	Ineffective	Effective
Don't Sell	✓	✗
Sell	✗	✓

Type I and Type II errors

Decision	Truth: $H_0$ true	Truth: $H_1$ true
Fail to reject $H_0$	Correct (true negative)	Type II error (probability 𝛽)
Reject $H_0$	Type I error (probability 𝛼)	Correct (true positive)

Type I Error

Rejecting the null hypothesis when it is true (False Positive).
α is the probability of making a Type I error.

Type II Error

Failing to reject the null hypothesis when it is false (False Negative).
𝛽 is the probability of making a Type II error.
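Both error probabilities can be computed for the supplement example. Calculating 𝛽 requires a specific value for the true mean under $H_1$ . The value 80 used below is a hypothetical choice, not a result from the trial. The sketch also assumes $σ = 10$ , $n = 12$ , $α = 0.05$ and an upper-tail test.

```python
from math import sqrt
from statistics import NormalDist

mu_0, sigma, n, alpha = 75.0, 10.0, 12, 0.05
mu_true = 80.0   # hypothetical true mean under H1 (an assumption)

standard_error = sigma / sqrt(n)
null_dist = NormalDist(mu_0, standard_error)     # sample mean if H0 is true
alt_dist = NormalDist(mu_true, standard_error)   # sample mean if the true mean is mu_true

# Reject H0 when the sample mean exceeds this threshold (upper-tail test).
x_bar_crit = null_dist.inv_cdf(1 - alpha)

type_1_prob = 1 - null_dist.cdf(x_bar_crit)   # reject H0 although H0 is true
type_2_prob = alt_dist.cdf(x_bar_crit)        # fail to reject although mu = mu_true

print(f"P(Type I)  = {type_1_prob:.3f}")
print(f"P(Type II) = {type_2_prob:.3f}")
```

Under these assumptions the Type I error probability equals $α = 0.05$ by construction, while 𝛽 is about 0.47. A trial of 12 students would miss a true 5-point improvement almost half the time.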

Reducing Error

For a fixed sample size, decreasing the probability of a Type I error (𝛼) increases the probability of a Type II error (𝛽), and vice versa.
The only way to decrease both simultaneously is to increase the sample size.
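Both statements can be illustrated numerically. The sketch below assumes an upper-tail z-test with $μ_0 = 75$ and $σ = 10$ , plus a hypothetical true mean of 80 that is not a value from the trial. The function name `type_2_error` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def type_2_error(alpha, n, mu_true=80.0, mu_0=75.0, sigma=10.0):
    """Probability of failing to reject H0: mu = mu_0 when the true mean is mu_true.

    Upper-tail z-test with known sigma; mu_true is a hypothetical value.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    shift = (mu_true - mu_0) * sqrt(n) / sigma   # true effect in standard-error units
    return std_normal.cdf(z_crit - shift)

# Trade-off at a fixed sample size: a smaller alpha gives a larger beta.
for alpha in (0.05, 0.01):
    print(f"n = 12, alpha = {alpha}: beta = {type_2_error(alpha, 12):.3f}")

# Effect of sample size at a fixed alpha: beta shrinks as n grows.
for n in (12, 30, 100):
    print(f"alpha = 0.05, n = {n}: beta = {type_2_error(0.05, n):.4f}")
```

With 12 students, tightening $α$ from 0.05 to 0.01 raises 𝛽 from about 0.47 to about 0.72. Keeping $α = 0.05$ and enlarging the sample to 30 students lowers 𝛽 to about 0.14.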