Hypothesis Testing

Why Hypothesis Testing?

  • Confidence intervals use data to estimate a range likely to capture an unknown parameter.

  • Hypothesis testing uses data to assess how plausible a hypothesis about an unknown parameter is.

  • Statistical Inference: using data analysis to infer properties of a population.

  • Hypothesis testing is used to test claims about a population.

  • Random samples lead to sampling variation.

  • Hypothesis testing is a statistical procedure that uses data from random samples to test a claim and quantify the strength of the evidence against it.

Learning Objectives

  1. Define null and alternative hypotheses mathematically.

  2. Obtain the null distribution and explain its representation.

  3. Explain the significance level and its relation to hypothesis testing.

  4. Apply the appropriate formula for testing different parameters and scenarios.

  5. Choose the appropriate distribution (Z, t, or neither) for testing the population mean.

  6. Calculate and make decisions using 𝑝-value, rejection region, and confidence interval.

  7. Discuss how significance level affects 𝑝-value, rejection region, and confidence interval.

  8. Explain Type I and Type II errors in simple terms.

  9. Calculate the probability of Type I and Type II errors.

  10. Discuss the effect of reducing the probability of one error on the other.

  11. Discuss the effect of sample size on the probability of Type I and Type II errors.

Hypothesis Testing: Example

  • Determine if a brain supplement improves CE 93 test grades based on a trial.

  • Historically, the average grade for CE 93 tests is 75%.

  • A trial with 12 students taking the supplement resulted in an average grade of 81%.

Criminal Justice System Analogy

  • Hypothesis testing is analogous to the legal system.

  • Null Hypothesis (𝐻0):

    • Initial assumption (safe, minimal risk).

    • Conclusion if evidence is ambiguous.

  • Alternative Hypothesis (𝐻1):

    • The hypothesis being tested.

    • Conclusion only if there is conclusive evidence.

Null and Alternative Hypothesis: Justice System

  • Null Hypothesis 𝐻0: Presumed innocent until proven guilty.

  • Alternative Hypothesis 𝐻1: What you are trying to prove.

  • Decisions:

    1. Declare the defendant not guilty.

    2. Declare the defendant guilty.

Hypothesis Testing Procedure

  1. Hypotheses

  2. Test Statistic

  3. Null Distribution

  4. Decision Criteria

Step 1: Hypotheses

  • Define null (H_0) and alternative (H_1) hypotheses.

  • Hypotheses are statements about an unknown population parameter 𝜃.

  • Null Hypothesis H_0:

    • Always involves an equal sign (=, ≤, ≥).

    • Fail to reject unless strong evidence against it.

  • Alternative Hypothesis H_1:

    • Does not involve an equal sign (≠, <, >).

    • Reject the null only if there is strong evidence supporting H_1.

Step 1: Examples

  1. H_0: G = 75%; H_1: G > 75% (G = average grade with supplement)

  2. H_0: μ = $2000; H_1: μ < $2000 (μ = average rent in Austin)

  3. H_0: p = 0.5; H_1: p ≠ 0.5 (p = proportion of tails)

Step 2: Test Statistic

  • Define an appropriate test statistic to decide whether to reject the null hypothesis.

Step 2: Examples

  1. Sample mean of 12 students (grades follow normal distribution, σ unknown).

  2. Sample mean of 42 rents in Austin (rent distribution, σ unknown).

  3. Proportion of tails from 100 coin flips.

Step 3: Null Distribution

  • Determine the distribution of the test statistic under the null hypothesis.

Step 3: Example

  • Average grade after supplement: H_0: G = 75%, H_1: G > 75%.

  • Grades follow a normal distribution with σ^2 = 100.

  • Null distribution of the test statistic.
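
Under H_0 the population mean is 75, so with σ^2 = 100 and the trial size n = 12 the null distribution can be written out explicitly. This is a worked sketch that assumes the sample mean is the test statistic and uses μ_0 for the hypothesized value of G:

```latex
\bar{X} \sim \mathcal{N}\!\left(\mu_0,\ \frac{\sigma^2}{n}\right)
        = \mathcal{N}\!\left(75,\ \frac{100}{12}\right),
\qquad
Z = \frac{\bar{X} - 75}{10/\sqrt{12}} \sim \mathcal{N}(0, 1).
```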

Step 4: Decision Criteria

  • Define conditions for rejecting or failing to reject the null hypothesis (the null is never "accepted").

  • To reject H_0, evidence should have a very low probability of occurring if H_0 were true.

Significance Level

  • To reject H_0, the evidence needs a very low probability if H_0 were true.

  • Threshold probability (α) below which we reject H_0 (e.g., 5%).

  • If α = 5% and sample evidence has a probability of 3% if H_0 were true, reject H_0.

Method 1: Confidence Intervals

  • Calculate confidence intervals based on the test statistic.

  • If significance level is α, associated confidence interval is 100(1 − α)%.

  • If the null hypothesized value H_0: 𝜃 = 𝑎 is within the confidence interval, fail to reject H_0. Otherwise, reject H_0.

Confidence Intervals: Types of Tests

  • Two-sided test: H_0: 𝜃 = 𝑎, H_1: 𝜃 ≠ 𝑎

  • One-sided test: H_0: 𝜃 = 𝑎, H_1: 𝜃 > 𝑎

  • One-sided test: H_0: 𝜃 = 𝑎, H_1: 𝜃 < 𝑎

Confidence Intervals: Population Mean

  • If testing for the population mean using a large sample with an unknown σ, use appropriate confidence interval formulas based on whether the test is one-sided or two-sided.
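
For reference, the usual textbook large-sample intervals are sketched below. They are stated here as an assumption about which formulas are meant, with s the sample standard deviation standing in for the unknown σ and z_α the upper-α quantile of the standard normal distribution:

```latex
\begin{aligned}
H_1: \mu \neq a &:\quad \left(\bar{x} - z_{\alpha/2}\,\frac{s}{\sqrt{n}},\ \ \bar{x} + z_{\alpha/2}\,\frac{s}{\sqrt{n}}\right) \\
H_1: \mu > a &:\quad \left(\bar{x} - z_{\alpha}\,\frac{s}{\sqrt{n}},\ \ \infty\right) \\
H_1: \mu < a &:\quad \left(-\infty,\ \ \bar{x} + z_{\alpha}\,\frac{s}{\sqrt{n}}\right)
\end{aligned}
```

In each case H_0 is rejected when the hypothesized value a falls outside the interval.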

Confidence Interval Example

  • Testing average grade after supplement with α = 0.05.

  • Historically, the average grade is 75% with σ^2 = 100. Sample of 12 students averaged 81%.
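
A minimal sketch of the confidence-interval method applied to this example, using only Python's standard library (`statistics.NormalDist`). It assumes σ is known (σ^2 = 100) and that the alternative is the one-sided H_1: G > 75% from Step 1, so the interval is a one-sided lower confidence bound. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 75.0    # hypothesized mean under H_0 (historical average, %)
x_bar = 81.0   # observed sample mean (%)
sigma = 10.0   # population standard deviation, assumed known (sigma^2 = 100)
n = 12         # sample size
alpha = 0.05   # significance level

se = sigma / sqrt(n)                        # standard error of the sample mean
z_alpha = NormalDist().inv_cdf(1 - alpha)   # upper-alpha quantile, about 1.645

# One-sided 95% confidence interval for H_1: mu > mu_0 is (lower_bound, +inf).
lower_bound = x_bar - z_alpha * se

# Reject H_0 when the hypothesized value lies outside the interval.
reject_h0 = mu_0 < lower_bound

print(f"95% lower confidence bound: {lower_bound:.2f}")
print("Reject H_0" if reject_h0 else "Fail to reject H_0")
```

Under these assumptions the lower bound is about 76.25, which is above 75, so H_0 is rejected at the 5% level.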

Method 2: Define Rejection Regions

  • Define rejection regions for the test statistic.

  • If the observed test statistic falls within the rejection region, reject H_0.

Rejection Region vs. Confidence Interval

  • The confidence interval is centered about the observed sample mean x̄, whereas the rejection region is based on the null distribution, which is centered about the value hypothesized under H_0.

Rejection Region: Types of Tests

  • Two-sided test: H_0: 𝜃 = 𝑎, H_1: 𝜃 ≠ 𝑎

  • One-sided test: H_0: 𝜃 = 𝑎, H_1: 𝜃 > 𝑎

  • One-sided test: H_0: 𝜃 = 𝑎, H_1: 𝜃 < 𝑎

Rejection Region: Critical z-scores

  • If testing for the population mean μ using the z-distribution, determine the critical z-scores for the rejection regions based on whether the test is one-sided or two-sided.
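
The standard rejection rules for a z-test can be summarized as follows. This is a sketch of the usual textbook forms, with z the observed test statistic and z_α the upper-α quantile of the standard normal distribution:

```latex
\begin{aligned}
H_1: \mu \neq a &:\quad \text{reject } H_0 \text{ if } |z| > z_{\alpha/2} \\
H_1: \mu > a &:\quad \text{reject } H_0 \text{ if } z > z_{\alpha} \\
H_1: \mu < a &:\quad \text{reject } H_0 \text{ if } z < -z_{\alpha}
\end{aligned}
\qquad\qquad z_{0.05} \approx 1.645, \quad z_{0.025} \approx 1.960
```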

Rejection Region Example

  • Testing average grade after supplement with α = 0.05.

  • Historically, the average grade is 75% with σ^2 = 100. A sample of 12 students averaged 81%.
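
The rejection region for this example can be sketched on the scale of the sample mean. The code assumes σ is known, the right-tailed alternative H_1: G > 75% from Step 1, and illustrative variable names. It uses only the standard library.

```python
from math import sqrt
from statistics import NormalDist

mu_0, x_bar, sigma, n, alpha = 75.0, 81.0, 10.0, 12, 0.05

# Null distribution of the sample mean: Normal(mu_0, sigma^2 / n).
null_dist = NormalDist(mu=mu_0, sigma=sigma / sqrt(n))

# Right-tailed test: the rejection region is every sample mean above the
# (1 - alpha) quantile of the null distribution.
critical_mean = null_dist.inv_cdf(1 - alpha)
in_rejection_region = x_bar > critical_mean

# The same comparison on the z scale.
z_obs = (x_bar - mu_0) / (sigma / sqrt(n))
z_crit = NormalDist().inv_cdf(1 - alpha)

print(f"critical sample mean = {critical_mean:.2f}, observed = {x_bar:.2f}")
print(f"z_obs = {z_obs:.3f}, z_crit = {z_crit:.3f}")
print("Reject H_0" if in_rejection_region else "Fail to reject H_0")
```

Under these assumptions the critical sample mean is about 79.75, so the observed 81% falls in the rejection region. Note that the region is built around 75, the null value, not around the observed mean.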

Rejection Region Summary

  • Rejection region is selected such that its probability under the null distribution equals α.

  • If the observed test statistic falls within the rejection region, reject H_0.

Method 3: Calculate p-value

  • Compute the 𝑝-value of the test statistic.

    • The probability of obtaining a test statistic at least as extreme as the result actually observed, assuming H_0 to be true (i.e., under the null distribution).

  • If the 𝑝-value ≤ α, reject H_0.

p-value: Types of Tests

  • Two-sided test: H_0: 𝜃 = 𝑎, H_1: 𝜃 ≠ 𝑎

  • One-sided test: H_0: 𝜃 = 𝑎, H_1: 𝜃 > 𝑎

  • One-sided test: H_0: 𝜃 = 𝑎, H_1: 𝜃 < 𝑎

p-value for Population Mean

  • If testing for the population mean μ using the z-distribution, determine the p-value based on whether the test is one-sided or two-sided.
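
In terms of the standard normal CDF Φ and the observed statistic z, the usual p-value formulas are sketched below (standard textbook forms, given here as an assumption about which formulas are meant):

```latex
\begin{aligned}
H_1: \mu \neq a &:\quad p = 2\,\bigl(1 - \Phi(|z|)\bigr) \\
H_1: \mu > a &:\quad p = 1 - \Phi(z) \\
H_1: \mu < a &:\quad p = \Phi(z)
\end{aligned}
```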

p-value Example

  • Testing average grade after supplement.

  • Historically, the average grade is 75% with σ^2 = 100. Sample of 12 students averaged 81%.
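
A minimal sketch of the p-value calculation for this example, assuming σ is known and the right-tailed alternative H_1: G > 75% from Step 1. The two significance levels compared at the end are illustrative choices.

```python
from math import sqrt
from statistics import NormalDist

mu_0, x_bar, sigma, n = 75.0, 81.0, 10.0, 12

# Observed z statistic under H_0.
z_obs = (x_bar - mu_0) / (sigma / sqrt(n))

# Right-tailed p-value: probability of a statistic at least this large under H_0.
p_value = 1 - NormalDist().cdf(z_obs)
print(f"z = {z_obs:.3f}, p-value = {p_value:.4f}")

# Compare the same p-value against two illustrative significance levels.
decisions = {}
for alpha in (0.05, 0.01):
    decisions[alpha] = "reject H_0" if p_value <= alpha else "fail to reject H_0"
    print(f"alpha = {alpha}: {decisions[alpha]}")
```

Under these assumptions the p-value is roughly 0.019, so H_0 is rejected at α = 0.05 but not at α = 0.01. The data are the same in both cases. Only the threshold changes.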

p-value Summary

  • The smaller the 𝑝-value, the stronger the evidence is against H_0.

  • If 𝑝-value ≤ 𝛼: result is statistically significant at the 100𝛼% level → Reject H_0

  • Otherwise, if 𝑝-value > 𝛼: result is not statistically significant at the 100𝛼% level → Fail to reject H_0

Decision Criteria Summary

Method                 Reject H_0 if                              Fail to reject H_0 if

𝑝-value                𝑝-value ≤ 𝛼                                𝑝-value > 𝛼

Rejection region       test statistic is within the region        test statistic is not in the region

Confidence interval    hypothesized value is not within the CI    hypothesized value is within the CI

The same rules apply at any chosen significance level, e.g. 𝛼 = 0.05 or 𝛼 = 0.01.

Population Mean

Large-Sample Test

  • To test the hypothesis H_0: μ = μ_0 with a large sample (n > 30):

    1. Test statistic (use the sample standard deviation s in place of σ when σ is unknown):
      z = (x̄ − μ_0) / (σ / √n)

    2. Null distribution: Sample Mean ~ Normal(μ_0, σ^2/n), so z ~ Normal(0, 1)
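
The large-sample test can be sketched as a small reusable function. The function name, the `alternative` labels, and the rent summary statistics below are hypothetical and do not come from the notes. The sample standard deviation `s` is used in place of σ, which is reasonable only for large n.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_z_test(x_bar, s, n, mu_0, alternative="two-sided"):
    """Return (z, p_value) for H_0: mu = mu_0 using the normal approximation.

    alternative: "two-sided" (mu != mu_0), "greater" (mu > mu_0),
    or "less" (mu < mu_0).
    """
    z = (x_bar - mu_0) / (s / sqrt(n))
    std_normal = NormalDist()
    if alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, p_value


# Hypothetical summary statistics for 42 Austin rents (illustration only).
z, p_value = large_sample_z_test(
    x_bar=1900.0, s=350.0, n=42, mu_0=2000.0, alternative="less"
)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```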

Small-Sample Test

  • Use when the sample is small, the population is approximately normal, and the population standard deviation (σ) is unknown.

    1. Test statistic:
      t = (x̄ − μ_0) / (s / √n)

    2. Null distribution: t-distribution with (n-1) degrees of freedom.
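
A minimal sketch of the t statistic using only the standard library. The twelve grades are hypothetical values chosen to average 81% and are not data from the notes. The standard library has no t-distribution, so the statistic is compared with the tabulated one-sided critical value t ≈ 1.796 for α = 0.05 and 11 degrees of freedom. A statistics package such as SciPy (`scipy.stats.t`) could supply exact quantiles and p-values.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample, mu_0):
    """Return (t, degrees_of_freedom) for H_0: mu = mu_0."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation (divides by n - 1)
    return (x_bar - mu_0) / (s / sqrt(n)), n - 1


# Hypothetical grades for the 12 students (illustration only; mean is 81).
grades = [78, 85, 92, 70, 81, 88, 76, 84, 79, 90, 73, 76]
t, df = t_statistic(grades, mu_0=75.0)

T_CRIT = 1.796  # tabulated t quantile: alpha = 0.05, one-sided, df = 11
print(f"t = {t:.3f} with {df} degrees of freedom")
print("Reject H_0" if t > T_CRIT else "Fail to reject H_0")
```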

Hypothesis Testing Errors

  • There are two possibilities (truth):

    • The supplement does not improve the average grade

    • The supplement improves the average grade

  • Based on your trial, you will make one of two decisions:

    • Don't develop the supplement because you don't believe it improves grades

    • Bring the supplement to the market because you believe it improves grades

  • However, there is a chance that the decision could be the wrong one

    • With 2 possibilities (truth) and 2 decisions, we have 4 possible outcomes

Hypothesis Testing Errors

Decision      Ineffective    Effective

Don't Sell    ✓              ✗

Sell          ✗              ✓

Type I and Type II errors

Decision              Truth: H_0                 Truth: H_1

Fail to Reject H_0    Correct (true negative)    Type II Error 𝛽

Reject H_0            Type I Error 𝛼             Correct (true positive)

Type I Error

  • Rejecting the null hypothesis when it is true (False Positive).

  • α is the probability of making a Type I error.

Type II Error

  • Failing to reject the null hypothesis when it is false (False Negative).

  • 𝛽 is the probability of making a Type II error.

Reducing Error

  • For a fixed sample size, decreasing the probability of Type I error (𝛼) increases the probability of Type II error (𝛽), and vice versa.

  • The only way to decrease both simultaneously is to increase the sample size.
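
The trade-off can be illustrated numerically for the supplement example. Computing 𝛽 requires a specific true mean under H_1. The value 81% used below is an assumption made for illustration, as are the function name and the sample sizes compared. σ is taken as known (σ^2 = 100) and the test is right-tailed.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(mu_0, mu_true, sigma, n, alpha):
    """Beta for a right-tailed z-test of H_0: mu = mu_0 vs H_1: mu > mu_0."""
    se = sigma / sqrt(n)
    # Smallest sample mean that leads to rejecting H_0 at level alpha.
    critical_mean = NormalDist(mu_0, se).inv_cdf(1 - alpha)
    # Beta = P(fail to reject H_0) when the true mean is mu_true.
    return NormalDist(mu_true, se).cdf(critical_mean)


MU_0, MU_TRUE, SIGMA = 75.0, 81.0, 10.0  # MU_TRUE is an assumed value

betas = {}
for alpha in (0.05, 0.01):
    for n in (12, 48):
        betas[(alpha, n)] = type_ii_error(MU_0, MU_TRUE, SIGMA, n, alpha)
        print(f"alpha = {alpha}, n = {n}: beta = {betas[(alpha, n)]:.3f}")
```

At a fixed n, tightening α from 0.05 to 0.01 raises 𝛽. At a fixed α, raising n from 12 to 48 lowers 𝛽. With α = 0.05 and n = 12, 𝛽 is roughly 0.33 under the assumed true mean.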