Statistics (Section E) 

Introduction to Hypothesis Testing

Hypothesis testing is the process of making inferences about population parameters based on sample statistics. It's a fundamental concept in statistical analysis that allows us to make decisions based on limited data.

Key Concepts in Hypothesis Testing

1. Null and Alternative Hypotheses

  • Null Hypothesis (H₀): A statement that there is no effect, difference, or relationship. It represents the status quo.

  • Alternative Hypothesis (H₁): A statement that contradicts the null hypothesis, suggesting that there is an effect, difference, or relationship.

2. Significance Level (α)

  • The significance level is the probability of rejecting the null hypothesis when it is true (Type I error).

  • Common values are 0.05 (5%), 0.01 (1%), and 0.10 (10%).

  • Denoted by α (alpha).

3. Test Statistics

Different types of tests use different test statistics:

  • Z-test: For normally distributed data with known population variance

  • T-test: For normally distributed data with unknown population variance

  • Chi-square test: For categorical data

4. Critical Regions and Critical Values

  • Critical Region: The set of values for the test statistic that leads to rejection of the null hypothesis.

  • Critical Value: The value that separates the critical region from the non-rejection region (sometimes called the acceptance region).
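As a rough illustration of how critical values relate to α, the sketch below looks them up from the standard normal distribution using Python's standard library. The function name z_critical and its tail argument are illustrative choices, not part of these notes.

```python
from statistics import NormalDist

def z_critical(alpha: float, tail: str = "two") -> float:
    """Return the positive critical z value for significance level alpha.

    `tail` is a hypothetical argument: "two" for a two-tailed test,
    anything else for a one-tailed test.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        # Alpha is split evenly between the two tails.
        return std_normal.inv_cdf(1 - alpha / 2)
    # One-tailed: all of alpha sits in a single tail.
    return std_normal.inv_cdf(1 - alpha)

print(round(z_critical(0.05), 2))         # 1.96
print(round(z_critical(0.05, "one"), 3))  # 1.645
```

For a two-tailed test the critical region is |z| greater than the returned value; for a left-tailed test the sign of the returned value is flipped.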

5. P-value

  • The probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis is true.

  • If the p-value is less than the significance level (p < α), reject H₀.
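To make the definition concrete, here is a minimal sketch that converts a z statistic into a p-value. It assumes the test statistic follows a standard normal distribution under H₀; the function name z_p_value and the tail labels are illustrative.

```python
from statistics import NormalDist

def z_p_value(z: float, tail: str = "two") -> float:
    """P-value for a z statistic, assuming a standard normal null distribution."""
    cdf = NormalDist().cdf
    if tail == "two":
        # Both tails count as "at least as extreme".
        return 2 * (1 - cdf(abs(z)))
    if tail == "left":
        return cdf(z)
    return 1 - cdf(z)  # right-tailed

print(f"{z_p_value(-2.0):.4f}")  # 0.0455
```

With α = 0.05, a two-tailed p-value of about 0.0455 would lead to rejecting H₀.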

6. Types of Errors

  • Type I Error: Rejecting H₀ when it is true (false positive)

  • Type II Error: Not rejecting H₀ when it is false (false negative)
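The link between α and the Type I error rate can be checked with a small simulation. The sketch below draws many samples from a population where H₀ is true and counts how often a two-tailed z-test rejects it anyway. The population values (mean 100, standard deviation 15), the sample size, the trial count, and the seed are all arbitrary assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

def type_i_error_rate(alpha=0.05, n=30, trials=4000, seed=0):
    """Estimate how often a two-tailed z-test rejects a true H0."""
    rng = random.Random(seed)
    mu0, sigma = 100.0, 15.0  # hypothetical population; H0 is true by construction
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1  # a false positive (Type I error)
    return rejections / trials

print(type_i_error_rate())
```

The printed proportion should land close to α, since every rejection here is a false positive.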

Specific Hypothesis Tests

1. One-Sample Z-Test

Used when testing a claim about a population mean when the population standard deviation is known.

Formula: z = (x̄ - μ) / (σ/√n)

Where:

  • x̄ = sample mean

  • μ = hypothesized population mean

  • σ = population standard deviation

  • n = sample size
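A minimal sketch of the formula above in Python. The function name and the input numbers are hypothetical and serve only to show the arithmetic.

```python
from math import sqrt

def one_sample_z(sample_mean: float, mu0: float, sigma: float, n: int) -> float:
    """z = (x̄ - μ) / (σ/√n)"""
    standard_error = sigma / sqrt(n)
    return (sample_mean - mu0) / standard_error

# Hypothetical inputs: x̄ = 52, μ = 50, σ = 8, n = 64
print(one_sample_z(sample_mean=52.0, mu0=50.0, sigma=8.0, n=64))  # 2.0
```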

2. One-Sample T-Test

Used when testing a claim about a population mean when the population standard deviation is unknown.

Formula: t = (x̄ - μ) / (s/√n)

Where:

  • s = sample standard deviation
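The t statistic can be computed directly from raw data, as in the sketch below. The sample values are made up for illustration. The function also returns the degrees of freedom (n - 1), which are needed to look up a critical value in a t table; that lookup is not shown because the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample: list[float], mu0: float) -> tuple[float, int]:
    """Return (t, degrees of freedom) for t = (x̄ - μ) / (s/√n)."""
    n = len(sample)
    s = stdev(sample)  # sample standard deviation (divides by n - 1)
    t = (mean(sample) - mu0) / (s / sqrt(n))
    return t, n - 1

sample = [12.1, 11.8, 12.4, 12.0, 12.3, 11.9]  # hypothetical measurements
t, df = one_sample_t(sample, mu0=12.0)
print(f"t = {t:.3f}, df = {df}")
```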

3. Two-Sample Z-Test

Used to compare two population means when the population standard deviations are known.

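Formula: z = ((x̄₁ - x̄₂) - (μ₁ - μ₂)) / √(σ₁²/n₁ + σ₂²/n₂)

Here μ₁ - μ₂ is the difference hypothesized under H₀. When H₀ states μ₁ = μ₂, this term is 0 and the formula reduces to z = (x̄₁ - x̄₂) / √(σ₁²/n₁ + σ₂²/n₂).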

4. Two-Sample T-Test

Used to compare two population means when the population standard deviations are unknown.

Formula (for H₀: μ₁ = μ₂): t = (x̄₁ - x̄₂) / √(s₁²/n₁ + s₂²/n₂)
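The sketch below computes this statistic from two raw samples. Because the formula does not pool the two variances, it corresponds to the unequal-variance (Welch) form of the test. The degrees-of-freedom calculation uses the Welch-Satterthwaite approximation, which these notes do not cover; it is included here as an assumption about how one would look up a critical value. The data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def two_sample_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Return (t, approximate df) for t = (x̄₁ - x̄₂) / √(s₁²/n₁ + s₂²/n₂)."""
    n1, n2 = len(a), len(b)
    v1 = variance(a) / n1  # s₁²/n₁ (sample variance divides by n - 1)
    v2 = variance(b) / n2  # s₂²/n₂
    t = (mean(a) - mean(b)) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation (an assumption, not from the notes)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

group_a = [1, 2, 3, 4, 5]  # hypothetical data
group_b = [2, 3, 4, 5, 6]
t, df = two_sample_t(group_a, group_b)
print(f"t = {t:.2f}, df = {df:.1f}")  # t = -1.00, df = 8.0
```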

5. Paired T-Test

Used when data points come in pairs, and we're testing for a difference between the pairs.

Formula: t = d̄ / (s_d/√n)

Where:

  • d̄ = mean of the differences

  • s_d = standard deviation of the differences
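A short sketch of the paired calculation: the two lists are reduced to one list of differences, which is then treated like a one-sample t-test against 0. The before/after values are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before: list[float], after: list[float]) -> tuple[float, int]:
    """Return (t, degrees of freedom) for t = d̄ / (s_d/√n)."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for b, a in zip(before, after)]  # after minus before
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

before = [10, 12, 14]  # hypothetical paired measurements
after = [11, 14, 17]
t, df = paired_t(before, after)
print(f"t = {t:.3f}, df = {df}")  # t = 3.464, df = 2
```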

6. Chi-Square Goodness-of-Fit Test

Used to determine whether a categorical variable follows a hypothesized distribution.

Formula: χ² = Σ [(Observed - Expected)² / Expected]
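As a small illustration, the sketch below applies the formula to a hypothetical experiment: 60 rolls of a die, tested against the expectation of 10 rolls per face for a fair die. The counts are made up.

```python
def chi_square_gof(observed: list[float], expected: list[float]) -> float:
    """χ² = Σ [(Observed - Expected)² / Expected]"""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [8, 12, 9, 11, 10, 10]  # hypothetical counts for faces 1..6
expected = [10] * 6                # fair die: 60 rolls / 6 faces
print(f"{chi_square_gof(observed, expected):.2f}")  # 1.00
```

A statistic this small relative to the number of categories indicates that the observed counts are close to the expected ones.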

7. Chi-Square Test of Independence

Used to determine whether two categorical variables are independent.

Formula: χ² = Σ [(Observed - Expected)² / Expected]

Where Expected = (row total × column total) / grand total
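The expected-count rule extends to any table size. The sketch below builds the expected table from the row and column totals, then computes χ² and the degrees of freedom (rows - 1)(columns - 1). The 2×3 table is hypothetical and the function names are illustrative.

```python
def expected_counts(table: list[list[float]]) -> list[list[float]]:
    """Expected = (row total × column total) / grand total, for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_independence(table: list[list[float]]) -> tuple[float, int]:
    """Return (χ², degrees of freedom) for a contingency table."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

table = [[20, 30, 50],  # hypothetical 2×3 contingency table
         [30, 20, 50]]
print(expected_counts(table))          # [[25.0, 25.0, 50.0], [25.0, 25.0, 50.0]]
print(chi_square_independence(table))  # (4.0, 2)
```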

Hypothesis Testing Procedure

  1. State the hypotheses:

    • Null hypothesis (H₀)

    • Alternative hypothesis (H₁)

  2. Choose the significance level (α).

  3. Select the appropriate test statistic.

  4. Determine the critical region or compute the p-value.

  5. Make a decision:

    • If the test statistic falls in the critical region (or p-value < α), reject H₀.

    • Otherwise, do not reject H₀.

  6. State the conclusion in the context of the original problem.

Common Pitfalls and Tips

  • Remember that "not rejecting H₀" is not the same as "accepting H₀"

  • Always state your conclusion in the context of the original problem

  • Be careful about the directionality of your tests (one-tailed vs. two-tailed)

  • Ensure your data meets the assumptions of the test you're using

  • Sample size matters: larger samples shrink the standard error (for example, σ/√n for a mean), so the test is more likely to detect a real effect

Example Problems

Example 1: One-Sample Z-Test

Problem: A company claims that the mean lifetime of their light bulbs is 1000 hours. A random sample of 36 bulbs has a mean lifetime of 970 hours. The population standard deviation is known to be 90 hours. Test the claim at a 5% significance level.

Solution:

  1. H₀: μ = 1000 hours
     H₁: μ ≠ 1000 hours

  2. α = 0.05 (two-tailed test)

  3. Test statistic: z = (970 - 1000) / (90/√36) = -30/15 = -2

  4. Critical values: z = ±1.96 (for a two-tailed test at α = 0.05)

  5. Decision: Since -2 < -1.96, the test statistic falls in the critical region.

  6. Conclusion: Reject H₀. There is sufficient evidence to suggest that the mean lifetime of the light bulbs is not 1000 hours.
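The arithmetic in this example can be double-checked with a few lines of Python. The sketch below recomputes the test statistic and the critical value, and also reports the two-tailed p-value, which the solution above does not use but which leads to the same decision.

```python
from math import sqrt
from statistics import NormalDist

x_bar, mu0, sigma, n, alpha = 970, 1000, 90, 36, 0.05

z = (x_bar - mu0) / (sigma / sqrt(n))          # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed p-value
reject = abs(z) > z_crit

print(f"z = {z:.2f}, critical = ±{z_crit:.2f}, p = {p_value:.4f}")
# z = -2.00, critical = ±1.96, p = 0.0455
print("Reject H0" if reject else "Do not reject H0")  # Reject H0
```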

Example 2: Chi-Square Test of Independence

Problem: A survey asks 200 people whether they prefer tea or coffee, and whether they work in the morning or evening. The results are:

            Tea    Coffee    Total
Morning      45        65      110
Evening      55        35       90
Total       100       100      200

Test at a 5% significance level whether drink preference is independent of work schedule.

Solution:

  1. H₀: Drink preference is independent of work schedule
     H₁: Drink preference is dependent on work schedule

  2. α = 0.05

  3. Calculate expected values:
       E(Morning, Tea) = (110 × 100) / 200 = 55
       E(Morning, Coffee) = (110 × 100) / 200 = 55
       E(Evening, Tea) = (90 × 100) / 200 = 45
       E(Evening, Coffee) = (90 × 100) / 200 = 45

  4. Calculate χ²:
       χ² = (45-55)²/55 + (65-55)²/55 + (55-45)²/45 + (35-45)²/45
       χ² = 1.82 + 1.82 + 2.22 + 2.22 = 8.08

  5. Degrees of freedom = (rows-1)(columns-1) = (2-1)(2-1) = 1
     Critical value at α = 0.05 with df = 1 is 3.84

  6. Decision: Since 8.08 > 3.84, reject H₀

  7. Conclusion: There is sufficient evidence to suggest that drink preference is dependent on work schedule.
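The numbers in this example can be reproduced as follows. The sketch also computes a p-value, using the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable; that shortcut applies only when df = 1 and is an addition to the solution above.

```python
from math import sqrt
from statistics import NormalDist

observed = [[45, 65],   # Morning: Tea, Coffee
            [55, 35]]   # Evening: Tea, Coffee

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total  # expected count
        chi2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Valid for df = 1 only: P(χ² > x) = 2 · (1 - Φ(√x))
p_value = 2 * (1 - NormalDist().cdf(sqrt(chi2)))

print(f"chi-square = {chi2:.2f}, df = {df}")  # chi-square = 8.08, df = 1
print(f"p-value = {p_value:.4f}")
print("Reject H0" if chi2 > 3.84 else "Do not reject H0")  # Reject H0
```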

Formula Sheet

Test                   Formula                                   When to Use
Z-test (one sample)    z = (x̄ - μ) / (σ/√n)                      Known σ, testing μ
T-test (one sample)    t = (x̄ - μ) / (s/√n)                      Unknown σ, testing μ
Z-test (two sample)    z = (x̄₁ - x̄₂) / √(σ₁²/n₁ + σ₂²/n₂)        Known σ₁ and σ₂, comparing means
T-test (two sample)    t = (x̄₁ - x̄₂) / √(s₁²/n₁ + s₂²/n₂)        Unknown σ₁ and σ₂, comparing means
Paired T-test          t = d̄ / (s_d/√n)                          Paired observations
Chi-Square             χ² = Σ [(O - E)² / E]                     Categorical data