Study Notes on Hypothesis Testing

Hypothesis Testing in Statistics

Understanding Hypothesis Testing

  • Definition: Hypothesis testing is a statistical method used to make inferences or draw conclusions about a population based on sample data.

Key Terms:
  • Sample Data: A subset of a population that is collected for analysis.

  • Population Parameter: A value that represents a characteristic of an entire population (e.g., population mean, population standard deviation).

  • Sample Mean (x̄): The average of the sample data being analyzed.

  • Standard Deviation (σ): Measures the dispersion or variation of a set of values in a population.

Common Errors in Estimation

  • Margin of Error: The largest amount by which a sample estimate is expected to differ from the population parameter at a given confidence level.

    • Example: A sample mean of 7.98 hours with a margin of error of 0.34 hours estimates the population mean to lie within 0.34 hours of 7.98.

  • Point Estimate: A single value computed from a sample (such as the sample mean) that is used to estimate a population parameter.

    • It represents the best guess but, on its own, says nothing about how precise the estimate is; a confidence interval supplies that range.

Steps in Hypothesis Testing

  1. State the Null Hypothesis (H₀): This is a presumption that there is no effect or no difference, and it serves as a starting point for statistical testing.

  2. State the Alternative Hypothesis (H₁): This hypothesis is the claim we are looking for evidence to support, indicating that there is an effect or a difference.

  3. Determine the Sample Statistics: Collect sample data and calculate statistics such as the sample mean (x̄) and sample size (n).

  4. Perform the Statistical Test: Using the data, apply a statistical test that fits the scenario: a z-test when the population standard deviation is known, or a t-test when it must be estimated from the sample.

  5. Make a Decision: Compare the p-value obtained from the statistical test to a significance level (usually α = 0.05). If the p-value is at or below α, reject H₀; otherwise, fail to reject H₀.

  6. Conclusion: Report the findings in the context of the original research question.
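
Steps 4 and 5 can be sketched in a few lines of Python. This is a minimal illustration, assuming a z-test with a known population standard deviation. The function name z_test_p_value and all of the numbers (μ₀ = 50, σ = 8, n = 64, x̄ = 52) are made up for the example and do not come from these notes.

```python
import math
from statistics import NormalDist

def z_test_p_value(x_bar, mu0, sigma, n, tail="two-sided"):
    """Return (z, p-value) for a one-sample z-test with known sigma."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "less":
        p = std_normal.cdf(z)
    elif tail == "greater":
        p = 1 - std_normal.cdf(z)
    else:
        p = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p

# Hypothetical numbers, chosen only to illustrate the decision rule.
alpha = 0.05
z, p = z_test_p_value(x_bar=52, mu0=50, sigma=8, n=64)
decision = "reject H0" if p <= alpha else "fail to reject H0"
print(f"z = {z:.2f}, p = {p:.4f}: {decision}")
```

The decision line mirrors step 5: the only comparison made is between the p-value and α. Step 6 still requires stating the result in terms of the research question.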

Simplifying Concepts

  • Alternative Way to Phrase: Instead of "is my sample data," one might say, "Does the sample provide strong evidence for the alternative hypothesis?" This emphasizes looking for evidence rather than simply comparing values.

Confidence Intervals for Means Using T-Distribution

  • Constructed a 95% confidence interval for average student sleep hours using a sample of 50 students

  • Sample mean: 7.98 hours, standard deviation: 1.19 hours

  • Used t-distribution with 49 degrees of freedom, t-star value: 2.010

  • Final interval: 7.98 ± 0.34, or about 7.64 to 8.32 hours of sleep per night

  • Formula approach: must write out the formula and show how t-star was found on exams

  • Calculator method: TInterval function in Stats menu allows using either raw data or summary statistics
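
The formula approach can be checked with a short Python sketch. It assumes the interval has the form x̄ ± t-star × s / √n and takes t-star as a given input rather than looking it up. The helper name t_interval is illustrative and is not the calculator's TInterval function.

```python
import math

def t_interval(mean, sd, n, t_star):
    """Return (lower, upper, margin) for mean ± t_star * sd / sqrt(n)."""
    margin = t_star * sd / math.sqrt(n)
    return mean - margin, mean + margin, margin

# Summary statistics from the sleep study in these notes.
lower, upper, margin = t_interval(mean=7.98, sd=1.19, n=50, t_star=2.010)
print(f"margin of error: {margin:.2f} hours")
print(f"interval: {lower:.2f} to {upper:.2f} hours")
```

Working from summary statistics like this corresponds to the calculator's summary-statistics input mode. With raw data, the mean and standard deviation would be computed first.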

Interpreting Confidence Intervals

  • Correct interpretation: 95% confident that the population mean falls within the interval

  • Common misconceptions addressed:

    • The interval does NOT capture 95% of individual data values

    • The interval does NOT predict where 95% of future sample means will fall

  • If repeating the procedure 1000 times, about 950 intervals would contain the true population mean

  • Used baseball player salary example to illustrate these concepts
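
The repeated-sampling interpretation can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is taken to be normal with a made-up mean of 8.0 and standard deviation of 1.2, and each sample has n = 50 so that the t-star value 2.010 from the sleep example applies. The function name coverage_count is hypothetical.

```python
import math
import random
import statistics

def coverage_count(true_mean, true_sd, n, t_star, trials, seed=0):
    """Count how many of `trials` t-intervals contain the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        x_bar = statistics.mean(sample)
        s = statistics.stdev(sample)  # sample standard deviation
        margin = t_star * s / math.sqrt(n)
        if x_bar - margin <= true_mean <= x_bar + margin:
            hits += 1
    return hits

# Assumed population values; only n and t_star come from the notes.
hits = coverage_count(true_mean=8.0, true_sd=1.2, n=50, t_star=2.010, trials=1000)
print(f"{hits} of 1000 intervals contain the true mean")
```

The count should land near 950 but will vary with the seed. The 95% describes the long-run success rate of the procedure, not any single interval.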

Margin of Error and Sample Size Calculation

  • Margin of error for the sleep study interval: 0.34 hours

  • Calculated as: (upper bound - lower bound) / 2

  • To find required sample size for a given margin of error, use formula: n = (t-star × s / margin of error)²

  • Must round up to next whole number

  • Example: with t-star = 2.010 (95% confidence) and s = 1.19, a margin of error of 0.25 gives (2.010 × 1.19 / 0.25)² ≈ 91.5, so the required sample size is 92 students

  • For s, use the standard deviation from a previous sample, or estimate it as range / 6
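
The sample size formula is easy to get wrong at the rounding step, so here is a minimal Python sketch of it. It uses the t-star value 2.010 and s = 1.19 from the sleep study, with a target margin of error of 0.25. The function names are illustrative.

```python
import math

def required_sample_size(t_star, s, margin):
    """n = (t_star * s / margin)^2, always rounded up to a whole number."""
    return math.ceil((t_star * s / margin) ** 2)

def sd_from_range(smallest, largest):
    """Rough standard deviation estimate when no prior sample exists: range / 6."""
    return (largest - smallest) / 6

n_needed = required_sample_size(t_star=2.010, s=1.19, margin=0.25)
print(n_needed)
```

math.ceil is used instead of round because rounding down would give a margin of error slightly larger than the target.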

Introduction to Hypothesis Testing (Not on Exam)

  • Section 5.4 explicitly stated as not covered on the exam

  • General framework: null hypothesis (H₀: μ = μ₀) vs alternative hypothesis (H₁: μ ≠ μ₀, μ > μ₀, or μ < μ₀)

  • Test statistic: t = (x̄ - μ₀) / (s / √n)

  • P-value found using tcdf function with n-1 degrees of freedom

  • Example problem: testing if listening to classical music reduces maze completion time from population mean of 40 seconds

  • Sample of 100 students, mean time 39.1 seconds, standard deviation 4 seconds
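
The maze example can be worked through in Python as a rough check. This sketch assumes a one-sided alternative (μ < 40), since the question is whether music reduces completion time. The function t_lower_tail is a hand-rolled stand-in for the calculator's tcdf, built on numerical integration of the t density. It is not a library routine.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_lower_tail(x, df, steps=2000):
    """P(T <= x), via Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area

# Values from the maze example in these notes.
mu0, x_bar, s, n = 40, 39.1, 4, 100
t_stat = (x_bar - mu0) / (s / math.sqrt(n))
p_value = t_lower_tail(t_stat, df=n - 1)
print(f"t = {t_stat:.2f}, one-sided p-value = {p_value:.4f}")
```

The test statistic works out to t = -2.25, and the one-sided p-value is roughly 0.013. That is below α = 0.05, so under these assumptions H₀ would be rejected.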

Exam Reminders

  • Must show formulas and calculations when constructing confidence intervals

  • Can use calculator functions but must specify which function and what values were used

  • For homework, may need to enter answers to 4 decimal places - use Vars menu to access full precision

  • Formulas will be provided on exam cheat sheet