Hypothesis Testing in Statistics

Research Questions and Data Analysis

  • Use a SAMPLE to draw conclusions about a TARGET POPULATION.
  • Differentiate between population and sample using appropriate notation.

Population Notation Refresher

  • Sample statistics are written with Latin letters; population parameters with Greek letters.
  • Sample Statistics estimate Population Parameters:
    • Mean: $\bar{y}$ (Sample Mean) vs. $\mu$ (Population Mean)
    • Median: $\tilde{y}$ (Sample Median) vs. $\tilde{\mu}$ (Population Median)
    • Standard Deviation: $s$ (Sample SD) vs. $\sigma$ (Population SD)
    • Variance: $s^2$ (Sample Variance) vs. $\sigma^2$ (Population Variance)
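
As a minimal sketch of the notation above, the sample statistics can be computed with Python's standard `statistics` module. The data values are hypothetical and are not taken from the pizza example; the variable names simply mirror the symbols.

```python
import statistics

# Hypothetical sample of daily pizza sales (illustrative values only).
sample = [198, 207, 203, 210, 201, 206, 199, 208]

y_bar = statistics.mean(sample)          # sample mean, estimates mu
y_tilde = statistics.median(sample)      # sample median, estimates the population median
s = statistics.stdev(sample)             # sample SD (n - 1 divisor), estimates sigma
s_squared = statistics.variance(sample)  # sample variance, estimates sigma^2

print(y_bar, y_tilde, s, s_squared)
```

Note that `stdev` and `variance` use the n - 1 divisor appropriate for a sample; `pstdev` and `pvariance` would apply only if the data were the whole population.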

Hypothesis Testing Overview

  • Involves a series of formal steps to answer business and research questions.
  • Example question: "Is the Outlier Oven pizza shop still selling on average 200 pizzas per day following an incentive scheme?"
    • Sample mean found: 204.6.
    • The population variance is unknown.

Null Hypothesis (H0)

  • A statement that a population parameter equals a specified (hypothesized) value.
  • In this case: $H_0: \mu = 200$ (mean daily sales is equal to 200).

Alternative Hypothesis (H1)

  • Opposite of the null hypothesis; it claims the population parameter differs from the value stated in the null.
  • For the pizza question: $H_1: \mu \neq 200$.

Null Distribution

  • The distribution that the test statistic follows when the null hypothesis is true; its form is known.
  • For a z-test, the data must be normally distributed with a known variance for accurate results.

Test Statistic

  • Measures how far the sample statistic is from the value stated in the null hypothesis, in standard-error units.
  • The test statistic observed in the example: $z = 3$ (for the sample mean).
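
A minimal sketch of how a z statistic is computed, assuming the population standard deviation is known. The sample mean 204.6 and hypothesized mean 200 come from the example; `sigma = 23` and `n = 225` are hypothetical values, chosen only so the arithmetic lands at roughly 3. They are not given in these notes.

```python
import math

def z_statistic(sample_mean, mu0, sigma, n):
    """Distance of the sample mean from mu0, measured in standard errors."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - mu0) / standard_error

# sigma and n are made-up illustration values, not data from the example.
z = z_statistic(sample_mean=204.6, mu0=200, sigma=23, n=225)
print(z)
```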

p-Value

  • The probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true.
  • The p-value is the shaded area under the null distribution curve beyond the observed test statistic (both tails for a two-sided test).
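
As a sketch of the shaded-area idea, the two-sided p-value for a z statistic can be read off the standard normal distribution using `statistics.NormalDist` from the standard library. The function name `two_sided_p_value` is an illustrative choice, and the sketch assumes a two-sided alternative such as $H_1: \mu \neq 200$.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Area in both tails of the standard normal beyond |z|."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    upper_tail = 1.0 - standard_normal.cdf(abs(z))
    return 2.0 * upper_tail

p = two_sided_p_value(3.0)
print(round(p, 4))  # 0.0027
```

For $z = 3$ the two tails together cover about 0.27% of the curve, which is well below 0.05.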

Significant Result

  • At a significance level of 5% ($\alpha = 0.05$):
    • If p < 0.05, reject $H_0$ (evidence against $H_0$).

Non-Significant Result

  • If p > 0.05, fail to reject $H_0$ (insufficient evidence against $H_0$).
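
The decision rule can be sketched as a small function. The name `decide` is hypothetical, and treating a p-value exactly equal to alpha as "fail to reject" is an assumption of this sketch, since the notes only cover the cases p < 0.05 and p > 0.05.

```python
def decide(p_value, alpha=0.05):
    """Return the decision about H0 at significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if p_value < alpha:
        return "reject H0"
    # Assumption: p == alpha is grouped with the non-significant case.
    return "fail to reject H0"

print(decide(0.0027))  # reject H0
print(decide(0.23))    # fail to reject H0
```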

Conclusion for Outlier Oven

  • Tested: $H_0: \mu = 200$ against $H_1: \mu \neq 200$.
  • Test statistic: $t = 4.28$; p-value reported as 0 after rounding (less than 0.05).
  • Conclusion: Reject $H_0$. Evidence suggests mean daily sales differ from 200; with a sample mean of 204.6, Outlier Oven sells significantly more than 200 pizzas per day.

Hypothesis Testing Steps (H.A.T.P.D.C.)

  1. Hypotheses: State $H_0$ and $H_1$.
  2. Assumptions: Check assumptions of the test.
  3. Test Statistic: Calculate the test statistic.
  4. p-value: Obtain p-value.
  5. Decision: Reject or fail to reject $H_0$ based on the p-value.
  6. Conclusion: Conclude regarding the original research question.

Statistical Tests Overview

One Sample z-Test
  • Use when:
    • One numeric variable exists.
    • Population standard deviation is known.
One Sample t-Test
  • Criteria:
    • Single numeric variable.
    • Population standard deviation is unknown.
    • Compare sample mean to hypothesized mean.
Assumptions for One Sample t-Test
  • Scores are numeric.
  • Observations are independent.
  • Approximately normally distributed (with n >= 25 the CLT can be applied; otherwise normality of the data is assumed).

Conclusion for One Sample t-Test

  • p-value > 0.05: Do not reject $H_0$.
  • p-value < 0.05: Reject $H_0$.
  • Mention the business question in conclusions.
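
A rough sketch of a one sample t-test using only the standard library. The t statistic is $t = (\bar{y} - \mu_0) / (s / \sqrt{n})$ with $n - 1$ degrees of freedom. Because the standard library has no t distribution, the p-value here is approximated by numerically integrating the t density with Simpson's rule; that is a choice made for this sketch, and a statistics package would normally be used instead. All function names are hypothetical.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_t_p_value(t, df, steps=10_000):
    """Approximate P(|T| >= |t|) via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2.0 * total * h / 3.0  # area between -|t| and |t|
    return max(0.0, 1.0 - central_area)

def one_sample_t_test(sample, mu0):
    """Return (t statistic, degrees of freedom, approximate two-sided p-value)."""
    n = len(sample)
    y_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample SD, since the population SD is unknown
    t = (y_bar - mu0) / (s / math.sqrt(n))
    df = n - 1
    return t, df, two_sided_t_p_value(t, df)

# Hypothetical daily sales figures, tested against a hypothesized mean of 200.
t, df, p = one_sample_t_test([198, 207, 203, 210, 201, 206, 199, 208], mu0=200)
print(t, df, p)
```
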
Two Sample t-Test
  • For independent samples with two means:
    • Null: $H_0: \mu_1 = \mu_2$.
    • Alternative: $H_1: \mu_1 \neq \mu_2$.
Assumptions for Two Sample t-Test
  • Observations are independent.
  • Observations are approximately normally distributed.
  • Equal and unequal variances are handled differently (pooled t-test vs. Welch's t-test).
Conclusion for Two Sample t-Test
  • Same as above: compare the p-value with 0.05 and state the conclusion with reference to the business question.
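
A minimal sketch of the two sample test statistic for the unequal-variance case (Welch's t-test), including the Welch-Satterthwaite degrees of freedom. The function name `welch_t` and the data are hypothetical; the equal-variance (pooled) version would use a different standard error and $n_1 + n_2 - 2$ degrees of freedom.

```python
import math
import statistics

def welch_t(sample_1, sample_2):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test."""
    n1, n2 = len(sample_1), len(sample_2)
    v1 = statistics.variance(sample_1) / n1  # squared standard error, group 1
    v2 = statistics.variance(sample_2) / n2  # squared standard error, group 2
    t = (statistics.mean(sample_1) - statistics.mean(sample_2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical independent samples.
t, df = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t, df)
```
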
Paired Sample t-Test
  • Calculate differences for samples of matched pairs:
    • Null: $H_0: \mu_d = 0$.
    • Alternative: $H_1: \mu_d \neq 0$.
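
A minimal sketch of the paired test: compute the within-pair differences, then treat them as a single sample tested against a mean difference of 0. The function name `paired_t` and the before/after values are hypothetical.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for matched pairs."""
    if len(before) != len(after):
        raise ValueError("matched pairs need equally long samples")
    differences = [a - b for b, a in zip(before, after)]
    n = len(differences)
    d_bar = statistics.mean(differences)  # mean difference, estimates mu_d
    s_d = statistics.stdev(differences)   # SD of the differences
    t = d_bar / (s_d / math.sqrt(n))      # tests H0: mu_d = 0
    return t, n - 1

# Hypothetical measurements on the same four units, before and after.
t, df = paired_t(before=[10, 12, 9, 11], after=[12, 14, 10, 13])
print(t, df)
```
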
Chi-Square Goodness of Fit Test
  • Used for categorical variables.
  • State null and alternative hypotheses regarding proportions.

Example of Chi-Square Goodness of Fit

  • Assess patterns in sick leave.
  • Use proportions to evaluate conditions in the null hypothesis.
  • Check the test's validity using the expected counts (a common rule of thumb is that each expected count is at least 5).
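
A minimal sketch of the goodness of fit calculation: expected counts are the total count multiplied by the proportions stated in the null hypothesis, and the statistic is the sum of (observed - expected)^2 / expected, with (number of categories - 1) degrees of freedom. The sick leave counts and the equal-proportions null are hypothetical, as is the function name `chi_square_gof`.

```python
def chi_square_gof(observed, null_proportions):
    """Return (chi-square statistic, degrees of freedom, expected counts)."""
    if len(observed) != len(null_proportions):
        raise ValueError("one proportion is needed per category")
    if abs(sum(null_proportions) - 1.0) > 1e-9:
        raise ValueError("null proportions must sum to 1")
    total = sum(observed)
    expected = [total * p for p in null_proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return statistic, df, expected

# Hypothetical sick days taken Monday to Friday; H0: each day has proportion 0.2.
observed = [30, 10, 20, 20, 20]
statistic, df, expected = chi_square_gof(observed, [0.2] * 5)

# Rule-of-thumb validity check on the expected counts.
counts_ok = all(e >= 5 for e in expected)
print(statistic, df, expected, counts_ok)
```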