Hypothesis Testing in Statistics
Research Questions and Data Analysis
- Use a SAMPLE to draw conclusions about a TARGET POPULATION.
- Differentiate between population and sample using appropriate notation.
Population Notation Refresher
- Sample statistics are written with Latin letters; population parameters are written with Greek letters.
- Sample Statistics estimate Population Parameters:
- Mean: $\bar{y}$ (Sample Mean) vs. $\mu$ (Population Mean)
- Median: $\tilde{y}$ (Sample Median) vs. $\tilde{\mu}$ (Population Median)
- Standard Deviation: $s$ (Sample SD) vs. $\sigma$ (Population SD)
- Variance: $s^2$ (Sample Variance) vs. $\sigma^2$ (Population Variance)
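As a quick illustration of the sample side of this notation, the sketch below computes the sample mean, median, standard deviation, and variance with Python's standard `statistics` module. The `daily_sales` values are made up for illustration and are not data from these notes; the population parameters stay unknown and are only estimated by these statistics.

```python
import statistics

# Hypothetical sample of daily pizza sales (illustrative values only).
daily_sales = [198, 205, 210, 201, 207, 203, 199, 209]

y_bar = statistics.mean(daily_sales)      # sample mean, estimates the population mean
y_tilde = statistics.median(daily_sales)  # sample median, estimates the population median
s = statistics.stdev(daily_sales)         # sample SD (uses the n - 1 denominator)
s2 = statistics.variance(daily_sales)     # sample variance (uses the n - 1 denominator)

print(f"mean={y_bar}, median={y_tilde}, s={s:.3f}, s^2={s2:.3f}")
```

Note that `statistics.stdev` and `statistics.variance` are the sample versions; `statistics.pstdev` and `statistics.pvariance` would only be appropriate if the data were the entire population.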
Hypothesis Testing Overview
- Involves a series of formal steps to answer business and research questions.
- Example question: "Is the Outlier Oven pizza shop selling on average 200 pizzas per day or following an incentive scheme?"
- Sample mean found: 204.6.
- The population variance is unknown.
Null Hypothesis (H0)
- A statement that a population parameter equals a specified (hypothesized) value.
- In this case: $H_0: \mu = 200$ (mean daily sales is equal to 200).
Alternative Hypothesis (H1)
- The opposite of the null hypothesis; it claims the population parameter differs from the value stated in the null.
- For the pizza question: $H_1: \mu \neq 200$.
Null Distribution
- The distribution that the test statistic follows when the null hypothesis is true; its form is known in advance.
- For a z-based null distribution, the data must be normally distributed with a known variance for accurate results.
Test Statistic
- Measures how far the sample statistic is from the value stated in the null hypothesis.
- The test statistic observed in the example: $z = 3$ (for the sample mean).
p-Value
- The probability of obtaining a result as extreme as, or more extreme than, the one observed, assuming the null hypothesis is true.
- The p-value is the shaded area under the null distribution curve beyond the observed test statistic (both tails for a two-sided alternative).
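To make the "shaded area" concrete, here is a minimal sketch that turns a z statistic into a two-sided p-value using the standard normal distribution from Python's standard library. The helper name `two_sided_p_value` is an illustrative assumption; the value $z = 3$ is the test statistic quoted in these notes.

```python
from statistics import NormalDist


def two_sided_p_value(z: float) -> float:
    """Area in both tails of the standard normal curve beyond |z|."""
    upper_tail = 1.0 - NormalDist().cdf(abs(z))
    return 2.0 * upper_tail


z_observed = 3.0  # test statistic quoted in the notes
p_value = two_sided_p_value(z_observed)
print(f"z = {z_observed}, two-sided p-value = {p_value:.4f}")
```

For $z = 3$ the two-sided p-value is about 0.0027, so a result at least this extreme would be rare if the null hypothesis were true.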
Significant Result
- At a significance level of 5% ($\alpha = 0.05$):
- If $p < 0.05$, reject $H_0$ (evidence against $H_0$).
Non-Significant Result
- If $p > 0.05$, fail to reject $H_0$ (insufficient evidence against $H_0$).
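The decision rule can be sketched as a small function. The name `decide` and the handling of the boundary case are assumptions: the notes do not say what to do when the p-value equals the significance level exactly, so this sketch treats it as "fail to reject".

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the test decision for a given p-value and significance level."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    # Assumption: p == alpha is treated as non-significant.
    return "reject H0" if p_value < alpha else "fail to reject H0"


print(decide(0.0027))  # significant at the 5% level
print(decide(0.20))    # not significant at the 5% level
```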
Conclusion for Outlier Oven
- Tested: $H_0: \mu = 200$ against $H_1: \mu \neq 200$.
- Test statistic: $t = 4.28$; p-value reported as 0 to the displayed precision (so $p < 0.05$).
- Conclusion: Reject $H_0$. The evidence suggests Outlier Oven sells significantly more than 200 pizzas per day on average.
Hypothesis Testing Steps (H.A.T.P.D.C.)
- Hypotheses: State $H_0$ and $H_1$.
- Assumptions: Check assumptions of the test.
- Test Statistic: Calculate the test statistic.
- p-value: Obtain p-value.
- Decision: Reject or fail to reject $H_0$ based on the p-value.
- Conclusion: Conclude regarding the original research question.
Statistical Tests Overview
One Sample z-Test
- Use when:
- One numeric variable exists.
- Population standard deviation is known.
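A minimal sketch of a two-sided one sample z-test follows, using only the standard library. The function name `one_sample_z_test`, the sample values, and the "known" population SD of 6 are all assumptions chosen for illustration; they are not data from these notes.

```python
from math import sqrt
from statistics import NormalDist, mean


def one_sample_z_test(sample, mu0, sigma):
    """Two-sided z-test of H0: mu = mu0 when the population SD sigma is known."""
    n = len(sample)
    standard_error = sigma / sqrt(n)
    z = (mean(sample) - mu0) / standard_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical daily sales and an assumed known population SD.
sample = [198, 205, 210, 201, 207, 203, 199, 209, 204]
z, p_value = one_sample_z_test(sample, mu0=200, sigma=6)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```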
One Sample t-Test
- Criteria:
- Single numeric variable.
- Population standard deviation is unknown.
- Compare sample mean to hypothesized mean.
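When the population SD is unknown, the sample SD takes its place and the statistic is compared with a t distribution on $n - 1$ degrees of freedom. The sketch below computes only the statistic and degrees of freedom with the standard library; the function name and the sample values are illustrative assumptions. Obtaining the p-value needs the t distribution, which the standard library does not provide (a statistics package such as SciPy would typically be used for that step).

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t_statistic(sample, mu0):
    """Return (t, degrees of freedom) for H0: mu = mu0 with unknown population SD."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # sample SD replaces sigma
    t = (mean(sample) - mu0) / standard_error
    return t, n - 1


# Hypothetical daily sales (illustrative values only).
sample = [198, 205, 210, 201, 207, 203, 199, 209, 204]
t, df = one_sample_t_statistic(sample, mu0=200)
print(f"t = {t:.3f} on {df} degrees of freedom")
```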
Assumptions for One Sample t-Test
- Scores are numeric.
- Observations are independent.
- Approximately normally distributed (for n >= 25 the CLT can be applied; for smaller samples, normality must be assumed).
Conclusion for One Sample t-Test
- p-value > 0.05: Fail to reject $H_0$.
- p-value < 0.05: Reject $H_0$.
- Mention the business question in conclusions.
Two Sample t-Test
- For independent samples with two means:
- Null: $H_0: \mu_1 = \mu_2$.
- Alternative: $H_1: \mu_1 \neq \mu_2$.
Assumptions for Two Sample t-Test
- Observations are independent.
- Observations in each group are approximately normally distributed.
- Equal and unequal variances are handled differently: the pooled t-test assumes equal variances, while Welch's t-test does not.
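The difference between the equal-variance and unequal-variance cases shows up in the standard error and the degrees of freedom. The sketch below computes both versions of the statistic with the standard library; the function names and the two groups of values are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Equal-variance two sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


def welch_t(a, b):
    """Unequal-variance (Welch) t statistic with Welch-Satterthwaite df."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical measurements from two independent groups.
group_a = [12, 15, 14, 10, 13, 14]
group_b = [9, 11, 10, 12, 8, 10]
t_pooled, df_pooled = pooled_t(group_a, group_b)
t_welch, df_welch = welch_t(group_a, group_b)
print(f"pooled: t = {t_pooled:.3f}, df = {df_pooled}")
print(f"Welch:  t = {t_welch:.3f}, df = {df_welch:.2f}")
```

With equal group sizes, as here, the two statistics coincide and only the degrees of freedom differ; with unequal group sizes the statistics themselves can differ as well.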
Conclusion for Two Sample t-Test
- Same as above: compare the p-value with 0.05 (less than or greater than) and state the conclusion with reference to the business question.
Paired Sample t-Test
- Calculate differences for samples of matched pairs:
- Null: $H_0: \mu_d = 0$.
- Alternative: $H_1: \mu_d \neq 0$.
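A paired test reduces to a one sample t-test on the within-pair differences. The sketch below shows that reduction with the standard library; the function name and the before/after values are hypothetical, and the p-value step (t distribution on $n - 1$ degrees of freedom) is left to a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) for H0: mean difference = 0."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for b, a in zip(before, after)]  # after minus before
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical matched measurements on the same five units.
before = [200, 195, 210, 205, 198]
after = [204, 199, 209, 211, 203]
t, df = paired_t_statistic(before, after)
print(f"t = {t:.3f} on {df} degrees of freedom")
```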
Chi-Square Goodness of Fit Test
- Used for categorical variables.
- State null and alternative hypotheses regarding proportions.
Example of Chi-Square Goodness of Fit
- Assess patterns in sick leave.
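- The null hypothesis specifies the proportion expected in each category; these proportions give the expected counts.
- Check the validity of the test using the expected counts (a common rule of thumb is that every expected count should be at least 5).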
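A goodness of fit calculation can be sketched as follows. Everything specific here is an assumption for illustration: the sick-leave counts by weekday are made up, the null hypothesis of equal proportions is one possible choice, and the "at least 5" check is a common rule of thumb rather than something stated in these notes. The p-value helper uses a closed form that holds only for an even number of degrees of freedom, which is enough for five categories (df = 4).

```python
from math import exp, factorial


def chi_square_gof(observed, null_proportions):
    """Return (statistic, expected counts, degrees of freedom)."""
    total = sum(observed)
    expected = [total * p for p in null_proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, expected, len(observed) - 1


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))


# Hypothetical sick-leave counts for Monday to Friday.
observed = [30, 14, 16, 15, 25]
null_proportions = [0.2] * 5  # H0: sick leave is equally likely on each day

statistic, expected, df = chi_square_gof(observed, null_proportions)
counts_ok = all(e >= 5 for e in expected)  # rule-of-thumb validity check
p_value = chi2_sf_even_df(statistic, df)
print(f"chi-square = {statistic:.2f}, df = {df}, p-value = {p_value:.4f}")
print("expected counts all >= 5:", counts_ok)
```

For general degrees of freedom, a statistics package would normally supply the chi-square tail probability instead of this even-df shortcut.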