Hypothesis Testing Lecture
Hypothesis Testing and Statistical Inference
Overview of Hypothesis Testing
Definition: Hypothesis testing is a statistical method used to make inferences or draw conclusions about population parameters based on sample data.
Key Questions:
Is it likely that our observed results could have occurred by chance?
How confident can we be that our sample estimate reflects the true population parameter?
Fundamentals of Hypothesis Testing
Sample vs Population: In hypothesis testing, we work with a sample drawn from a population, so the true population parameters are unknown and must be estimated.
Statistical Hypothesis: A statement or assumption regarding a population parameter that we aim to test.
Example: The hypothesis may concern a mean, a proportion, or a regression coefficient.
Decision-Making: Hypothesis testing involves deciding whether or not to reject a statistical hypothesis based on sample data.
Formulation of Hypotheses
Researcher Statements: The first step in hypothesis testing involves stating the null and alternative hypotheses clearly.
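Null Hypothesis (H0): The hypothesis of no effect or the status quo; it typically covers the values the researcher does not expect to see.
Alternative Hypothesis (HA): The hypothesis covering the values not included in H0; it typically states what the researcher expects to find.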
Example 1: If expecting a positive coefficient:
H0: β ≤ 0 (null: no positive effect)
HA: β > 0 (alternative: positive effect exists)
Example 2: If expecting a negative coefficient:
H0: β ≥ 0 (null: no negative effect)
HA: β < 0 (alternative: negative effect exists)
Types of Hypotheses
One-Sided vs Two-Sided Tests
One-Sided Tests: Tests that evaluate the possibility of the relationship in one direction only.
Right-Sided:
H0: β ≤ 0
HA: β > 0
Left-Sided:
H0: β ≥ 0
HA: β < 0
Two-Sided Tests: Tests that evaluate the possibility of a relationship in both directions.
Formulation:
H0: β = 0
HA: β ≠ 0
Testing Techniques
Typical methods for testing hypotheses in econometrics:
t-test
p-value
Confidence interval
Decision Rules in Hypothesis Testing
Decision Rule: A method for deciding whether to reject the null hypothesis by comparing a sample statistic with a critical value.
Critical Value: A threshold that separates the acceptance region from the rejection region; it is read from the statistical table for the test statistic being used (for example, a t-table).
Regions:
Acceptance Region: Where we fail to reject H0.
Rejection Region: Where we reject H0 based on the critical value.
One-Sided Test of β
Concept: Assesses whether the estimated coefficient falls in the acceptance region or the rejection region, where the rejection region lies entirely in one tail of the distribution (the tail indicated by HA).
Two-Sided Test of β
Concept: Similar to a one-sided test, but the rejection region is split between both tails, so H0: β = 0 is rejected when the estimate is far enough from zero in either direction.
t-Test
Purpose: The t-test is employed to test hypotheses regarding individual slope coefficients in regression analysis.
Conditions for Usage:
The stochastic error term (ϵ) should be normally distributed.
The variance of that distribution is unknown and must be estimated from the sample.
Formula: For typical multiple regression equations:
Y_i = β_0 + β_1 X_{1i} + β_2 X_{2i} + … + β_K X_{Ki} + ϵ_i
To calculate the t-statistic for the kth coefficient:
t_k = \frac{\hat{β}_k - β_{H0}}{SE(\hat{β}_k)} where:
\hat{β}_k = estimated regression coefficient for the kth variable
β_{H0} = hypothesized value of the coefficient
SE(\hat{β}_k) = standard error of \hat{β}_k
Example of t-Test
For a two-sided test where H0: β = 0, the t-statistic can be simplified to:
t_k = \frac{\hat{β}_k - 0}{SE(\hat{β}_k)} = \frac{\hat{β}_k}{SE(\hat{β}_k)}
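As a minimal sketch of the formula above, the Python function below computes t_k from an estimated coefficient and its standard error. The function name t_statistic and its parameter names are illustrative choices, and the inputs (1.288 and 0.543) are borrowed from the confidence-interval example in these notes.

```python
def t_statistic(beta_hat, se, beta_h0=0.0):
    """Return t_k = (beta_hat - beta_h0) / SE(beta_hat)."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    return (beta_hat - beta_h0) / se


# H0: beta = 0, so the statistic reduces to beta_hat / SE(beta_hat).
t_zero = t_statistic(1.288, 0.543)
# H0: beta = 1, so the hypothesized value is subtracted first.
t_one = t_statistic(1.288, 0.543, beta_h0=1.0)

print(round(t_zero, 3))  # 2.372
print(round(t_one, 3))   # 0.53
```

The default beta_h0=0.0 mirrors the simplified form of the statistic; any other hypothesized value is passed explicitly.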
Level of Significance (α)
Definition: The maximum probability of committing a Type I Error (rejecting a true null hypothesis).
Selection: Should be chosen prior to testing; common levels are:
10% (0.10)
5% (0.05)
1% (0.01)
Implications: Holding the sample size fixed, lowering α increases the chance of a Type II error (failing to reject a false null hypothesis).
Types of Errors in Hypothesis Testing
Type I Error: Rejecting H0 when it is true (finding an effect that does not exist).
Type II Error: Failing to reject H0 when it is false (not finding an effect that exists).
Confidence Level (1 − α)
Definition: The share of confidence intervals, constructed in the same way across repeated samples, that would contain the true parameter.
For example, at a significance level of 5% (α = 0.05), the confidence level equals 95% (1 - 0.05).
Decision Rule for t-Test
To make decisions using the t-test:
Compare the calculated t-statistic (t_k) with the critical t-value (t_c).
Criteria for decision:
Reject H0 if |t_k| > t_c and, for a one-sided test, the sign of t_k matches the sign implied by HA.
Fail to reject H0 otherwise.
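The decision rule can be sketched in a few lines of Python. The function name reject_h0 and the labels "two-sided", "greater", and "less" are assumptions made for illustration; t_c must be the critical value that matches the chosen test and α. The value 1.699 is the critical value used in the confidence-interval example in these notes.

```python
def reject_h0(t_k, t_c, alternative="two-sided"):
    """Apply the t-test decision rule.

    alternative: "two-sided" (HA: beta != value under H0),
                 "greater"   (HA: beta >  value under H0),
                 "less"      (HA: beta <  value under H0).
    """
    if t_c <= 0:
        raise ValueError("critical value must be positive")
    if alternative == "two-sided":
        return abs(t_k) > t_c
    if alternative == "greater":
        # The sign of t_k must agree with HA (positive).
        return t_k > 0 and abs(t_k) > t_c
    if alternative == "less":
        # The sign of t_k must agree with HA (negative).
        return t_k < 0 and abs(t_k) > t_c
    raise ValueError("unknown alternative: " + repr(alternative))


print(reject_h0(2.372, 1.699))                      # True
print(reject_h0(2.372, 1.699, alternative="less"))  # False
print(reject_h0(0.530, 1.699))                      # False
```

The second call shows why the sign check matters: a large positive t_k is no evidence for a negative effect, so H0 is not rejected.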
Limitations of t-Test
Does not test theoretical validity of the regression model.
Does not indicate the importance of coefficients; a statistically significant coefficient is not necessarily important in practice.
Not meant for tests on entire populations; moreover, increasing N reduces the standard error and so inflates t-scores, which can make even very small effects statistically significant.
p-value in Hypothesis Testing
Definition: The p-value represents the smallest significance level at which the null hypothesis can be rejected.
Decision Rule: Reject H0 if the p-value of the estimated coefficient is less than α.
Important Note: Statistical software typically reports p-values for two-sided tests, regardless of the hypothesis being tested. For a one-sided test, halve the reported p-value and check that the sign of the estimate agrees with HA.
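To make the link between a t-statistic and its p-value concrete, the sketch below computes a two-sided p-value by numerically integrating the Student's t density with Simpson's rule, using only the Python standard library. This is an illustrative approximation, not how Stata or any other package computes p-values; the function names and the steps parameter are assumptions, and in practice a statistics library would normally be used.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_value(t_k, df, steps=2000):
    """Approximate P(|T| >= |t_k|); steps must be even for Simpson's rule."""
    upper = abs(t_k)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t_k|)
    return max(0.0, 1.0 - 2.0 * area)


def one_sided_p_value(t_k, df, alternative):
    """Halve the two-sided p-value when the sign of t_k agrees with HA."""
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    p2 = two_sided_p_value(t_k, df)
    sign_agrees = t_k > 0 if alternative == "greater" else t_k < 0
    return p2 / 2 if sign_agrees else 1 - p2 / 2


p = two_sided_p_value(1.699, 29)
print(round(p, 3))  # close to 0.10: 1.699 is the two-sided 10% critical value for 29 DF
print(p < 0.05)     # False: H0 is not rejected at the 5% level
```

Comparing the result with α reproduces the decision rule: H0 is rejected only when the p-value is below the chosen significance level.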
p-value Example with Stata Regression Results
Consider a regression Price_t = β_0 + β_1 GDP_t + ϵ_t with a reported p-value of 0.000 for β_1 (the coefficient on GDP):
Since this p-value (below 0.0005 before rounding) is less than 0.05, we reject H0: β_1 = 0 at the 5% significance level; the GDP coefficient is statistically significant.
Confidence Intervals
Purpose: Another method of hypothesis testing that provides a range of plausible values for a parameter.
Formula: For the confidence interval of β_k:
CI(β_k) = \hat{β}_k \pm t_c \times SE(\hat{β}_k) where t_c is the two-sided critical t-value for the chosen significance level.
Decision Rule (two-sided test): If the hypothesized value lies within this interval, H0 cannot be rejected; if it lies outside, H0 is rejected.
Confidence Level Example
For a 90% CI for β_1:
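Calculation: 1.288 ± (1.699 × 0.543) = 1.288 ± 0.923
Resulting bounds:
Upper confidence bound (UCB) = 1.288 + 0.923 = 2.211
Lower confidence bound (LCB) = 1.288 − 0.923 = 0.365
Interpretation: We are 90% confident that the true β_1 lies between 0.365 and 2.211; that is, intervals built this way would contain the true β_1 in 90% of repeated samples.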
Hypothesis Testing Using Confidence Intervals
For H0: β_1 = 0 and HA: β_1 ≠ 0: We reject H0 at the 10% significance level because 0 is outside the interval.
For H0: β_1 = 1 and HA: β_1 ≠ 1: We cannot reject H0 at the 10% significance level since 1 falls within the interval.
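The interval and the two tests above can be reproduced with a short Python sketch. The function names confidence_interval and reject_by_interval are illustrative; the inputs are the estimate 1.288, the standard error 0.543, and the critical value 1.699 from the example.

```python
def confidence_interval(beta_hat, se, t_c):
    """Return (lower, upper) = beta_hat -/+ t_c * SE(beta_hat)."""
    margin = t_c * se
    return beta_hat - margin, beta_hat + margin


def reject_by_interval(h0_value, interval):
    """Two-sided test: reject H0 when the hypothesized value is outside the CI."""
    lower, upper = interval
    return not (lower <= h0_value <= upper)


ci = confidence_interval(1.288, 0.543, 1.699)
print(tuple(round(bound, 3) for bound in ci))  # (0.365, 2.211)
print(reject_by_interval(0.0, ci))             # True: reject H0: beta_1 = 0
print(reject_by_interval(1.0, ci))             # False: cannot reject H0: beta_1 = 1
```

One interval answers every two-sided hypothesis at the same significance level, which is why the same bounds serve both tests.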
Summary of Hypothesis Testing Steps using t-Test
Set up the null (H0) and alternative hypotheses (HA).
Choose a level of significance (α) and determine the critical t-value (t_c).
Run the regression to obtain the t-statistic (t_k).
Apply the decision rule to decide whether to reject or fail to reject H0 by comparing t_k with t_c.
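The four steps can be walked through end to end with a small sketch. Everything specific here is an assumption for illustration: the data are made-up toy values, the model has a single regressor so that the estimates fit in a few lines of plain Python, and t_c = 3.182 is the two-sided 5% critical value for 3 degrees of freedom taken from a standard t-table.

```python
import math


def simple_ols(x, y):
    """Return (beta1_hat, se_beta1, df) for y = beta0 + beta1 * x + error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    df = n - 1 - 1  # N - K - 1 with K = 1 regressor
    ssr = sum((yi - beta0 - beta1 * xi) ** 2 for xi, yi in zip(x, y))
    se_beta1 = math.sqrt(ssr / df / sxx)  # sqrt(s^2 / Sxx)
    return beta1, se_beta1, df


# Step 1: H0: beta1 = 0 against HA: beta1 != 0 (two-sided).
# Step 2: alpha = 0.05; with 5 observations and 1 regressor, DF = 3.
t_c = 3.182

# Step 3: run the regression on hypothetical toy data and compute t_k.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
beta1, se, df = simple_ols(x, y)
t_k = (beta1 - 0) / se

# Step 4: apply the decision rule.
reject = abs(t_k) > t_c
print(round(beta1, 2), df, reject)  # 1.99 3 True
```

With more than one regressor the standard errors come from the full coefficient covariance matrix, so a regression package would replace simple_ols; the four steps stay the same.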
Regression Example
Model Description
Dependent Variable: Y (gross sales of Woody’s restaurant)
Independent Variables:
N: Competition (number of direct competitors within a two-mile radius)
P: Population (number of people within a three-mile radius)
I: Income (average household income in the area)
Sample Size: N = 33 observations (here N denotes the number of observations, not the competition variable)
Degrees of Freedom in Hypothesis Testing
Definition: Degrees of freedom (DF) is the number of observations (N) minus the number of estimated coefficients (K slope coefficients plus the intercept).
Formula: DF = N - K - 1
Example: For a model with 3 variables and 120 observations, DF = 120 - 3 - 1 = 116
Implications: DF indicates how many observations remain free to vary independently once the coefficients have been estimated; fewer DF leads to less precise parameter estimates.
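The formula is simple enough to check in code. In this sketch the function name degrees_of_freedom is illustrative; the first call repeats the example above, and the second applies the same formula to the regression example with 33 observations and three independent variables.

```python
def degrees_of_freedom(n_obs, k_regressors):
    """Return DF = N - K - 1 (observations minus slopes minus intercept)."""
    if n_obs <= k_regressors + 1:
        raise ValueError("need more observations than estimated coefficients")
    return n_obs - k_regressors - 1


print(degrees_of_freedom(120, 3))  # 116
print(degrees_of_freedom(33, 3))   # 29
```

The DF value is what selects the row of the t-table when looking up the critical value t_c.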