If we know that the OLS estimators are unbiased, can we immediately conclude that they are consistent?
No, we cannot immediately conclude that they are consistent. Unbiasedness is a finite-sample property, while consistency additionally requires the variance of the estimator to shrink to zero as the sample size grows. An unbiased estimator whose variance does not vanish as n increases is not consistent.
Mizon-Richard Test
Given two non-nested models:
We construct a comprehensive model that includes each model as a special case.
Then we test the restrictions that take the comprehensive model back to each of the original models, i.e. testing the joint significance of the regressors that appear only in model 2 (a test of model 1), and then of those that appear only in model 1 (a test of model 2).
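As a rough sketch of this idea (simulated data and made-up variable names, not from these notes), we can fit the comprehensive model and run the two exclusion F-tests by hand:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
z1, z2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.8 * x1 + 0.5 * x2 + rng.normal(size=n)  # model 1 is the truth here

def ssr(y, X):
    """Sum of squared residuals from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    return e @ e

const = np.ones(n)
X_full = np.column_stack([const, x1, x2, z1, z2])   # comprehensive model
X_m1 = np.column_stack([const, x1, x2])             # model 1 (restricts z's to 0)
X_m2 = np.column_stack([const, z1, z2])             # model 2 (restricts x's to 0)

def f_stat(ssr_r, ssr_u, q, df_u):
    """Standard F statistic comparing restricted and unrestricted SSR."""
    return ((ssr_r - ssr_u) / q) / (ssr_u / df_u)

df_u = n - X_full.shape[1]
F_drop_z = f_stat(ssr(y, X_m1), ssr(y, X_full), 2, df_u)  # tests model 1
F_drop_x = f_stat(ssr(y, X_m2), ssr(y, X_full), 2, df_u)  # tests model 2
print(F_drop_z, F_drop_x)
```

Since model 1 generated the data, dropping the z's costs little (small F), while dropping the x's is strongly rejected (large F).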
Davidson-MacKinnon Test
Given two non-nested models:
If the zero conditional mean assumption (E(u|x1, x2) = 0) holds for model 1, the fitted values from model 2 should be insignificant when added to the first model.
We estimate the second model to obtain its fitted values, yhat.
The Davidson-MacKinnon test is then the t statistic on yhat in this auxiliary regression.
Since yhat is just a (possibly nonlinear) function of x1 and x2, it should be insignificant if model 1 is the correct conditional mean model.
Thus, a significant t statistic is a rejection of the first model.
We must also run the test in reverse, since the test is not symmetric.
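A minimal sketch of one direction of the test, on simulated data where model 1 is true (variable names are illustrative; model 2's regressors are assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
x1, x2 = rng.normal(size=n), rng.normal(size=n)
z1, z2 = rng.normal(size=n), rng.normal(size=n)
y = 2.0 + 1.0 * x1 - 0.5 * x2 + rng.normal(size=n)  # model 1 is the true model

def ols(y, X):
    """OLS fit returning coefficients, standard errors, and fitted values."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    sigma2 = e @ e / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov)), X @ beta

const = np.ones(n)
# Step 1: fit the competing model and keep its fitted values
_, _, yhat2 = ols(y, np.column_stack([const, z1, z2]))
# Step 2: add those fitted values to model 1; the test is the t stat on yhat2
X_aug = np.column_stack([const, x1, x2, yhat2])
beta, se, _ = ols(y, X_aug)
t_yhat2 = beta[3] / se[3]
print(t_yhat2)  # typically insignificant when model 1 is correct
```

Running the reverse direction (model 1's fitted values added to model 2) completes the test, since each direction can only reject the model it augments.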
Does serial correlation invalidate goodness-of-fit measures such as R2?
No. The variation in the dependent variable that is explained by the model is calculated independently of whether the errors are serially correlated.
Explain the difference between a forecast and a predicted value.
A predicted value is an estimate of y within the sample, using the observed regressors that were used to estimate the model.
A forecast is an estimate of y outside of the sample, using new or future values of the regressors.
Explain intuitively (without derivations) why the variance of a forecast increases with the forecast horizon.
A short-horizon forecast relies mostly on actual realised data, whereas a long-horizon forecast is built on previous forecasts, so we rely on less hard information as the horizon grows.
Each forecast depends on the previous forecast. Since each forecast carries its own error, these errors compound as more forecasts are chained together.
Hence there is a greater amount of uncertainty the further we forecast into the future.
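The compounding can be made concrete with the h-step forecast-error variance of a stationary AR(1) (an illustrative model, with made-up parameter values):

```python
import numpy as np

phi, sigma2 = 0.7, 1.0  # illustrative AR(1): y_t = phi*y_{t-1} + e_t, Var(e_t) = sigma2

def forecast_error_var(h):
    """Variance of the h-step-ahead forecast error for a stationary AR(1).

    Recursive substitution of earlier forecasts means each extra step
    adds a fresh shock scaled by a power of phi:
    Var = sigma2 * (1 + phi**2 + ... + phi**(2*(h-1)))."""
    return sigma2 * sum(phi ** (2 * j) for j in range(h))

vars_by_horizon = [forecast_error_var(h) for h in range(1, 6)]
print(vars_by_horizon)  # monotonically increasing in the horizon h
```

The variance rises with every step, levelling off at the unconditional variance sigma2 / (1 - phi**2) for a stationary process.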
Explain why we typically only use lagged covariates in our regression models when creating forecasts.
We use lagged covariates in forecasting because only past and current information is available at the time the forecast is made.
Using contemporaneous or future covariates would require information that is not yet available, or would force us to forecast those covariates as well, introducing additional forecasting error and making the model impractical and potentially biased.
Provided you find evidence of heteroskedasticity, what are the two approaches you
can use to address this? Name an advantage and disadvantage of each approach.
Robust Standard Errors
Easy to implement, you need not know the form of heteroskedasticity, and inference becomes valid.
Does not improve efficiency and the coefficient estimates are not optimal; it doesn't fix heteroskedasticity, it just corrects inference.
Generalised / Weighted Least Squares
Produces efficient estimators (BLUE restored) and can improve estimates and inference.
Requires knowing the form of heteroskedasticity and is more complex to implement.
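A sketch of the two approaches on simulated heteroskedastic data (all parameter values invented for the example; the WLS weights assume the variance form h(x) = x**2 is known):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
x = rng.uniform(1, 5, size=n)
u = rng.normal(size=n) * x          # Var(u|x) = x**2: heteroskedastic errors
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(n), x])

# --- Approach 1: OLS with conventional vs HC0 (White) robust standard errors ---
XtX_inv = np.linalg.inv(X.T @ X)
beta_ols = XtX_inv @ X.T @ y
e = y - X @ beta_ols
se_conv = np.sqrt(np.diag(e @ e / (n - 2) * XtX_inv))          # usual formula
meat = X.T @ (X * (e ** 2)[:, None])                            # White "meat"
se_hc0 = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))             # robust SEs

# --- Approach 2: WLS, dividing each observation by sqrt(h(x)) = x ---
w = 1.0 / x
beta_wls = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]

print(beta_ols, se_conv, se_hc0, beta_wls)
```

Note the pattern the notes describe: the robust correction leaves the OLS coefficients untouched and only changes the standard errors, while WLS produces different (more efficient) coefficient estimates.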
Unbiasedness Means:
On average, across repeated samples, the OLS estimators equal the true population parameters.
If the homoskedasticity assumption also holds, what does this imply about OLS?
Why is the result important and what does it tell us about the variance of the OLS
estimates? Finally, what is this set of assumptions called?
Implies that the OLS estimators are BLUE (Best Linear Unbiased Estimators).
Tells us that the variance of the estimates, conditional on the explanatory variables, is constant.
This set of assumptions is called the Gauss-Markov assumptions; the result is important since it tells us that, among all linear unbiased estimators, OLS has the smallest variance.
Once we add the normality assumption, we say that the Classic Linear Model
(CLM) assumptions hold. When the CLM assumptions hold, what do we know about
the variance of the OLS estimates
We know that the OLS estimators are normally distributed in any sample size, not just in large samples.
This allows us to perform exact inference (t and F tests) even in small samples.
Heteroskedasticity consequences
HET does not cause bias or inconsistency: the OLS estimators remain unbiased and consistent under HET, as the zero conditional mean assumption is still satisfied.
However, the standard errors and confidence intervals are no longer reliable, since their usual formulas are incorrect and lead to invalid inference.
“OLS is asymptotically efficient”. Explain what this means. Why is this important
Means that, as the sample size grows indefinitely, the OLS estimator has the smallest variance among all consistent and asymptotically normal estimators in its class.
Important because it tells us that in large samples, under the Gauss-Markov assumptions, OLS remains a highly desirable estimator: no competing consistent estimator in its class will provide more precise estimates.
Explain the difference between the zero conditional mean assumption and the zero mean and zero correlation assumptions. Write down the formal expressions for these assumptions and explain which assumption is easier to satisfy.
Zero conditional mean, E(u | x1, ..., xk) = 0, implies that the error term has a mean of zero for any given values of the independent variables. Required for OLS to be unbiased.
Zero mean and zero correlation, E(u) = 0 and Cov(xj, u) = 0 for each j, requires that the error term is uncorrelated with each independent variable. Sufficient for OLS to be consistent.
The zero mean and zero correlation assumption is easier to satisfy, since it only rules out a linear relationship between x and u. ZCM rules out all types of relationships (linear and non-linear) and is thus more restrictive.
Explain the difference between omitted variable bias and functional form misspecification.
What implications do each have for the unbiasedness and/or consistency of the OLS estimators? Give one example of omitted variable bias and functional form misspecification
Omitted variable bias: refers to leaving out relevant explanatory variables that are correlated with the included variables.
Implication: Violates the ZCM assumption. This leads to bias and inconsistency in the OLS estimators for the included variables that are correlated with the omitted variable.
Example: Estimating the return to education while omitting the variable "ability", which is correlated with both education and wages.
Functional form misspecification: refers to having an incorrect model structure, such as having the wrong functional form of the dependent variable or omitting polynomial or interaction terms.
Implication: Generally violates the ZCM assumption, since if the true model contains a different form of a variable, the error term in the misspecified model will include that term, which makes E(u|x) ≠ 0. This typically leads to bias and inconsistency in the OLS estimators.
Example: Estimating a linear model of wage on experience when the true relationship is quadratic.
Explain the logic behind including lagged dependent variables as a proxy and explain why they are a popular way to deal with omitted variable bias.
A lagged dependent variable acts as a proxy for all past and unobserved factors that influence the current value of y. Including it means we control for the historical context and the many omitted variables that are captured in the unit's past performance.
Popular since lagged dependent variables are often readily available in panel or time-series data, and provide a simple, powerful way to account for a complex set of unobserved heterogeneity, reducing OVB.
When will including a lagged dependent variable not address functional form misspecification caused by omitted variables?
This occurs if the omitted variable is contemporaneous and time-varying.
If the omitted variable is a change that occurs in the same period as the dependent variable and is correlated with the independent variable of interest, a lagged dependent variable won't capture it and the bias remains.
The White test makes use of a weaker assumption of homoskedasticity. State this assumption.
The White test replaces the homoskedasticity assumption with the weaker assumption that the squared error u^2 is uncorrelated with the independent variables, their squares, and their cross products.
The test also assumes the model is correctly specified, i.e. no omitted variable bias and no functional form misspecification.
Hence, if that assumption is violated, the test may reject the null hypothesis because of misspecification rather than heteroskedasticity.
What is the difference between standard version of the White test and the special case of the White test? What insight does the special case of the White test employ? What is the practical significance of this difference?
The standard White test uses an auxiliary regression of the squared residuals on all the independent variables, their squares, and their cross products.
The special case uses an auxiliary regression of the squared residuals on the fitted values and the squared fitted values.
Since the fitted values are a linear combination of all the regressors, the squared fitted values are a function of all the squares and cross products. This reduces the number of regressors and saves degrees of freedom.
Thus the special case is practical when you have many independent variables, since the full test requires all of these terms to be included and may become infeasible due to degrees of freedom.
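The special case can be sketched by hand (simulated data; the variance process and all numbers are invented for illustration). The LM statistic n*R2 from the auxiliary regression is compared with a chi-squared(2) critical value:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
u = rng.normal(size=n) * np.exp(0.5 * x1)   # heteroskedastic errors
y = 1.0 + 1.0 * x1 + 1.0 * x2 + u
X = np.column_stack([np.ones(n), x1, x2])

beta = np.linalg.lstsq(X, y, rcond=None)[0]
yhat = X @ beta
e2 = (y - X @ beta) ** 2

# Special-case White test: regress squared residuals on yhat and yhat**2
Z = np.column_stack([np.ones(n), yhat, yhat ** 2])
g = np.linalg.lstsq(Z, e2, rcond=None)[0]
resid = e2 - Z @ g
r2 = 1 - resid @ resid / ((e2 - e2.mean()) @ (e2 - e2.mean()))
lm = n * r2   # compare with chi-squared(2) critical value (5.99 at 5%)
print(lm)
```

Only two regressors appear in the auxiliary regression regardless of how many variables the original model has, which is exactly the degrees-of-freedom saving described above.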
Two pieces of evidence for an adjustment for heteroskedasticity in Stata regression output?
The coefficients are identical to the unadjusted OLS output, whereas for methods like WLS the coefficients would have changed.
The standard errors have changed.
This pattern is indicative of robust (White) standard errors rather than WLS.
When using WLS to correct for heteroskedasticity, we need to estimate the form of heteroskedasticity, h(x). Name two consequences that occur when h(x) is incorrectly specified.
Invalid inference:
The usual WLS standard errors are built on the assumed variance function, so if h(x) is wrong they are incorrect unless robust standard errors are used. Moreover, if only zero correlation (rather than zero conditional mean) holds, WLS with the wrong weights can even be inconsistent where OLS would remain consistent.
Loss of efficiency:
The WLS estimator is no longer guaranteed to be the Best Linear Unbiased Estimator; it will be less efficient than the correctly specified WLS estimator.
What are the requirements for a good proxy variable?
First, the proxy should be redundant: conditional on the true omitted variable (and the other regressors), the proxy has no effect on the dependent variable, i.e. the structural error is uncorrelated with the proxy.
Second, the proxy must be a good stand-in for the omitted variable: once we control for the proxy, the omitted variable should be uncorrelated with the other explanatory variables.
You suspect that the value of the dependent variable is misreported in the dataset you are working with. Assuming your suspicions are correct, should you be concerned about the consistency of the OLS estimators? Why / why not?
No, I would not be concerned about the consistency of the OLS estimators if the measurement error occurs only in the dependent variable.
Measurement error in the dependent variable is absorbed into the error term.
As long as this measurement error is uncorrelated with the independent variables, the zero conditional mean assumption remains satisfied.
OLS estimators remain unbiased and consistent.
Only consequence is that standard errors may increase, but the coefficient estimates themselves are not biased.
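A small simulation (all parameters invented for illustration) shows both claims: adding noise to y leaves the slope estimate centred on the truth but makes it less precise:

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 200, 500
slope_clean, slope_noisy = [], []
for _ in range(reps):
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n)
    y_obs = y + rng.normal(scale=2.0, size=n)  # measurement error in y only
    X = np.column_stack([np.ones(n), x])
    slope_clean.append(np.linalg.lstsq(X, y, rcond=None)[0][1])
    slope_noisy.append(np.linalg.lstsq(X, y_obs, rcond=None)[0][1])

print(np.mean(slope_clean), np.mean(slope_noisy))  # both centred near 2: no bias
print(np.std(slope_clean), np.std(slope_noisy))    # noisy version is less precise
```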
What if the measurement error actually occurred in the Income variable. When is OLS consistent? When is it inconsistent?
OLS is consistent when:
The measurement error is uncorrelated with the observed (mismeasured) income variable and with the other independent variables; in that case it is simply absorbed into the error term.
OLS is inconsistent when:
The measurement error is classical (classical errors-in-variables), i.e. uncorrelated with the true value of income. Then Cov(x1, e1) = Var(e1) ≠ 0, creating correlation between the observed regressor and the error term and producing attenuation bias. This covers virtually all realistic cases where an independent variable is measured with error.
Omitted variable bias is a major concern to the zero conditional mean assumption in cross-sectional econometrics. Explain, through the use of an example, how an omitted variable can bias the estimated coefficients in OLS. Write down an expression which quantifies this bias and explain each component.
Suppose the true model of wages is: wage = β0 + β1 educ + β2 ability + u,
where ability is unobserved. If we omit ability, we estimate wage = β0 + β1 educ + v, where v = β2 ability + u.
If education is correlated with ability, Cov(educ, ability) > 0, this violates the ZCM assumption because the omitted variable in the error term is correlated with the included regressor.
The bias of the OLS estimator is given by E(β̃1) = β1 + β2 δ1, where β̃1 is the estimator from the short regression, β2 is the effect of the omitted variable (ability) on wages, and δ1 is the slope from regressing the omitted variable on the included one (ability on educ).
Since β2 > 0 and δ1 > 0 here, the bias is positive: OLS overstates the return to education.
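A minimal simulation (coefficients invented for the example) confirms the bias expression: the short-regression slope lands on β1 + β2 δ1 rather than on β1:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
ability = rng.normal(size=n)
educ = 12 + 0.6 * ability + rng.normal(size=n)  # Cov(educ, ability) > 0
wage = 1.0 + 0.08 * educ + 0.10 * ability + rng.normal(scale=0.5, size=n)

# Short regression: wage on educ only (ability omitted)
X_short = np.column_stack([np.ones(n), educ])
b_short = np.linalg.lstsq(X_short, wage, rcond=None)[0]

# delta1: slope from regressing the omitted variable on the included one
delta1 = np.linalg.lstsq(X_short, ability, rcond=None)[0][1]
predicted_bias = 0.10 * delta1          # beta2 * delta1

print(b_short[1], 0.08 + predicted_bias)  # the two nearly coincide
```

Here the short-regression slope exceeds the true β1 = 0.08 by almost exactly β2 δ1, the positive omitted-variable bias.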
Explain the meaning of the terms asymptotic normality and asymptotic efficiency as they apply to OLS
Asymptotic Normality
Means that as the sample size n increases indefinitely, the sampling distribution of the OLS estimator β̂j converges to a normal distribution.
Important as it allows us to perform hypothesis tests and construct confidence intervals using the standard normal distribution, even when the error term is not normally distributed.
Asymptotic Efficiency
Means that among all estimators in its class that are consistent and asymptotically normal, OLS has the smallest asymptotic variance, i.e. no other such estimator has a sampling distribution that collapses more tightly around the true parameter as n tends to infinity.
Important as it establishes that OLS is the most precise estimator in large samples under the Gauss-Markov assumptions (MLR.1-MLR.5). Even if the errors are not normal, OLS remains the best choice among consistent estimators in terms of asymptotic precision.
You are concerned about functional form misspecification in a cross sectional regression - explain two potential sources of functional form misspecification and name a test we can use to test for each type of misspecification.
Omitted nonlinear terms
The true relationship between y and x is nonlinear, but the model is specified as linear.
RESET test: tests whether powers of the fitted values have explanatory power when added to the model.
Incorrect functional form
A variable is included in levels when it should be in logs, for example. This misspecifies the marginal effect and can lead to biased estimates if the error term becomes correlated with the regressors.
Davidson-MacKinnon test, Mizon-Richard test (tests against non-nested alternatives).
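The RESET test can be sketched by hand on simulated data where the truth is quadratic but a linear model is fitted (all numbers invented for the example); the F statistic tests the joint significance of yhat**2 and yhat**3:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 300
exper = rng.uniform(0, 30, size=n)
wage = 5 + 0.6 * exper - 0.01 * exper ** 2 + rng.normal(size=n)  # quadratic truth

X = np.column_stack([np.ones(n), exper])  # misspecified linear model

def fit(y, X):
    """OLS fit returning coefficients and the sum of squared residuals."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return b, e @ e

b_r, ssr_r = fit(wage, X)
yhat = X @ b_r

# RESET: add powers of the fitted values and test them jointly
X_reset = np.column_stack([X, yhat ** 2, yhat ** 3])
_, ssr_u = fit(wage, X_reset)

q, df_u = 2, n - X_reset.shape[1]
F = ((ssr_r - ssr_u) / q) / (ssr_u / df_u)
print(F)  # compare with an F(2, n-4) critical value; large F => misspecification
```

With the omitted quadratic term, the powers of the fitted values pick up the neglected curvature and the test rejects the linear specification.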