panel data
contains observations on multiple entities where each entity is observed at two or more points in time; solves OVB
Why is panel data useful?
we can control for factors that (1) vary across entities but do not vary over time and (2) could cause OVB
entity fixed variables (Zi)
omitted variables that vary across entities but do not change over time; the fixed effects regression gives n different intercepts and one slope common to all entities
time fixed variables
omitted variables that vary over time but not across entities; the intercepts change over time
entity and time fixed effects together
use entity demeaning and (T - 1) dummies
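The entity-demeaning step can be sketched in a few lines; this is an illustrative example with made-up data, and the column names ("entity", "year", "y") are hypothetical:

```python
# Sketch of entity demeaning (the within transformation) for panel data.
# Synthetic data: 2 entities observed in 2 years; column names are illustrative.
import pandas as pd

df = pd.DataFrame({
    "entity": ["A", "A", "B", "B"],
    "year":   [2000, 2001, 2000, 2001],
    "y":      [1.0, 3.0, 10.0, 14.0],
})

# Subtract each entity's own time average from its observations.
df["y_demeaned"] = df["y"] - df.groupby("entity")["y"].transform("mean")
print(df["y_demeaned"].tolist())  # entity means are removed
```

The (T - 1) time dummies could then be built with `pd.get_dummies(df["year"], drop_first=True)`.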
clustered standard errors
needed because observations for the same entity are correlated over time; allows errors to be correlated within clusters (entities) while assuming independence across clusters
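The cluster-robust (Liang-Zeger) variance formula behind these standard errors can be sketched directly in numpy; the data here are synthetic, with a within-entity shock inducing the correlation that clustering is meant to handle:

```python
# Sketch: cluster-robust standard errors computed by hand (synthetic data).
import numpy as np

rng = np.random.default_rng(0)
G, T = 50, 5                                 # 50 entities, each observed T times
groups = np.repeat(np.arange(G), T)
x = rng.normal(size=G * T)
u = rng.normal(size=G)[groups] + rng.normal(size=G * T)  # within-entity correlation
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(G * T), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)     # OLS coefficients
resid = y - X @ beta

# Sandwich variance: sum the cluster score X_g' u_g over entities, so errors
# may be arbitrarily correlated within an entity but not across entities.
XtX_inv = np.linalg.inv(X.T @ X)
meat = np.zeros((2, 2))
for g in range(G):
    idx = groups == g
    s = X[idx].T @ resid[idx]                # score vector for cluster g
    meat += np.outer(s, s)
se_cluster = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
print(beta, se_cluster)
```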
linear probability model (LPM)
predicted value is a probability; each coefficient is the change in probability for a unit change in the regressor
probit model
models probability that Y=1 using the cumulative standard normal distribution function
logit model
models probability of Y=1 using the cumulative standard logistic distribution function
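The only difference between the two models above is the link function; a minimal stdlib sketch (the index z stands in for b0 + b1*X):

```python
# Sketch: the two CDFs behind probit and logit.
from math import erf, exp, sqrt

def probit_prob(z):
    """P(Y=1) under probit: standard normal CDF evaluated at the index z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def logit_prob(z):
    """P(Y=1) under logit: standard logistic CDF evaluated at the index z."""
    return 1.0 / (1.0 + exp(-z))

# Both map any index z into (0, 1); at z = 0 each gives probability 0.5.
print(probit_prob(0.0), logit_prob(0.0))
```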
maximum likelihood estimator (MLE)
values of B0 and B1 that maximize the likelihood function
measures of fit for binary dependent variables
(1) fraction correctly predicted (2) pseudo-R2
pseudo-R2
measures fit using likelihood function; measures improvement in value of log likelihood, relative to having no X’s
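The pseudo-R2 definition above reduces to one line of arithmetic; the log-likelihood values below are hypothetical:

```python
# Sketch: McFadden pseudo-R^2 from two log-likelihoods (hypothetical values).
def pseudo_r2(loglik_full, loglik_null):
    """1 - lnL(full model) / lnL(no-X model): improvement over no regressors."""
    return 1.0 - loglik_full / loglik_null

# e.g., lnL = -50 with regressors vs lnL = -100 with intercept only
print(pseudo_r2(-50.0, -100.0))  # → 0.5
```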
instrumental variables solve
(1) OVB (2) simultaneous causality bias (3) errors-in-variables bias (4) sample selection bias
identification (for IV)
a parameter is said to be identified if different values of the parameter would produce different distributions of the data; depends on the number of instruments (m) and the number of endogenous regressors (k)
overidentified
m > k
underidentified
m < k
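The counting rule on these cards (including the exactly identified m = k case, which the deck leaves implicit) can be written as a tiny helper:

```python
# Sketch: classify identification by counting instruments (m) against
# endogenous regressors (k); m == k (exact identification) included.
def identification(m, k):
    if m > k:
        return "overidentified"
    if m == k:
        return "exactly identified"
    return "underidentified"

print(identification(3, 1), identification(1, 1), identification(1, 2))
```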
how to test for relevance of IV
F-test on the instruments in the first-stage regression; instruments are weak if the first-stage coefficients are zero or nearly zero, or if the first-stage F-statistic < 10
how to test for exogeneity of IV
J-test
J-test
(1) estimate the equation of interest using TSLS and all m instruments (2) compute the residuals û, using the original X (not the first-stage predicted values) (3) regress û against Z and W (4) compute the F-statistic testing the hypothesis that the coefficients on Z are all zero (5) the J-statistic is J = mF
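The last step is pure arithmetic; under the null that all instruments are exogenous, J is distributed chi-squared with m - k degrees of freedom. The F value below is hypothetical:

```python
# Sketch: J-statistic from the overidentifying-restrictions F-test, J = m * F.
def j_statistic(F, m):
    """m = number of instruments; F from the regression of residuals on Z, W."""
    return m * F

J = j_statistic(2.1, 3)  # hypothetical first-step F with m = 3 instruments
print(J)                 # compare to a chi-squared(m - k) critical value
```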
threats to internal validity for experiments
failure to randomize, partial compliance, attrition, experimental effects
threats to external validity for experiments
nonrepresentative sample, nonrepresentative treatment, general equilibrium effects
time series data
data collected on the same observational unit at multiple time periods
use time series data for
(1) forecasting models (2) estimate dynamic causal effects
AR(p) model
uses p lags of Y as regressors; use t or F-tests to determine lag order p
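Estimating an AR(p) is just OLS of Y_t on its own lags; a minimal AR(1) sketch on simulated data (true coefficient 0.5):

```python
# Sketch: fit an AR(1) by OLS on the lagged series (synthetic data).
import numpy as np

rng = np.random.default_rng(1)
T, phi = 500, 0.5
y = np.zeros(T)
for t in range(1, T):                  # simulate Y_t = 0.5 * Y_{t-1} + u_t
    y[t] = phi * y[t - 1] + rng.normal()

# Regress Y_t on an intercept and Y_{t-1}
X = np.column_stack([np.ones(T - 1), y[:-1]])
beta = np.linalg.lstsq(X, y[1:], rcond=None)[0]
print(beta[1])  # estimated AR(1) coefficient, close to 0.5
```

An ADL(p, r) would simply append r lags of X as extra columns of the regressor matrix.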
ADL(p,r) model
use when there are other variables that might be useful predictors of Y; p = lags of Y, r = lags of X
HAC standard errors
use HAC because u is serially correlated; robust to both heteroskedasticity and autocorrelation
how to identify number of lags for dynamic causal effects regression
truncation parameter: m = 0.75 * T^(1/3), rounded to an integer
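The rule of thumb above, as a one-line helper (the sample sizes in the example are arbitrary):

```python
# Sketch: truncation-parameter rule of thumb for choosing the HAC lag length.
def truncation_lags(T):
    """m = 0.75 * T^(1/3), rounded to the nearest integer."""
    return int(round(0.75 * T ** (1.0 / 3.0)))

print(truncation_lags(100), truncation_lags(250))
```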
LSA #1 for panel data
E(u | X, a) = 0; no omitted lagged effects; there is no feedback from u to any future X
LSA #2 for panel data
(X,u) are iid draws from their joint distribution; satisfied if entities are randomly sampled from their population by simple random sampling; does not require observations to be iid over time for the same entity
LSA #3 for panel data
(X,u) have finite fourth moments
LSA #4 for panel data
there is no perfect multicollinearity