function, coefficients
If we say E(y|x)=β0+β1x, where β0 and β1 solve the population least-squares problem, then the CEF is the population regression _____ and β0 and β1 are population regression _____.
linear approximation
The population regression function provides the best ______ to the CEF.
change, associated, unit change
given: yi=β0+β1xi1+ui,i=1,...,N.
The coefficient β1 measures the _____ in y _____ with a _____ in x1, holding all of the unobservables constant.
minimize, squared
given: yi=β0+β1xi1+ui,i=1,...,N.
If β0 and β1 solve the population least-squares problem, their values ______ the expected value of the _____ difference between the dependent variable and the CEF.
covariance(xi, yi) / variance(xi)
given: yi=β0+β1xi1+ui,i=1,...,N.
The value of β1 that solves the population least-squares problem is:
sample averages, population averages, sample average
given: yi=β0+β1xi1+ui,i=1,...,N.
The OLS estimator for β1 can be obtained by plugging in the _____ of xi and yi for their ______ and plugging in another _____ for each outer expectation.
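As a rough illustration of the plug-in idea, the sketch below replaces the population covariance and variance with sample averages on simulated data. The data-generating values (intercept 1, slope 2), the seed, and the helper names are assumptions made for this example only.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance: a sample average stands in for the outer expectation."""
    ma, mb = mean(a), mean(b)
    return mean([(ai - ma) * (bi - mb) for ai, bi in zip(a, b)])

def ols_simple(x, y):
    """Plug-in estimates of beta0 and beta1 for y = beta0 + beta1*x + u."""
    b1 = cov(x, y) / cov(x, x)      # sample cov(x, y) / sample var(x)
    b0 = mean(y) - b1 * mean(x)
    return b0, b1

# Simulated data (hypothetical): true intercept 1, true slope 2.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(5000)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]

b0_hat, b1_hat = ols_simple(x, y)
print(round(b0_hat, 2), round(b1_hat, 2))
```

With a sample this large the estimates should land close to the assumed population values.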
same, residuals
given: yi=β0+β1xi1+ui,i=1,...,N.
If there were more than one x in (1), then the formula for β1 would be the _____, except xi1 would be replaced with the _____ from a regression of xi1 on the other xs.
FWL, residuals
given: yi=β0+β1xi1+ui,i=1,...,N.
The ______ theorem says you can control for other explanatory variables in estimating the effect of an x on y by either including the other variables directly or regressing y on the ______ from a regression of x on the other variables.
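The equivalence described here can be checked numerically. The following is a minimal sketch on simulated data, assuming two regressors and a hand-rolled `ols` helper (a hypothetical name, solving the normal equations by Gauss-Jordan elimination); it is not tied to any particular library.

```python
import random

def ols(columns, y):
    """OLS coefficients for y on an intercept plus the given columns."""
    X = [[1.0] + [c[i] for c in columns] for i in range(len(y))]
    k = len(X[0])
    # Augmented normal equations: (X'X | X'y).
    A = [[sum(r[a] * r[b] for r in X) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(X, y))] for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [ar - f * ac for ar, ac in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Simulated data (hypothetical): x1 is correlated with x2.
rng = random.Random(1)
n = 2000
x2 = [rng.gauss(0, 1) for _ in range(n)]
x1 = [0.5 * v + rng.gauss(0, 1) for v in x2]
y = [1.0 + 2.0 * a - 1.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

# Route 1: include x2 directly in the regression.
b_long = ols([x1, x2], y)

# Route 2: residualize x1 on x2, then regress y on the residuals.
g = ols([x2], x1)
x1_resid = [a - g[0] - g[1] * b for a, b in zip(x1, x2)]
b_fwl = ols([x1_resid], y)

print(b_long[1], b_fwl[1])  # the two slopes on x1 agree up to rounding error
```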
partial
given: yi=β0+β1xi1+ui,i=1,...,N.
When the PRF includes more than one x, we say that β1 measures the _____ effect of x1 (without necessarily giving a causal interpretation).
mean independent, zero, unbiased
given: yi=β0+β1xi1+ui,i=1,...,N.
If E(ui|xi1)=0 in (1), xi1 is _____ of ui and the sampling error of β̂ 1 equals ______ on average, which implies that β̂ 1 is ______.
consistent
given: yi=β0+β1xi1+ui,i=1,...,N.
If E(ui|xi1)=0 in (1), the sampling error of β̂ 1 converges to 0 and β̂ 1 is ______.
upward, sign
given: yi=β0+β1xi1+β2xi2+ui,i=1,...,N.
If you omit xi2 from (2), β̂ 1 will be biased ______ if β2 and cov(xi1,xi2) have the same ______.
downward, positive, negatively
given: yi=β0+β1xi1+β2xi2+ui,i=1,...,N.
If yi is log wage, xi1 is education and xi2 is labor market experience, and you omit xi2 from (2), then β̂ 1 will be biased ______ because β2 is _____ and xi1 and xi2 are _____ correlated.
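The direction of omitted variable bias can be seen in a small simulation. This sketch assumes made-up parameter values (a return to education of 0.10, a return to experience of 0.05, and experience falling with education); none of these numbers come from real data.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return mean([(ai - ma) * (bi - mb) for ai, bi in zip(a, b)])

rng = random.Random(2)
n = 20000
beta1, beta2 = 0.10, 0.05          # hypothetical true coefficients, beta2 > 0

# Education (x1) and experience (x2) are negatively correlated by construction.
x1 = [rng.gauss(12, 2) for _ in range(n)]
x2 = [20 - 0.8 * (a - 12) + rng.gauss(0, 3) for a in x1]
y = [1.0 + beta1 * a + beta2 * b + rng.gauss(0, 0.3) for a, b in zip(x1, x2)]

# Short regression of y on x1 only (x2 omitted).
b1_short = cov(x1, y) / cov(x1, x1)

# Omitted variable bias formula: beta1 + beta2 * cov(x1, x2) / var(x1).
implied = beta1 + beta2 * cov(x1, x2) / cov(x1, x1)

print(round(b1_short, 3), round(implied, 3), beta1)
```

Because β2 is positive and the covariance is negative, the short-regression slope falls below the assumed β1.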
biased down
given: yi=β0+β1xi1+β2xi2+ui,i=1,...,N.
Let's say you don't omit xi2, but it is measured with error. Then β̂ 2 will be ______ . (unbiased/ biased down/ biased up)
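A simplified sketch of this attenuation, assuming a single regressor with classical measurement error (noise that is independent of the true value and of the regression error). The true slope of 2 and the equal signal and noise variances are assumptions chosen so that the noisy slope should shrink to about half its true size.

```python
import random

def slope(x, y):
    """Simple-regression slope: sample cov(x, y) / sample var(x)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(4)
n = 20000
x_true = [rng.gauss(0, 1) for _ in range(n)]
y = [2.0 * xt + rng.gauss(0, 1) for xt in x_true]

# Observed regressor = true value + independent noise with variance 1.
x_noisy = [xt + rng.gauss(0, 1) for xt in x_true]

b_clean = slope(x_true, y)    # close to the assumed slope of 2
b_noisy = slope(x_noisy, y)   # shrinks toward 2 * 1 / (1 + 1) = 1

print(round(b_clean, 2), round(b_noisy, 2))
```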
dependent, explanatory
given: yi=β0+β1xi1+β2xi2+ui,i=1,...,N.
R2 measures how much of the variance of the ______ variable is accounted for by the ______ variables.
false
given: yi=β0+β1xi1+β2xi2+ui,i=1,...,N.
True or false: R2 is centrally important for doing causal inference.
CLT, sampling distribution, normal
Basic OLS inference is grounded in the application of the _____, which says that the ______ of the OLS estimator can be regarded as approximately _____ for large samples.
explanatory
The modern approach to regression inference allows the variance of the errors to depend on the ______ variables.
robust
The modern approach means we should always report _____ standard errors and test statistics.
heteroscedasticity
By default, the R function lm can give the wrong standard errors, test statistics, and confidence intervals because it ignores ______.
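To see why the distinction matters, the sketch below computes both a classical standard error and a robust one by hand for a simple regression. It assumes the basic HC0 (White) sandwich formula and simulated errors whose spread grows with |x|; it is an illustration written in Python, not a reproduction of what any R package reports.

```python
import math
import random

rng = random.Random(3)
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]
# Error standard deviation is |x|, so the errors are heteroscedastic by construction.
y = [1.0 + 2.0 * xi + abs(xi) * rng.gauss(0, 1) for xi in x]

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

# Classical formula: assumes one common error variance for every observation.
s2 = sum(e * e for e in resid) / (n - 2)
se_classical = math.sqrt(s2 / sxx)

# HC0 robust formula: weights each squared residual by its squared deviation in x.
meat = sum(((xi - xbar) ** 2) * e * e for xi, e in zip(x, resid))
se_robust = math.sqrt(meat) / sxx

print(round(se_classical, 4), round(se_robust, 4))
```

In this setup the classical formula understates the sampling variability of the slope, so the robust standard error comes out noticeably larger.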
estimated coefficient, standard error
The test statistic for whether an explanatory variable has a statistically significant association with the dependent variable is the ratio of the explanatory variable's ______ to its _____.
(β̂2−1)/se(β̂2)
given: yi=β0+β1xi1+β2xi2+ui,i=1,...,N.
The test statistic for the null hypothesis that β2=1 is ________.
t, p, against
Larger ______ statistics and smaller ______ values indicate stronger evidence ______ (for/against) the null hypothesis.
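A small worked example of the t statistic and its p-value, assuming a hypothetical estimate of 1.3 with standard error 0.15 and a null value of 1. The p-value uses the large-sample normal approximation; the function names are made up for this sketch.

```python
import math

def t_stat(estimate, std_error, null_value=0.0):
    """(estimate - hypothesized value) / standard error."""
    return (estimate - null_value) / std_error

def two_sided_p_normal(t):
    """Two-sided p-value under the standard normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2.0))

# Hypothetical numbers: beta2-hat = 1.3, se = 0.15, null hypothesis beta2 = 1.
t = t_stat(1.3, 0.15, null_value=1.0)
p = two_sided_p_normal(t)
print(round(t, 3), round(p, 4))
```

A t statistic of about 2 corresponds to a p-value a little under 0.05, which is why larger t statistics go with smaller p-values.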
F, omits, includes
Suppose yi=β0+β1xi1+β2xi2+β3xi3+β4xi4+ui. To test the null that β3=β4=0, you use an ______ test, which compares the fit of a short regression that ______ x3 and x4 with the fit of a long regression that ______ them.
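The short-versus-long comparison can be sketched as follows. This assumes the classical (homoscedastic) form of the F statistic based on sums of squared residuals, simulated data, and hypothetical helper names `ols_ssr` and `f_stat`.

```python
import random

def ols_ssr(columns, y):
    """Sum of squared residuals from OLS of y on an intercept plus columns."""
    X = [[1.0] + [c[i] for c in columns] for i in range(len(y))]
    k = len(X[0])
    A = [[sum(r[a] * r[b] for r in X) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(X, y))] for a in range(k)]
    for col in range(k):  # Gauss-Jordan elimination on the normal equations
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [ar - f * ac for ar, ac in zip(A[r], A[col])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(X, y))

def f_stat(short_cols, extra_cols, y):
    """Classical F statistic for the null that all extra coefficients are zero."""
    q = len(extra_cols)                      # number of restrictions
    k_long = 1 + len(short_cols) + q         # parameters in the long regression
    ssr_short = ols_ssr(short_cols, y)
    ssr_long = ols_ssr(short_cols + extra_cols, y)
    return ((ssr_short - ssr_long) / q) / (ssr_long / (len(y) - k_long))

rng = random.Random(8)
n = 500
x1, x2, x3, x4 = ([rng.gauss(0, 1) for _ in range(n)] for _ in range(4))
u = [rng.gauss(0, 1) for _ in range(n)]

# Case A: beta3 = beta4 = 0 holds. Case B: both equal 0.5.
y_null = [1 + a - b + e for a, b, e in zip(x1, x2, u)]
y_alt = [v + 0.5 * c + 0.5 * d for v, c, d in zip(y_null, x3, x4)]

f_null = f_stat([x1, x2], [x3, x4], y_null)
f_alt = f_stat([x1, x2], [x3, x4], y_alt)
print(round(f_null, 2), round(f_alt, 2))
```

When x3 and x4 genuinely matter, dropping them worsens the fit a lot and the F statistic is large; when they do not, the two fits are nearly the same.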
false
True or false. If corr(x,y)=0, y does not depend on x.
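A minimal counterexample, assuming a symmetric set of x values and y = x²: y is completely determined by x, yet the covariance (and hence the correlation) is zero because the dependence is not linear.

```python
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]          # y depends on x exactly

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / len(x)

print(cov_xy)  # 0.0
```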
true
True or false. If x causes y, the conditional distribution of y given x must depend on x.
counterfactual, missing
You can't observe the effect of a treatment on an individual because you can't observe their ______ outcome. In this sense, causal inference is fundamentally a ______ data problem.
potential
While individual treatment effects are not observable, you may be able to identify the average treatment effect (ATE), which is the difference in average ______ outcomes.
selection bias
Using the difference in sample average outcomes for treated and untreated individuals generally won't work for estimating the ATE because potential outcomes are not independent of treatment assignment, which results in what kind of bias?
the average treatment effect on the treated
given:
E(yi|Di=1)−E(yi|Di=0) = [E(y1i|Di=1)−E(y0i|Di=1)] (term 1)
+ [E(y0i|Di=1)−E(y0i|Di=0)] (term 2)
Term 1 is ______.
selection bias
given:
E(yi|Di=1)−E(yi|Di=0) = [E(y1i|Di=1)−E(y0i|Di=1)] (term 1)
+ [E(y0i|Di=1)−E(y0i|Di=0)] (term 2)
Term 2 is ______.
zero, ATE (average treatment effect)
given:
E(yi|Di=1)−E(yi|Di=0) = [E(y1i|Di=1)−E(y0i|Di=1)] (term 1)
+ [E(y0i|Di=1)−E(y0i|Di=0)] (term 2)
If treatment assignment is randomized, then term 2 equals ______ and term 1 equals the ______.
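The decomposition can be checked in a simulation where both potential outcomes are known, which is never the case with real data. The sketch below assumes an average effect of 1 and compares self-selected assignment (units with high y0 are more likely to be treated) with coin-flip assignment; all names and values are hypothetical.

```python
import random

def mean(v):
    return sum(v) / len(v)

def decompose(y0, y1, d):
    """Return (naive difference, term 1, term 2) for assignment vector d."""
    treated = [i for i, di in enumerate(d) if di == 1]
    control = [i for i, di in enumerate(d) if di == 0]
    # Observed outcome: y1 for treated units, y0 for untreated units.
    y = [y1[i] if d[i] == 1 else y0[i] for i in range(len(d))]
    naive = mean([y[i] for i in treated]) - mean([y[i] for i in control])
    term1 = mean([y1[i] for i in treated]) - mean([y0[i] for i in treated])
    term2 = mean([y0[i] for i in treated]) - mean([y0[i] for i in control])
    return naive, term1, term2

rng = random.Random(7)
n = 20000
y0 = [rng.gauss(0, 1) for _ in range(n)]
y1 = [v + 1.0 + rng.gauss(0, 0.5) for v in y0]    # assumed ATE of 1

# Self-selection: treatment is more likely when y0 is high.
d_selected = [1 if v + rng.gauss(0, 1) > 0 else 0 for v in y0]
# Randomization: treatment is a coin flip, unrelated to potential outcomes.
d_random = [rng.randint(0, 1) for _ in range(n)]

naive_s, t1_s, t2_s = decompose(y0, y1, d_selected)
naive_r, t1_r, t2_r = decompose(y0, y1, d_random)
print(round(naive_s, 2), round(t1_s, 2), round(t2_s, 2))
print(round(naive_r, 2), round(t1_r, 2), round(t2_r, 2))
```

Under self-selection term 2 is clearly positive, so the naive comparison overstates the effect; under randomization term 2 is close to zero and the naive comparison is close to the assumed ATE.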
independent, ignorable
If the potential outcomes are ______ of treatment assignment, the assignment mechanism is ______ and the difference in sample average outcomes for treated and untreated individuals will identify the ATE.
independent, randomly
Potential outcomes will be ______ of treatment assignment if individuals are ______ assigned to treated and untreated groups.
independent, confoundedness
The conditional independence assumption (CIA) is a claim that there is a set of covariates that once you control for them, you can consider the potential outcomes to be ______ of treatment assignment. The CIA is a claim of un______ and is untestable.
treated, untreated
To estimate the ATE under a CIA, you also need overlap, which is the ability to observe ______ and ______ units for any set of covariate values.
residual
If you have a set of control variables for which a CIA holds, you can identify the average effect of the treatment on the outcome by running a regression of the outcome on the _____ from a regression of the treatment dummy on the controls.
overlap
Unlike in standard regression analysis, in RD designs there is no ______ in treated and control units because individuals with different values of D, the treatment, have different values of the covariate by construction.
independence, running
In a sharp RD design, the conditional _________ assumption holds automatically because treatment assignment is determined solely by the cutoff value of the _______ variable.
probability
In a fuzzy RD design, the cutoff value of the running variable determines the _____ of treatment.
potential, continuous
The key identifying assumption of an RD design is that the average _____ outcomes are _____ through the cutoff.
average treatment effect measured at the cutoff
Under the assumptions of a sharp RD design, you identify an ______.
potential
given: stylized sharp RD design
The black lines are linear regression approximations to the CEFs for the ______ outcomes.
yi=β0+β1xi+τDi+ui.
given: stylized sharp RD design
Select the regression specification that is consistent with the black lines.
τ=E(y1i−y0i|xi=c)
given: stylized sharp RD design
Under the key identifying assumption of a sharp RD design, the model identifies
scatter, running
The basis for an RD analysis should be apparent in a binned ______ plot of the outcome and _____ variable.
polynomial, treatment
In general, the RD specification should include a low-order _____ in the running variable and an interaction of the running variable with the _____ indicator.
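A minimal sharp RD sketch on simulated data, assuming a cutoff at 0, a jump of 2 at the cutoff, and a first-order polynomial whose slope differs on each side. Fitting separate lines to the left and right of the cutoff is equivalent to interacting the running variable with the treatment indicator; all names and values here are hypothetical.

```python
import random

def fit_line(x, y):
    """Intercept and slope of a simple least-squares line."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    slope = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
             / sum((a - xbar) ** 2 for a in x))
    return ybar - slope * xbar, slope

rng = random.Random(5)
c, tau = 0.0, 2.0                       # assumed cutoff and treatment effect
x = [rng.uniform(-1, 1) for _ in range(20000)]
d = [1 if xi >= c else 0 for xi in x]   # sharp design: D is determined by x alone
y = [1.0 + 0.5 * xi + di * (tau + 0.3 * (xi - c)) + rng.gauss(0, 0.5)
     for xi, di in zip(x, d)]

left = [(xi, yi) for xi, yi, di in zip(x, y, d) if di == 0]
right = [(xi, yi) for xi, yi, di in zip(x, y, d) if di == 1]
a_l, b_l = fit_line([p[0] for p in left], [p[1] for p in left])
a_r, b_r = fit_line([p[0] for p in right], [p[1] for p in right])

# Estimated jump: gap between the two fitted lines evaluated at the cutoff.
tau_hat = (a_r + b_r * c) - (a_l + b_l * c)
print(round(tau_hat, 2))
```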
no evidence of manipulation because it is smooth through the cutoff.
The distribution of the running variable should show
covariates, discontinuities
An RD analysis of baseline _____ should show no evidence of _____ among them.
covariates, should not
Including the baseline _____ in the regression model (should/should not) _____ affect the estimated treatment effect.
treated, after, before, after
The standard 2×2 DD analysis compares the difference in average outcomes for the _____ observations before and _____ treatment with the difference in mean outcomes for the control observations _____ and _____ treatment.
treated
A DD analysis targets the average treatment effect on the _____ .
unobservable
The target estimand cannot be estimated directly because E(y0|g=1,t=1) is _____.
parallel, absence
The key identifying assumption in a DD analysis is that the treated and untreated outcomes would follow _____ trends in the _____ of the treatment.
trends
A simple before vs after comparison of treated observations misses the _____ in the outcome not associated with treatment.
selection
A simple comparison of treated vs control observations after treatment misses factors that cause non-random _____ into treatment.
treated, untreated
given: table below maps the DD estimands to parameters that capture differences between groups and changes over time
The parameter γ reflects the average difference between _____ and _____ outcomes before treatment.
given: table below maps the DD estimands to parameters that capture differences between groups and changes over time
The parameter η also reflects the _____ average difference in outcomes between periods 0 and 1 for the _____ group.
before, after
given: table below maps the DD estimands to parameters that capture differences between groups and changes over time
The parameter η reflects the average difference in outcomes _____ and _____ treatment for the untreated group.
parallel trends
given: table below maps the DD estimands to parameters that capture differences between groups and changes over time
If η varied by group, the _____ assumption would not hold.
ATT (average treatment effect on the treated)
given: table below maps the DD estimands to parameters that capture differences between groups and changes over time
The parameter δ represents the _____.
group, dummy, interaction
The standard 2×2 DD analysis can be carried out by regressing the outcome on a _____ dummy, a period _____, and their _____.
y=μ+γtreat+ηafter+δtreat⋅after+u
A formal expression of the DD regression consistent with Table 2 is:
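The sketch below checks, on simulated 2×2 data, that the interaction coefficient from this regression equals the difference-in-differences of the four cell means. The parameter values (μ=1, γ=0.5, η=0.3, δ=1.5), the balanced design, and the `ols` helper are assumptions for illustration.

```python
import random

def ols(columns, y):
    """OLS coefficients for y on an intercept plus the given columns."""
    X = [[1.0] + [c[i] for c in columns] for i in range(len(y))]
    k = len(X[0])
    A = [[sum(r[a] * r[b] for r in X) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(X, y))] for a in range(k)]
    for col in range(k):  # Gauss-Jordan elimination on the normal equations
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [ar - f * ac for ar, ac in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

rng = random.Random(6)
mu, gamma, eta, delta = 1.0, 0.5, 0.3, 1.5    # hypothetical parameters
n = 8000
treat = [i % 2 for i in range(n)]             # group dummy
after = [(i // 2) % 2 for i in range(n)]      # period dummy
y = [mu + gamma * g + eta * t + delta * g * t + rng.gauss(0, 0.5)
     for g, t in zip(treat, after)]

def cell_mean(g, t):
    vals = [yi for yi, gi, ti in zip(y, treat, after) if gi == g and ti == t]
    return sum(vals) / len(vals)

# Difference-in-differences of the four cell means.
dd_means = ((cell_mean(1, 1) - cell_mean(1, 0))
            - (cell_mean(0, 1) - cell_mean(0, 0)))

# Regression on the group dummy, the period dummy, and their interaction.
interaction = [g * t for g, t in zip(treat, after)]
coef = ols([treat, after, interaction], y)
delta_hat = coef[3]

print(round(dd_means, 3), round(delta_hat, 3))
```

Because the regression is saturated in the two dummies, the two numbers agree up to rounding error.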
facilitates standard error estimation
generalizes for multiple time periods and treatment groups
accommodates covariates
A regression formulation of a DD design is appealing because it
group
We described a TWFE (two way fixed effects) model as a regression model for data with both a _____ and time dimension.
clustering, heteroscedasticity, serial
Computing the correct standard errors for TWFE estimates usually requires _____ at the group level to account for _____ and _____ correlation.