Econometrics
statistical methods used to estimate and test economic relationships
Causal Effect
the effect of treatment D on outcome Y for individual i: Yi1 - Yi0, the difference between the treated and untreated potential outcomes
Problem of Causal Inference
we can never observe both the treated and untreated outcomes for the same individual
Counterfactual
the outcome under the treatment you didn't take; it can never be observed, but it can be estimated
Random variables
numerical summary of a random outcome
Probability distribution
gives the probability of each possible event occurring; for a continuous variable this is the pdf (probability density function)
Cumulative distribution
the probability that a random variable is <= some value, P(Y <= y); defined for both discrete and continuous variables (the cdf)
Expected value
E(Y) = μY (the mean of Y)
RCTs
Randomized controlled trials; random assignment makes treatment status independent of other determinants of the outcome
Law of Large Numbers
the larger n is, the closer the sample mean gets to the true population mean, with shrinking variance
iid
independently and identically distributed
ind = one observation doesn't affect another
ident = all observations come from the same probability distribution
CLT
Central Limit Theorem
for large n (rule of thumb: n >= 30), the sampling distribution of the sample mean is approximately normal, regardless of the distribution of the underlying data
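A quick simulation makes this concrete. A minimal sketch, assuming numpy is available; the exponential distribution and n = 30 are illustrative choices, not from the deck:

```python
# Draws come from a skewed exponential distribution, yet the distribution
# of sample means across repetitions is approximately normal for n = 30.
import numpy as np

rng = np.random.default_rng(0)
n, reps = 30, 10_000
means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

# Mean of the sample means is near 1 (the population mean);
# their spread shrinks like 1/sqrt(n).
print(means.mean(), means.std(), 1 / np.sqrt(n))
```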
Joint Probability
Prob that A and B happen
P(X=x, Y=y)
Conditional Probability
prob of Y happening, given X
P(Y=y|X=x)
P(Y=y|X=x) = P(X=x, Y=y) / P(X=x)
Bernoulli
a random variable with only two possible outcomes (e.g., 0 and 1)
Estimator
a sample statistic
a random variable (it varies from sample to sample)
the formula used to produce an estimate
Estimand
the quantity we want to learn about: the population parameter of interest
Ex.: the average height of all students
Estimate
the numerical value the estimator produces from a particular sample
Rules for a good estimator
Consistent
Unbiased
Efficient
Consistent
for large n, the probability that the estimator is within any small interval of the true parameter value is high
Unbiased
Expected value of estimator is the true value of the parameter
Efficient
has the lowest variance among unbiased estimators
sample variance/deviation
the spread (dispersion) of the values of Y in our sample
standard error
the standard deviation of the sampling distribution of the sample mean: SE = sY / sqrt(n)
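A small sketch separating the two concepts, assuming numpy; the data values are made up:

```python
# Sample standard deviation measures the spread of Y; the standard error
# measures uncertainty in the sample mean and shrinks with n.
import numpy as np

y = np.array([4.0, 7.0, 6.0, 5.0, 8.0])
n = len(y)
s = y.std(ddof=1)        # sample standard deviation (n - 1 denominator)
se = s / np.sqrt(n)      # standard error of the sample mean
print(s, se)
```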
OLS
Ordinary Least Squares
Linear Regression Assumptions for Causal Interpretation
E(ei|Xi) = 0: other determinants of Y outside of X are uncorrelated with X (violation is omitted variable bias)
Observations are iid
Large outliers unlikely
4th assumption (optional)
Errors are homoskedastic
If this holds, OLS is BLUE (Best Linear Unbiased Estimator)
R and R²
R: its sign tells whether the correlation between x and y is positive or negative
R²: z% of the variability in y is explained by x
Hetero/homoskedastic and SE
homoskedasticity-only SEs are valid only if the errors are homoskedastic; heteroskedasticity-robust SEs are valid either way
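A minimal sketch of the two SE choices, assuming numpy/statsmodels; the data-generating process is invented for illustration:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
e = rng.normal(size=n) * (1 + np.abs(x))    # heteroskedastic errors
y = 1.0 + 2.0 * x + e

X = sm.add_constant(x)
homo = sm.OLS(y, X).fit()                   # homoskedasticity-only SEs
robust = sm.OLS(y, X).fit(cov_type="HC1")   # heteroskedasticity-robust SEs
# Same coefficients, different standard errors
print(homo.bse, robust.bse)
```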
Binary Ind Variable for Regression
Beta is the average difference in Y between the X=1 and X=0 groups
alpha is the sample mean of Y for the X=0 group
OVB Condition
X corr with omitted variable
Omitted variable is a determinant of Y
Bias Formula
E(B1hat) = B1 + B2 · cov(X1, X2) / var(X1)
if the second term is positive: upward bias
if negative: downward bias
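The formula can be checked by simulation. A sketch, assuming numpy/statsmodels; the coefficients are invented so the predicted bias is easy to compute:

```python
# X2 is omitted; it determines Y (B2 = 3) and is positively correlated
# with X1 (cov/var = 0.5), so the short regression is biased upward.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 10_000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)          # cov(X1, X2)/var(X1) = 0.5
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

short = sm.OLS(y, sm.add_constant(x1)).fit()
# Predicted: B1 + B2*cov/var = 2 + 3*0.5 = 3.5
print(short.params[1])
```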
Multivariate regression interpretation
a 1 unit change in X1 is associated with a B1 change in Y, holding all other variables constant (must list them out)
OLS Assumptions for Multivariate
E(ei|X1i, X2i, …, Xki) = 0
Yi,X1, etc. are iid
Large outliers unlikely
No Perfect multicollinearity
Perfect multicollinearity
one of the independent variables is a perfect linear function of the other independent variables
for example: B3·fracfemale + B4·percfemale
fraction female and percentage female are linear functions of each other (percfemale = 100·fracfemale), so they cannot both be included
Dummy Variable Trap
the multicollinearity condition applied to a specific set of outcomes whose indicators all add up to one
job happiness = a + B·transportation + ei
walk = 1, bike = 2, car = 3, train = 4, bus = 5
the DVT is making all 5 categories individual regressors (alongside the intercept)
instead, put n-1 dummy variables in the regression; the omitted one is the base category (see the sketch below)
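A minimal sketch of avoiding the trap, assuming pandas; the transportation categories come from the example above:

```python
# drop_first=True keeps n-1 dummies; the dropped category ("bike",
# alphabetically first here) becomes the base.
import pandas as pd

df = pd.DataFrame({"transportation": ["walk", "bike", "car", "train", "bus"]})
dummies = pd.get_dummies(df["transportation"], drop_first=True)
print(dummies.columns.tolist())   # 4 dummies, not 5
```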
Hypothesis testing for Multivariate
can test a single coefficient the same way, with the same t-statistic formula
for more than one restriction, need a Joint Hypothesis Test
Joint Hypothesis Test
H0: B1 = something and B2 = something and …
Ha: one or more of the q restrictions do not hold
compute an F-statistic instead of a t-statistic; at the given degrees of freedom and confidence level, check whether the F-statistic is more extreme than the critical value (see the sketch below)
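A sketch of a joint test, assuming numpy/pandas/statsmodels; the variables and hypotheses are invented:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 3)), columns=["x1", "x2", "x3"])
df["y"] = 1 + 0.5 * df["x1"] + rng.normal(size=200)

res = smf.ols("y ~ x1 + x2 + x3", data=df).fit()
# H0: B1 = 0 and B2 = 0 (q = 2 restrictions), tested with an F-statistic
ftest = res.f_test("x1 = 0, x2 = 0")
print(ftest.fvalue, ftest.pvalue)
```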
Adjusted R²
R² always rises when you add another regressor, since some relationship will always be calculated
so the adjusted version imposes a penalty for each additional regressor: adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1)
Quadratic Regression
sometimes a relationship isn't linear, so we can use a parabola to describe it
can check whether it is linear by testing the squared term's coefficient against the null that it's 0
Quadratic interpretation
e.g., Y increasing at a decreasing rate (B1 > 0, B2 < 0); the marginal effect is dY/dX = B1 + 2·B2·X, so evaluate a 1 unit increase at a specific X (e.g., the mean)
Linear-Log Interpretation
a 1% increase in X is associated with a 0.01·B change in Y
Log-Linear Interpretation
a 1 unit increase in X is associated with a 100·B% change in Y
Log-Log Interpretation
a 1% increase in X is associated with a B% change in Y (an elasticity)
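A sketch of the log-log case, assuming numpy/statsmodels; the true elasticity of 0.8 is an invented value:

```python
# Data generated with Y proportional to X^0.8, so the log-log slope
# recovers the elasticity: a 1% increase in X gives a ~0.8% change in Y.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(1, 100, size=500)
y = 5 * x ** 0.8 * np.exp(rng.normal(scale=0.1, size=500))

loglog = sm.OLS(np.log(y), sm.add_constant(np.log(x))).fit()
print(loglog.params[1])   # close to 0.8
```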
How can we compare Log regressions?
can use R² to compare log-linear and log-log, since both predict the same log(Y)
can't compare either to linear-log by R², since linear-log predicts Y itself
between those two categories, reason through which interpretation makes the most sense
interaction term
when the effect of X1 on Y depends on the level of X2 (and vice versa), account for that with an interaction term
B3(X1*X2)
Interaction term interpretation
the effect of a 1 unit change in X1 on Y is B1 + B3·X2, so it depends on the level of X2 (and symmetrically, the effect of X2 is B2 + B3·X1)
Elasticity or holding-something-constant formula: non-linear model
the held-constant value goes into the interaction term, and the other variable's own coefficient (B) is added to it: effect of X1 = B1 + B3·X2, with X2 fixed (see the sketch below)
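A sketch of computing the effect of X1 at a fixed X2, assuming numpy/statsmodels; all coefficients are invented:

```python
# With an interaction, the effect of X1 on Y is B1 + B3*X2,
# evaluated at whatever value X2 is held constant at.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1 + 2 * x1 + 0.5 * x2 + 1.5 * (x1 * x2) + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2, x1 * x2]))
res = sm.OLS(y, X).fit()
b1, b3 = res.params[1], res.params[3]
print(b1 + b3 * 1.0)   # effect of X1 holding X2 = 1; ~2 + 1.5 = 3.5
```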
Internal Validity
statistical inferences about causal effects are valid for the population being studied:
estimates are unbiased
SEs are correct
Threats:
violation of the first OLS assumption, E(ei|Xi) = 0
sample selection bias
simultaneous causality (X causes Y and Y causes X)
can mitigate OVB with a randomized experiment
External Validity
Inferences and conclusions can be generalized to outside populations
Binary Dependent Variables
"Did you vote?"
"Are you employed?"
Linear Model for Binary Dep. Variable
Y = a + B1X1 + B2X2 + … + ei
B is the change in percentage points in the probability that Y=1 associated with a 1 unit change in X
problem: predicted probabilities can fall outside [0, 1] for extreme values of X
Probit Model for Binary Dep. Variable
uses the cumulative distribution function of the standard normal, Φ
P(Y=1|X) = Φ(a + BX)
P(college=1|famincome) = Φ(a + B·famincome)
B is the change in the z-score from a 1 unit increase in X
if B > 0, a 1 unit increase in X is associated with an increased probability that Y=1 (and vice versa)
the marginal change, in percentage points, is Φ(a + B(X+1)) - Φ(a + BX)
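A minimal probit sketch, assuming numpy/scipy/statsmodels; the coefficients are invented:

```python
# Fit a probit and convert z-score coefficients into average marginal
# effects on P(Y = 1).
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
y = (rng.uniform(size=n) < norm.cdf(-0.2 + 0.9 * x)).astype(int)

probit = sm.Probit(y, sm.add_constant(x)).fit(disp=0)
print(probit.params)                  # near (-0.2, 0.9), in z-score units
print(probit.get_margeff().margeff)   # average marginal effect on P(Y=1)
```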
Types of Data Sets
Cross-sectional: 1 time period, n entities
Time series: 1 entity, t time periods
Panel (longitudinal): data on the same n entities over t time periods
Entity fixed effects
controlling for things that vary across entities but don't change over time
create a dummy for each of n-1 entities, or just write entity intercepts ai
if using ai, REMOVE THE INTERCEPT (the ai include all the dummies)
1. B1: a 1 pp increase in UE is associated with a B1 increase in the crime rate, controlling for state FE; B2: the difference in average crime rate between state 2 and state 1 during the time period, controlling for UE
2. B1: a 1 pp increase in UE is associated with a B1 increase in the crime rate, controlling for state FE (a2 is the average crime rate in state 2 during the time period, controlling for UE)
Time Fixed Effects
controlling for things that change over time but not across entities
enter as time intercepts λt in the model for Yit
incorporated the same way as entity FE (t-1 dummies, or all dummies with the intercept removed)
Assumptions for Entity FE
E(eit|Xi1, Xi2, …, XiT, ai) = 0
observations are iid across entities
large outliers unlikely
no perfect multicollinearity
Autocorrelation
often a problem with time-series data
error terms are often correlated across time within an entity (high UE in a state this year is highly correlated with high UE in the same state next year)
We use clustered SEs to fix this.
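A sketch of clustered SEs, assuming numpy/pandas/statsmodels; the state/year panel is invented:

```python
# A shock that persists within each "state" makes errors correlated
# over time within entities; clustering the SEs by state accounts for it.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
states, years = 50, 20
df = pd.DataFrame({
    "state": np.repeat(np.arange(states), years),
    "x": rng.normal(size=states * years),
})
shock = rng.normal(size=states)
df["y"] = 1 + 0.5 * df["x"] + shock[df["state"]] + rng.normal(size=len(df))

res = smf.ols("y ~ x", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["state"]}
)
print(res.bse)   # robust to within-state correlation
```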
Endogeneity
an independent variable is correlated with the error term (violates the first OLS assumption)
Exogeneity
the independent variable is not correlated with the error term
Instrumental Variables
isolate the part of X that's exogenous (uncorrelated with the error term): find a variable Z that is relevant and exogenous
corr(Z, X) ≠ 0 (relevance: Z must be correlated with X)
corr(Z, e) = 0 (exogeneity: Z affects Y only through X)
2 Stage Least Squares
a way of understanding instrumental variables:
Stage 1: isolate the part of X that is predicted by Z (and therefore uncorrelated with e)
Xhati = ahat + pihat·Zi
Stage 2: run OLS with this new predicted X
Yi = a + B·Xhati + ei
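A manual two-stage sketch, assuming numpy/statsmodels, with simulated data and a true B of 2; note that SEs from a hand-run second stage are not the correct 2SLS SEs, so dedicated IV routines should be used in practice:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
z = rng.normal(size=n)                    # instrument
u = rng.normal(size=n)                    # unobserved confounder
x = 0.8 * z + u + rng.normal(size=n)      # endogenous regressor
y = 2.0 * x + u + rng.normal(size=n)      # true B = 2

# Stage 1: regress X on Z, keep the fitted values
x_hat = sm.OLS(x, sm.add_constant(z)).fit().fittedvalues
# Stage 2: regress Y on the predicted X
stage2 = sm.OLS(y, sm.add_constant(x_hat)).fit()
print(stage2.params[1])   # near 2; naive OLS of y on x is biased upward
```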
Difference in Differences
Difference over time within each group (treatment/control) and then the differences of those differences
      1849   1854   Delta Yi   DiD
SV    135    147     +12
La     85     19     -66       -78
Yit = a + B1Di + B2Postt + B3(Di * Postt) + eit
a = sample average of the control group in the pre-treatment period
B1 = difference in pre-treatment averages between the treatment and control groups
B2 = change in the control group's average between the pre- and post-treatment periods
B3 = the DiD estimator, the causal effect: the difference between the treatment and control groups in their within-group average changes over time (see the sketch below)
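A DiD regression sketch, assuming numpy/pandas/statsmodels; the true effect of -78 mirrors the table above, but the data are simulated:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "D": rng.integers(0, 2, size=n),      # treated group indicator
    "Post": rng.integers(0, 2, size=n),   # post-treatment period indicator
})
df["y"] = (100 + 10 * df["D"] + 12 * df["Post"]
           - 78 * df["D"] * df["Post"] + rng.normal(scale=5, size=n))

res = smf.ols("y ~ D + Post + D:Post", data=df).fit()
print(res.params["D:Post"])   # the DiD estimator; near -78
```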
Parallel Trends Assumption
had the treated group not been treated, it would have followed the same trend as the control group