Econometrics
statistical methods used to estimate and test economic relationships
Causal Effect
The causal effect of D on Y for individual i is Yi1 - Yi0 (outcome if treated minus outcome if untreated)
Problem of Causal Inference
we can never observe both treatment states for the same individual
Counterfactual
the outcome under the treatment you didn’t take. Can’t be observed, but can be estimated
Random variables
numerical summary of a random outcome
Probability distribution
the probability of each possible outcome occurring; for a continuous variable this is described by a pdf (probability density function)
Cumulative distribution
probability that the random variable is <= some value; for a discrete variable, a sum of probabilities; for a continuous variable, the cdf is the integral of the pdf
Expected value
E(Y) = μY (the population mean)
RCTs
Randomized controlled trials
Law of Large Numbers
The larger n is, the closer the sample mean gets to the true population mean, with shrinking variance
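A quick way to see the Law of Large Numbers is to simulate it. The die-roll setup below is my own illustration, not from the card:

```python
# Law of Large Numbers sketch: the sample mean of iid draws
# gets close to the population mean as n grows.
import random
import statistics

random.seed(42)

# Fair six-sided die: E(Y) = (1 + 2 + ... + 6) / 6 = 3.5
population_mean = 3.5

def sample_mean(n):
    """Mean of n iid die rolls."""
    return statistics.mean(random.randint(1, 6) for _ in range(n))

small = sample_mean(10)        # can be far from 3.5
large = sample_mean(100_000)   # should sit very close to 3.5

print(small, large)
```

With n = 100,000 the sample mean lands within a few hundredths of 3.5; with n = 10 it can easily miss by a full point.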
iid
independently and identically distributed
ind = one observation doesn’t affect another
ident = all drawn from the same probability distribution
CLT
Central Limit Theorem
For large n (rule of thumb: n >= 30), the distribution of the sample mean is approximately normal, regardless of the distribution of the original data
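The CLT can also be checked by simulation: draw many samples from a clearly non-normal distribution and look at the distribution of their means. This is my own illustration:

```python
# CLT sketch: means of samples drawn from a non-normal distribution
# are themselves approximately normally distributed.
import random
import statistics

random.seed(0)

def one_sample_mean(n=30):
    # Draws from a flat (uniform on [0, 1]) distribution -- not normal at all.
    return statistics.mean(random.random() for _ in range(n))

# 5000 independent sample means of size n = 30 each.
means = [one_sample_mean() for _ in range(5000)]

# The sampling distribution centers on 0.5 (the uniform mean) with
# sd ~= sqrt(1/12) / sqrt(30) ~= 0.0527, whatever the source shape.
center = statistics.mean(means)
spread = statistics.stdev(means)
print(center, spread)
```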
Joint Probability
Prob that A and B happen
P(X=x, Y=y)
Conditional Probability
prob of Y happening, given X
P(Y=y|X=x)
P(Y=y|X=x) = P(X=x, Y=y) / P(X=x)
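The conditional-probability formula is easy to verify on a tiny joint distribution. The numbers below are made up for illustration:

```python
# Conditional probability from the joint:
# P(Y=y | X=x) = P(X=x, Y=y) / P(X=x)
joint = {  # P(X=x, Y=y) for two binary variables (hypothetical values)
    (0, 0): 0.3, (0, 1): 0.2,
    (1, 0): 0.1, (1, 1): 0.4,
}

# Marginal: P(X=1) = sum of the joint over all y values.
p_x1 = joint[(1, 0)] + joint[(1, 1)]      # 0.5

# Conditional: P(Y=1 | X=1) = P(X=1, Y=1) / P(X=1).
p_y1_given_x1 = joint[(1, 1)] / p_x1      # 0.4 / 0.5 = 0.8
print(p_y1_given_x1)
```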
Bernoulli
a random variable with exactly two possible outcomes (usually coded 0 and 1)
Estimator
A rule/formula applied to the sample to produce an estimate; it is itself a random variable
Estimand
The quantity we want to know (the target of estimation)
Ex.: the mean height of all students
Estimate
the numerical value the estimator produces from a particular sample
Rules for a good estimator
Consistent
Unbiased
Efficient
Consistent
As n grows large, the probability that the estimator falls within any small interval around the true value approaches 1
Unbiased
Expected value of estimator is the true value of the parameter
Efficient
Lowest variance among unbiased estimators
sample variance/deviation
the spread (dispersion) of the values of Y in our sample
standard error
standard deviation of the sample mean
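The distinction between the sample sd (spread of the data) and the standard error (sd of the sample mean, SE = s / sqrt(n)) is easy to compute by hand. The data below are hypothetical:

```python
# Sample variance, sample sd, and standard error of the mean.
import math
import statistics

y = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up sample
n = len(y)

sample_var = statistics.variance(y)        # uses the n-1 denominator
sample_sd = statistics.stdev(y)            # spread of Y in the sample
standard_error = sample_sd / math.sqrt(n)  # sd of the sample mean

print(sample_var, sample_sd, standard_error)
```

The standard error shrinks as n grows even though the sample sd settles at the population value.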
OLS
Ordinary Least Squares
Linear Regression Assumptions for Causal interpretation
E(ei|Xi) = 0 → other determinants of Y outside of X are uncorrelated with X (violation is omitted variable bias)
Observations are iid
Large outliers are unlikely
4th assumption
Errors are homoskedastic
If this holds, OLS is BLUE (best linear unbiased estimator)
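The OLS slope and intercept have simple closed forms: B1_hat = cov(X, Y) / var(X) and B0_hat = mean(Y) - B1_hat * mean(X). A sketch with made-up data (the covariance is computed by hand with the n-1 denominator):

```python
# OLS by hand: slope = cov(X, Y) / var(X), intercept = mean(Y) - slope * mean(X).
import statistics

x = [1.0, 2.0, 3.0, 4.0, 5.0]       # toy data
y = [2.1, 4.3, 5.9, 8.2, 9.8]

mx, my = statistics.mean(x), statistics.mean(y)
cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

beta1 = cov_xy / statistics.variance(x)   # slope
beta0 = my - beta1 * mx                   # intercept

print(beta0, beta1)
```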
R and R²
R → its sign tells whether the correlation between x and y is positive or negative
R² → z% of the variability in y is explained by x
Hetero/homoskedastic and SE
Homoskedasticity-only SEs are valid only under homoskedasticity; heteroskedasticity-robust SEs are valid either way
Binary Ind Variable for Regression
Beta is the average difference in Y between the X=1 and X=0 groups
alpha is the sample mean of Y for the X=0 group
OVB Condition
X corr with omitted variable
Omitted variable is a determinant of Y
Biased Term
E(B1_hat) = B1 + B2 · cov(X1, X2)/var(X1)
second term + = upward bias
second term - = downward bias
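The bias formula can be demonstrated by simulation: build data where Y depends on X1 and X2, make X1 and X2 correlated, then regress Y on X1 alone. This setup (coefficients, correlation structure) is my own illustration:

```python
# Omitted variable bias sketch: the short regression's slope converges to
# B1 + B2 * cov(X1, X2) / var(X1), not to B1.
import random
import statistics

random.seed(1)

beta1, beta2 = 2.0, 3.0
n = 50_000

x1 = [random.gauss(0, 1) for _ in range(n)]
# X2 correlated with X1: cov(X1, X2) = 0.5 by construction, var(X1) = 1.
x2 = [0.5 * a + random.gauss(0, 1) for a in x1]
y = [beta1 * a + beta2 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

# Regress Y on X1 only (omitting X2): slope = cov(X1, Y) / var(X1).
mx, my = statistics.mean(x1), statistics.mean(y)
cov_x1y = sum((a - c) * (b - my) for a, b, c in zip(x1, y, [mx] * n)) / (n - 1)
short_slope = cov_x1y / statistics.variance(x1)

# Expected: 2.0 + 3.0 * 0.5 / 1 = 3.5 -- upward bias, since both
# B2 and cov(X1, X2) are positive.
print(short_slope)
```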
Multivariate regression interpretation
a 1 unit change in X1 is associated with a B1 change in Y, holding all other regressors constant (must list them out)
OLS Assumptions for Multivariate
E(ei|X1,X2,X3,…Xki)=0
Yi,X1, etc. are iid
Large outliers unlikely
No Perfect multicollinearity
Perfect multicollinearity
one of the independent variables is a perfect linear function of the other independent variables
for example: B3fracfemale + B4percfemale
since percfemale = 100 × fracfemale, each is an exact linear function of the other
Dummy Variable Trap
perfect multicollinearity for a categorical variable: dummies covering every category sum to 1 for each observation, duplicating the intercept
job happiness = a + B(transportation) + ei
coded walk=1, bike=2, car=3, train=4, bus=5
The trap is making all 5 categories individual regressors
Instead, include n-1 dummy variables in the regression; the omitted category is the base group
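A minimal sketch of the n-1 encoding for the transportation example (the `encode` helper is my own, not a real library function):

```python
# Dummy variable trap sketch: with 5 transport categories, include only 4
# dummies; the omitted category ("walk" here) is the base group.
modes = ["walk", "bike", "car", "train", "bus"]
base = "walk"

def encode(mode):
    """n-1 dummy encoding: one 0/1 indicator per non-base category."""
    return {m: int(mode == m) for m in modes if m != base}

row = encode("car")
print(row)  # {'bike': 0, 'car': 1, 'train': 0, 'bus': 0}

# A "walk" observation is all zeros -- absorbed by the intercept.
# Including all five dummies instead would make them sum to 1 for every
# observation, perfectly collinear with the intercept: that is the trap.
all_five = {m: int("car" == m) for m in modes}
assert sum(all_five.values()) == 1
```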
Hypothesis testing for Multivariate
If testing a single coefficient, use the same t-statistic formula as before
For restrictions on more than one coefficient, use a Joint Hypothesis Test
Joint Hypothesis Test
H0: B1 = something and B2 = something and … (q restrictions)
Ha: one or more of the q restrictions do not hold
Compute an F-stat instead of a t-stat; reject if it is more extreme than the critical value at the chosen significance level and degrees of freedom
Adjusted R²
R² never falls when you add another regressor, since some relationship will always be fit.
So, the adjusted version applies a penalty for each additional regressor: adj. R² = 1 − (1 − R²)(n − 1)/(n − k − 1)
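A small numeric check of the penalty (the R² values and sample sizes below are hypothetical):

```python
# Adjusted R-squared sketch: the (n-1)/(n-k-1) factor grows with the number
# of regressors k, so a useless extra regressor can lower adjusted R².
def adjusted_r2(r2, n, k):
    """adj R² = 1 - (1 - R²) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Suppose R² ticks up from 0.500 to 0.501 after adding a 4th regressor.
before = adjusted_r2(0.500, n=50, k=3)
after = adjusted_r2(0.501, n=50, k=4)

# The tiny R² gain does not cover the penalty: adjusted R² falls.
print(before, after)
```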
Quadratic Regression
sometimes the relationship isn’t linear, so a parabola (adding an X² term) can describe it
Can check linearity by testing the coefficient on X² against the null that it’s 0
Quadratic interpretation
e.g., Y increasing at a decreasing rate; the slope at a given X is B1 + 2·B2·X, so the effect of a 1 unit increase depends on the starting X (evaluate at mean X for a typical effect)
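The changing slope is easy to see numerically; the coefficients here are made up for illustration:

```python
# Quadratic regression sketch: with Y = a + b1*X + b2*X**2,
# the slope at a point X is dY/dX = b1 + 2*b2*X.
b1, b2 = 3.0, -0.1  # hypothetical coefficients; b2 < 0 -> concave

def slope_at(x):
    """Marginal effect of X on Y at the point x."""
    return b1 + 2 * b2 * x

print(slope_at(5))   # still rising at X = 5
print(slope_at(20))  # falling: past the turning point at X = 3.0 / 0.2 = 15
```

With b2 < 0, Y rises at a decreasing rate and eventually turns down.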
Linear Log Interpretation
1% increase in X is associated with 0.01B change in Y
Log Linear Interpretation
1 unit increase in X is associated with a 100B% change in Y
Log Log Interpretation
1 % increase in X associated with a B% change in Y
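The "100·B%" reading of log coefficients is an approximation; the exact multiplier is exp(B). A quick check with a made-up coefficient:

```python
# Log-linear check: if log(Y) = a + b*X, a 1-unit increase in X multiplies Y
# by exp(b); for small b this is approximately a 100*b percent change.
import math

b = 0.05  # hypothetical log-linear coefficient

exact_pct = (math.exp(b) - 1) * 100   # exact percent change in Y
approx_pct = 100 * b                  # the "100*b %" rule of thumb

print(exact_pct, approx_pct)
```

For small coefficients the two agree closely; for large ones (say b > 0.2) the approximation drifts and the exact exp(b) form should be used.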
How can we compare Log regressions?
R² is comparable between log-linear and log-log, since both predict the same log(Y).
It can’t be compared with linear-log, which predicts Y itself.
Between those two categories, reason through which interpretation makes the most sense
interaction term
when the effect of X1 on Y depends on the level of X2 (and vice versa), account for it with an interaction term:
B3(X1 × X2)
Interaction term interpretation
The effect of a 1 unit change in X1 on Y is B1 + B3·X2, so it must be evaluated at a specific value of X2 (and symmetrically, the effect of X2 is B2 + B3·X1)
Holding something constant in a non-linear model with an interaction term
the held-constant value plugs into the interaction term: the total coefficient on the variable of interest is its own B plus B3 times the held value
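A minimal numeric sketch of evaluating an interaction effect at a held value of X2 (all coefficients are made up):

```python
# Interaction-term sketch: with Y = a + b1*X1 + b2*X2 + b3*(X1*X2),
# the marginal effect of X1 is b1 + b3*X2, so it depends on where X2 is held.
b1, b3 = 2.0, 0.5  # hypothetical coefficients

def effect_of_x1(x2_held_at):
    """Change in Y from a 1-unit increase in X1, holding X2 fixed."""
    return b1 + b3 * x2_held_at

print(effect_of_x1(0))   # just b1 when X2 = 0
print(effect_of_x1(4))   # the interaction adds b3 * 4 on top of b1
```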