Econometrics
uses statistics and data to answer questions in economics, focusing on causal impacts rather than just correlations.
Cross-sectional data
involves information about many subjects at a single point in time, like a survey
Panel data
(longitudinal data) tracks many subjects across multiple time periods
Time series data
focuses on one subject over many different times
Causality
Economists seek to determine causal effects, understanding that correlation does not equal causation. Econometrics is used to address causality since true experiments are rare
Simple Linear Regression Model
The model is represented as Y = B0 + B1X + U, where Y is the dependent variable, X is the independent variable, and U represents all other factors affecting Y.
B0 is the intercept parameter, and B1 is the slope parameter.
U encompasses unobserved variables or errors
Zero Conditional Mean (ZCM) Assumption
E(U|X) = 0
The ZCM assumption states that the average of U, conditional on any value of X, is equal to zero.
It's crucial for making statements about causal effects.
If the ZCM assumption doesn't hold, you can't say that X causes Y
Population Regression Function
E(Y|X) represents the average of Y for a given value of X and equals B0 + B1X
Nonexperimental data
(observational data) not collected through controlled experiments
Experimental data
often collected in laboratory settings
Ceteris Paribus
"All other things being equal." Used in economic analysis to isolate the effect of one variable while holding others constant.
Estimating Parameters
Use data to estimate B0 and B1 in the model Y = B0 + B1X + u.
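The residual or prediction error is ûi = Yi - Ŷi = Yi - B̂0 - B̂1Xi, where B̂0 and B̂1 are the estimated intercept and slope.
Ordinary Least Squares (OLS) estimates are the values of B̂0 and B̂1 that minimize the sum of squared residuals (SSR), i.e. the sum of ûi² over all observations.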
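A minimal sketch of the estimation step, assuming the standard closed-form solution for one regressor (slope = sample covariance of X and Y divided by sample variance of X). The function name `ols_simple` and the data are made up for illustration.

```python
import numpy as np

def ols_simple(x, y):
    """Return (b0_hat, b1_hat), the values that minimize the sum of squared residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_dev = x - x.mean()
    # Slope: sum of cross-deviations over sum of squared deviations of x.
    b1_hat = np.sum(x_dev * (y - y.mean())) / np.sum(x_dev ** 2)
    # Intercept: makes the fitted line pass through (mean of x, mean of y).
    b0_hat = y.mean() - b1_hat * x.mean()
    return b0_hat, b1_hat

# Made-up data lying exactly on Y = 1 + 2X, so the residuals should all be zero.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x
b0_hat, b1_hat = ols_simple(x, y)
residuals = y - (b0_hat + b1_hat * x)
```

Note that SLR.3 (variation in x) is what keeps the denominator of the slope from being zero.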
Total Sum of Squares (SST)
represents the total variation in the dependent variable (Y)
Explained Sum of Squares (SSE)
the variation explained by the independent variables. Removing explanatory variables from a regression always decreases (or possibly keeps exactly the same) the R-squared.
Residual Sum of Squares (SSR)
unexplained variation (error)
R-squared
R² = SSE/SST = 1 - SSR/SST
measure of how well a regression model explains the variation in the dependent variable (Y).
Higher R² values indicate a better fit, but a very high R² might suggest overfitting.
R² is between 0 and 1 (for OLS with an intercept).
Units of measurement do not affect R-squared, even with "negative units of measurement". (Ex.: the variable x and the variable -x carry the same information.)
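A short numerical check of the sums-of-squares cards, offered as a sketch on simulated data (the seed, sample size, and coefficients are arbitrary assumptions). It computes SST, SSE, and SSR for a one-regressor OLS fit and compares R-squared after rescaling x and after flipping its sign.

```python
import numpy as np

def sums_of_squares(x, y):
    """Fit y on x by OLS (with intercept) and return (SST, SSE, SSR)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # highest degree first
    y_hat = intercept + slope * x
    sst = np.sum((y - y.mean()) ** 2)       # total variation in y
    sse = np.sum((y_hat - y.mean()) ** 2)   # explained variation
    ssr = np.sum((y - y_hat) ** 2)          # residual (unexplained) variation
    return sst, sse, ssr

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)

sst, sse, ssr = sums_of_squares(x, y)
r2 = sse / sst

# Same regression with x measured in different units, and with -x.
sst_s, sse_s, _ = sums_of_squares(100.0 * x, y)
sst_n, sse_n, _ = sums_of_squares(-x, y)
r2_scaled = sse_s / sst_s
r2_negated = sse_n / sst_n
```

The slope changes when x is rescaled, but the fitted values (and so SSE, SSR, and R-squared) do not.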
Statistical Properties of Estimators (OLS)
OLS estimates of B0 and B1 are denoted as B̂0 and B̂1.
Unbiasedness means that on average, the estimates are correct: E(B̂0) = B0 and E(B̂1) = B1
Assumptions for Unbiasedness of OLS
SLR.1: Y = B0 + B1X + u.
SLR.2: Random sample of data on X and Y (cross-sectional data).
SLR.3: There is variation in the value of the x variable.
SLR.4: E(u|x) = 0
Variance of OLS
The variance of B̂1 is Var(B̂1) = E[(B̂1 - E(B̂1))²]
Homoskedasticity (MLR.5)
means that u has the same variance for all values of x
Multiple Linear Regression (MLR)
The general form is Y = B0 + B1X1 + B2X2 + ... + BKXK + u.
The ZCM assumption in MLR is E(u|X1, X2, ..., XK) = 0
Omitted Variable Bias (OVB)
Occurs when a regression model leaves out a relevant explanatory variable that affects the dependent variable and is correlated with at least one included independent variable. This omission leads to biased and inconsistent estimates of the coefficients.
OVB is E(B̂1) - B1 = B2 δ1, where B̂1 is the estimate from the regression that omits X2, B2 is the effect of the omitted variable X2 on Y, and δ1 is the slope from the regression of X2 on X1.
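The OVB formula can be checked on simulated data. The sketch below assumes a made-up population (B1 = 2, B2 = 3, and X2 built to be positively correlated with X1). The helper `ols` is a hypothetical convenience wrapper around `np.linalg.lstsq`. Within a sample, the gap between the short-regression and long-regression slopes equals the estimated B2 times the estimated δ1 exactly.

```python
import numpy as np

def ols(y, *regressors):
    """OLS coefficients [intercept, slopes...] of y on the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(42)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)            # x2 is correlated with x1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

b_long = ols(y, x1, x2)    # includes x2: [b0, b1, b2]
b_short = ols(y, x1)       # omits x2:   [b0, b1]
delta = ols(x2, x1)        # regression of x2 on x1: [d0, d1]

gap = b_short[1] - b_long[1]        # how far the short-regression slope is pushed
formula = b_long[2] * delta[1]      # estimated B2 times estimated delta1
```

Here both B2 and δ1 are positive, so the short regression overstates the effect of X1.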
Conditions for OVB
For an omitted variable to cause bias, it must:
Affect the dependent variable – The omitted variable must have a real impact on the outcome (Y), whether positive or negative.
Be correlated with at least one included independent variable – If the omitted variable is unrelated to the included regressors, its absence won’t distort their estimated effects.
The variable is negatively correlated with X and has a negative effect on Y → positive (upward) bias.
The variable is positively correlated with X and has a positive effect on Y → positive (upward) bias.
The variable is related to X and Y in opposite directions → negative (downward) bias. There is no bias only when the omitted variable has no effect on Y or is uncorrelated with X.
intercept coefficient
represents the expected value of the dependent variable (Y) when all independent variables are equal to zero.
Linear Model (No Logging)
B1: Represents the absolute change in Y for a one-unit change in X.
Example: If B1 = 2, then increasing X by 1 unit increases Y by 2 units.
Log-Linear Model (Log Y, No Log X)
B1: 100 × B1 is the approximate percentage change in Y for a one-unit change in X.
Example: If B1 = 0.05, then increasing X by 1 unit increases Y by about 5%.
Use case: When Y grows exponentially (income, population, sales)
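The percentage reading of B1 in the log-linear model is an approximation that works well for small coefficients. A quick arithmetic sketch comparing it with the exact change implied by log(Y) = B0 + B1X (the function name is made up for illustration):

```python
import math

def pct_change_log_linear(b1, dx=1.0):
    """Approximate and exact % change in Y when X rises by dx in log(Y) = B0 + B1*X."""
    approx = 100.0 * b1 * dx                     # the usual "100 * B1" reading
    exact = 100.0 * (math.exp(b1 * dx) - 1.0)    # exact change implied by the model
    return approx, exact

approx, exact = pct_change_log_linear(0.05)      # approx is 5%, exact is about 5.13%
```

The two numbers drift apart as B1 grows, so the exact formula is safer for large coefficients.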
Linear-Log Model (No Log Y, Log X)
B1: B1/100 is the approximate absolute change in Y for a 1% increase in X.
Example: If B1 = 3, then increasing X by 1% increases Y by 0.03 units.
Use case: When X has diminishing returns (additional years of education on salary)
Log-Log Model (Both Y and X Logged)
B1: Represents the elasticity - the percentage change in Y for a 1% change in X.
Example: if B1 = 0.8, then a 1% increase in X leads to a 0.8% increase in Y.
Use case: When both X and Y follow exponential growth patterns (price vs. demand, GDP vs. exports)
Why use logs?
Reduces Skewness: makes data more normally distributed.
Handles Non-Linearity: Transforms exponential relationships into linear ones.
Interpretable Results: Helps interpret effects in percentage terms.
Sample Covariance (in explanatory variables)
The sample covariance between any explanatory variable and the residuals is zero, whether or not MLR.4 is true! This is a fact about OLS!
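A sketch illustrating this fact on simulated data where the zero conditional mean assumption is deliberately violated (u is built to depend on x; all numbers are arbitrary assumptions). The OLS residuals are still uncorrelated with x in the sample, even though the slope estimate is biased.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x = rng.normal(size=n)
u = 0.8 * x + rng.normal(size=n)     # u depends on x, so E(u|x) is not 0
y = 1.0 + 2.0 * x + u                # true slope on x is 2

X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef

# Sample covariance between the regressor and the OLS residuals.
sample_cov = np.cov(x, resid)[0, 1]
# coef[1] lands near 2.8 rather than 2: biased, yet sample_cov is still ~0.
```

This is why a zero covariance between x and the residuals cannot be used as a test of MLR.4: OLS forces it to hold by construction.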
Duplication of data
This violates MLR.2 (random sampling). One part of having a random sample is that each row in the data is independent of all the other rows. This is violated when there are duplicate rows, so the data are no longer a random sample.