Econometrics

1. Foundations of Econometrics

What is Econometrics?

  • Econometrics applies statistical methods to economic data to test hypotheses and estimate causal relationships.

  • Key challenge: Distinguishing correlation from causation.

Regression Analysis as a Tool

  • Regression allows us to model relationships:

    $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

    • $Y_i$: Dependent variable (outcome).

    • $X_i$: Independent variable (predictor).

    • $\beta_0$, $\beta_1$: Intercept and slope; $\beta_1$ is the change in $Y$ associated with a one-unit change in $X$.

    • $\varepsilon_i$: Error term (captures unobserved factors).

  • Key Question: Does $X$ cause $Y$? Or is the estimated relationship biased by omitted variables, reverse causality, or measurement error?


2. Probability and Statistical Foundations

Understanding Random Variables

  • Discrete vs. Continuous:

    • Discrete: Takes a countable set of values (e.g., number of students in a class).

    • Continuous: Can take any value in an interval (e.g., income levels).

  • Probability Distributions:

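    • PDF (Probability Density Function): For a continuous variable, describes how likely values near each point are; the probability of an interval is the area under the PDF over that interval. (Discrete variables use a probability mass function, which gives each outcome's probability directly.)

    • CDF (Cumulative Distribution Function): $F(y) = P(Y \le y)$, the probability of observing a value less than or equal to $y$.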

  • Expectation & Variance:

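    • Expected Value (Mean): $E[Y] = \sum_i p_i y_i$ for a discrete variable, where $p_i$ is the probability of outcome $y_i$.

    • Variance: $\operatorname{Var}(Y) = E[(Y - E[Y])^2]$, the average squared distance from the mean; measures the spread of the distribution.

    • Standard Deviation: $\sigma_Y = \sqrt{\operatorname{Var}(Y)}$, expressed in the same units as $Y$.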

  • Covariance & Correlation:

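    • Covariance: $\operatorname{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$; positive when the variables tend to move together, negative when they move in opposite directions.

    • Correlation: $\operatorname{Corr}(X, Y) = \operatorname{Cov}(X, Y) / (\sigma_X \sigma_Y)$, a standardized covariance bounded between -1 and 1.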
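The sample versions of these quantities are easy to compute directly. The sketch below uses made-up data (hypothetical years of education and hourly wages) and the sample formulas that divide by $n - 1$. It relies only on the Python standard library.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    # Sample variance: squared deviations from the mean, divided by n - 1
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def covariance(xs, ys):
    # Sample covariance: average co-movement around the two means
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    # Covariance divided by the product of standard deviations; lies in [-1, 1]
    return covariance(xs, ys) / math.sqrt(variance(xs) * variance(ys))

# Hypothetical data: years of education and hourly wage
educ = [10, 12, 12, 14, 16, 16, 18]
wage = [12.0, 15.0, 14.0, 18.0, 22.0, 21.0, 27.0]

print("Var(educ):", round(variance(educ), 3))
print("SD(educ): ", round(math.sqrt(variance(educ)), 3))
print("Cov:      ", round(covariance(educ, wage), 3))
print("Corr:     ", round(correlation(educ, wage), 3))
```

Correlation does not depend on units, unlike covariance. Measuring wages in cents instead of dollars multiplies the covariance by 100 but leaves the correlation unchanged.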


3. Types of Data

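  • Cross-sectional: Many units (e.g., individuals, firms, states) observed at a single point in time.

  • Time series: A single unit observed over multiple time periods.

  • Panel data: Multiple units, each observed over multiple time periods, combining both (e.g., state-level unemployment rates over 10 years).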


4. Ordinary Least Squares (OLS) and Assumptions

OLS estimates the parameters by choosing $\hat\beta_0$ and $\hat\beta_1$ to minimize the sum of squared residuals, $\sum_i (Y_i - \hat\beta_0 - \hat\beta_1 X_i)^2$.
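With a single regressor, minimizing the sum of squared residuals gives a closed form: $\hat\beta_1 = \sum_i (X_i - \bar X)(Y_i - \bar Y) / \sum_i (X_i - \bar X)^2$ and $\hat\beta_0 = \bar Y - \hat\beta_1 \bar X$. Below is a minimal standard-library sketch of these formulas on made-up data. The final loop is a sanity check: moving either estimate away from the OLS solution should raise the sum of squared residuals.

```python
def ols_simple(xs, ys):
    """Closed-form OLS estimates (b0, b1) for Y = b0 + b1*X + e."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1

def ssr(xs, ys, b0, b1):
    # Sum of squared residuals for a candidate line
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))

# Hypothetical data
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = ols_simple(xs, ys)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")

# Moving either coefficient away from the OLS values increases the SSR
best = ssr(xs, ys, b0, b1)
for d in (-0.1, 0.1):
    assert ssr(xs, ys, b0 + d, b1) > best
    assert ssr(xs, ys, b0, b1 + d) > best
```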

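Key Assumptions (Gauss-Markov, plus Normality)

  1. Linearity: The model is linear in the parameters and correctly specifies the relationship.

  2. Random Sampling: Observations are drawn independently from the same population.

  3. No Perfect Multicollinearity: No exact linear relationships among predictors.

  4. Zero Conditional Mean of Errors: $E[\varepsilon \mid X] = 0$. This fails under omitted variable bias, reverse causality, and measurement error in $X$.

  5. Homoskedasticity: $\operatorname{Var}(\varepsilon \mid X) = \sigma^2$; the error variance is constant across values of $X$.

  6. Normality of Errors (for exact inference in small samples): $\varepsilon \sim N(0, \sigma^2)$.

Under assumptions 1 through 4, OLS is unbiased. Adding assumption 5 makes it BLUE (the best linear unbiased estimator), by the Gauss-Markov theorem. Assumption 6 is not part of Gauss-Markov; it makes t and F tests exact in small samples.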


5. Threats to Internal Validity

Internal validity refers to whether a study correctly identifies a causal effect. Threats arise when the estimated relationship between $X$ and $Y$ is biased.

1. Omitted Variable Bias (OVB)

Occurs when a variable that affects $Y$ and is correlated with $X$ is left out of the model.

How to Spot It:
  • Ask: Is there a missing factor that could be driving both $X$ and $Y$?

  • OLS is biased only if the omitted variable both affects $Y$ and is correlated with $X$.

  • Example:

    • Regression: $\text{Income} = \beta_0 + \beta_1 \text{Education} + \varepsilon$

    • Omitted Variable: Ability

    • If ability increases both education and income, OLS overstates the effect of education (upward bias).

How to Address It:
  • Include the omitted variable (if measurable).

  • Use fixed effects (with panel data) to control for unobserved factors that do not change over time.

  • Use an Instrumental Variable (IV).
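The ability example can be simulated to show the direction and size of the bias. The data-generating process below is an assumption chosen for illustration. Ability raises education one-for-one, the true effect of education on income is 2.0, and ability also adds 3.0 to income directly. The standard omitted-variable-bias formula predicts a short-regression slope of about $2.0 + 3.0 \times \delta$, where $\delta$ is the slope from regressing ability on education (about 0.5 here). That gives roughly 3.5 instead of 2.0.

```python
import random

def ols_slope(xs, ys):
    # Simple-regression OLS slope: Cov(x, y) / Var(x)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

random.seed(0)
n = 100_000
ability = [random.gauss(0, 1) for _ in range(n)]
# Assumed model: ability raises education...
educ = [a + random.gauss(0, 1) for a in ability]
# ...and income directly; the true effect of education is 2.0
income = [2.0 * e + 3.0 * a + random.gauss(0, 1) for e, a in zip(educ, ability)]

short = ols_slope(educ, income)   # regression that omits ability
delta = ols_slope(educ, ability)  # how ability moves with education
print(f"short-regression slope: {short:.2f}")
print(f"OVB prediction 2.0 + 3.0 * delta: {2.0 + 3.0 * delta:.2f}")
```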


2. Reverse Causality

Occurs when $Y$ causes $X$, instead of (or in addition to) $X$ causing $Y$.

How to Spot It:
  • Ask: Could the dependent variable be influencing the independent variable?

  • Example:

    • Regression: $\text{Crime Rate} = \beta_0 + \beta_1 \text{Police Presence} + \varepsilon$

    • Reverse Causality: High crime rates cause an increase in police presence.

How to Address It:
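  • Lagged variables: Use past values of $X$ to predict current $Y$. This helps only if current $Y$ cannot affect past $X$; anticipation effects or persistent (serially correlated) shocks can reintroduce bias.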

  • Instrumental Variables (IV).
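A simulation makes the problem concrete. In the hypothetical setup below, police presence has no causal effect on crime, but police are deployed partly in response to crime. Regressing crime on police still gives a clearly positive slope (about 0.4 under these assumed parameters), which a naive reading would take to mean that police increase crime.

```python
import random

def ols_slope(xs, ys):
    # Simple-regression OLS slope: Cov(x, y) / Var(x)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

random.seed(1)
n = 100_000
# Assumed model: crime is generated without any effect of police (true effect = 0)
crime = [random.gauss(0, 1) for _ in range(n)]
# Police respond to crime: causality runs from Y to X
police = [0.5 * c + random.gauss(0, 1) for c in crime]

slope = ols_slope(police, crime)
print(f"estimated 'effect' of police on crime: {slope:.2f}")
```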


3. Measurement Error

Occurs when the independent variable $X$ is measured with error.

How to Spot It:
  • Ask: Is $X$ reported or measured inaccurately?

  • Example:

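    • If people systematically underreport their income in surveys, estimates that use reported income may be biased.

Types of Measurement Error:
  • Classical Measurement Error (random noise, unrelated to the true value and to the regression error): In $X$, it biases the OLS slope toward zero (attenuation bias). In $Y$, it does not bias estimates but reduces precision (larger standard errors).

  • Non-classical Measurement Error (systematic error, e.g., error that depends on the true value): Biases estimates, in a direction that depends on the structure of the error.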

How to Address It:
  • Use instrumental variables (e.g., a second, independently measured report of $X$) or better data sources.
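The sketch below simulates classical measurement error under assumed parameters. The true slope is 2.0, and the true $X$ and each noise term have variance 1. Noise in $X$ should cut the OLS slope roughly in half, because the attenuation factor is $\operatorname{Var}(X) / (\operatorname{Var}(X) + \operatorname{Var}(\text{noise})) = 0.5$. Noise in $Y$ should leave the slope near 2.0. The last lines illustrate the IV remedy. They assume a second report of $X$, with its own independent error, is available to serve as the instrument.

```python
import random

def cov(xs, ys):
    # Covariance (dividing by n; the scale cancels in the ratios below)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def ols_slope(xs, ys):
    return cov(xs, ys) / cov(xs, xs)

random.seed(2)
n = 100_000
x_true = [random.gauss(0, 1) for _ in range(n)]
y = [2.0 * x + random.gauss(0, 1) for x in x_true]  # true slope = 2.0

x_obs = [x + random.gauss(0, 1) for x in x_true]  # classical error in X
y_obs = [v + random.gauss(0, 1) for v in y]       # classical error in Y

slope_x_err = ols_slope(x_obs, y)       # attenuated toward 0 (about 1.0)
slope_y_err = ols_slope(x_true, y_obs)  # still about 2.0, only noisier

# IV remedy: a second, independently mismeasured report of X as the instrument
x_obs2 = [x + random.gauss(0, 1) for x in x_true]
slope_iv = cov(x_obs2, y) / cov(x_obs2, x_obs)  # about 2.0

print(f"error in X: {slope_x_err:.2f}")
print(f"error in Y: {slope_y_err:.2f}")
print(f"IV:         {slope_iv:.2f}")
```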


4. Misspecified Functional Form

Occurs when a model assumes a linear relationship when the true relationship is nonlinear.

How to Spot It:
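  • Check scatter plots of $Y$ against $X$, and of residuals against fitted values: do the relationships appear nonlinear?

  • Example:

    • Diminishing returns: happiness may rise with income, but by less at higher incomes (a concave relationship).

How to Address It:
  • Add polynomial terms (e.g., $X^2$).

  • Use log transformations (e.g., regress $Y$ on $\log X$).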
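The sketch below illustrates the log option on simulated data. The data-generating process is an assumption chosen for illustration: happiness rises with the logarithm of income (a diminishing-returns shape), plus noise. Comparing $R^2$ from regressing happiness on income with $R^2$ from regressing it on $\log(\text{income})$ should show the log specification fitting noticeably better.

```python
import math
import random

def r_squared(xs, ys):
    # With one regressor and an intercept, R^2 equals the squared correlation
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

random.seed(3)
# Hypothetical incomes (in $1,000s) and an assumed log relationship
income = [random.uniform(1, 100) for _ in range(5_000)]
happiness = [math.log(x) + random.gauss(0, 0.3) for x in income]

r2_linear = r_squared(income, happiness)
r2_log = r_squared([math.log(x) for x in income], happiness)
print(f"R^2, linear in income: {r2_linear:.3f}")
print(f"R^2, log income:       {r2_log:.3f}")
```

Fitting a quadratic in $X$ instead requires a regression with two regressors, which this one-regressor helper does not cover.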


5. Outliers and Leverage Points

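Extreme values can distort estimates. An outlier has an unusual value of $Y$ given $X$ (a large residual); a leverage point has an unusual value of $X$. An observation that is both can pull the fitted line substantially.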

How to Spot It:
  • Look at histograms or scatter plots.

  • Example:

    • A single billionaire in an income regression may distort results.

How to Address It:
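  • Winsorizing: Replace values beyond chosen percentiles (e.g., the 1st and 99th) with the values at those percentiles.

  • Robust regression methods (e.g., median regression, which is less sensitive to outliers in $Y$).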
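The sketch below applies winsorizing to the billionaire example. The data are hypothetical: income (in $1,000s) rises by exactly 2 per year of experience, except that the last person is a billionaire. The helper `winsorize` is an illustrative function, not a library API. It picks percentiles with a simple index rule, and statistical packages may interpolate differently.

```python
def winsorize(values, lower_pct=0.05, upper_pct=0.95):
    """Clip values at empirical percentiles.

    Uses a simple index-based percentile, s[int(p * (n - 1))];
    real statistical packages may use a different percentile rule.
    """
    s = sorted(values)
    n = len(s)
    lo = s[int(lower_pct * (n - 1))]
    hi = s[int(upper_pct * (n - 1))]
    return [min(max(v, lo), hi) for v in values]

def ols_slope(xs, ys):
    # Simple-regression OLS slope: Cov(x, y) / Var(x)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data: income = 30 + 2 * experience, plus one billionaire
experience = list(range(1, 21))
income = [30.0 + 2.0 * x for x in experience]
income[-1] = 1_000_000.0

raw = ols_slope(experience, income)
clean = ols_slope(experience, winsorize(income))
print(f"slope with billionaire:  {raw:,.1f}")
print(f"slope after winsorizing: {clean:.2f}")
```

Winsorizing keeps every observation but caps its influence. The trade-off is that the choice of cutoffs is arbitrary and should be reported.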


6. Sample Selection Bias

Occurs when the sample is not representative of the population.

How to Spot It:
  • Ask: Does the sample systematically exclude certain groups?

  • Example:

    • Studying only employed people when analyzing wages leaves out those who are not working, whose potential wages may differ systematically from those of workers.

How to Address It:
  • Use Heckman selection models.
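A full Heckman estimator is beyond a short sketch, but the bias it targets is easy to simulate. In the hypothetical setup below, the true slope is 1.0, and an observation is kept only when its outcome exceeds 1.0. This mimics wages observed only for people whose wage offer is high enough that they work. Estimating on the selected sample alone should give a slope well below 1.0.

```python
import random

def ols_slope(xs, ys):
    # Simple-regression OLS slope: Cov(x, y) / Var(x)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

random.seed(4)
n = 100_000
x = [random.gauss(0, 1) for _ in range(n)]
y = [1.0 + 1.0 * xi + random.gauss(0, 1) for xi in x]  # true slope = 1.0

# Assumed selection rule: only units with an outcome above 1.0 are observed
selected = [(xi, yi) for xi, yi in zip(x, y) if yi > 1.0]
x_sel = [xi for xi, _ in selected]
y_sel = [yi for _, yi in selected]

full = ols_slope(x, y)
sel = ols_slope(x_sel, y_sel)
print(f"full population:  {full:.2f}")
print(f"selected sample:  {sel:.2f}")
```

The selected sample overrepresents low-$X$ units who had unusually large errors, which flattens the fitted line.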


6. Methods to Address Internal Validity Issues

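1. Randomized Controlled Trials (RCTs)

  • Gold standard for causal inference.

  • Randomly assigns units to treatment and control groups, so treatment status is, on average, unrelated to both observed and unobserved characteristics.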
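The sketch below contrasts random assignment with self-selection under an assumed data-generating process. The true treatment effect is 1.0, and unobserved ability independently raises the outcome. With coin-flip assignment, the simple difference in means should land close to 1.0. When high-ability people opt into treatment instead, the same comparison is badly inflated.

```python
import random

def diff_in_means(outcome, treated):
    # Mean outcome of treated units minus mean outcome of untreated units
    t = [y for y, d in zip(outcome, treated) if d]
    c = [y for y, d in zip(outcome, treated) if not d]
    return sum(t) / len(t) - sum(c) / len(c)

random.seed(5)
n = 100_000
ability = [random.gauss(0, 1) for _ in range(n)]

def outcomes(treated):
    # Assumed model: outcome = 1.0 * treatment + 2.0 * ability + noise
    return [1.0 * d + 2.0 * a + random.gauss(0, 1) for d, a in zip(treated, ability)]

random_assign = [random.random() < 0.5 for _ in range(n)]  # coin flip
self_select = [a > 0 for a in ability]                     # high ability opts in

rct_estimate = diff_in_means(outcomes(random_assign), random_assign)
naive_estimate = diff_in_means(outcomes(self_select), self_select)
print(f"randomized:     {rct_estimate:.2f}")
print(f"self-selected:  {naive_estimate:.2f}")
```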

2. Difference-in-Differences (DiD)

  • Compares treatment & control groups before and after a policy change.

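  • Key Assumption: Parallel Trends. Without the policy, the treatment group's outcome would have followed the same trend as the control group's, so the control group is a valid counterfactual.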
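With two groups and two periods, the DiD estimate is the treated group's before-to-after change minus the control group's change. The group means below are hypothetical numbers for illustration. Under parallel trends, the control group's change (+3) stands in for what the treated group would have experienced without the policy.

```python
# Hypothetical mean outcomes (e.g., employment rate in %) by group and period
means = {
    ("treated", "before"): 60.0,
    ("treated", "after"): 66.0,
    ("control", "before"): 55.0,
    ("control", "after"): 58.0,
}

def did_estimate(means):
    change_treated = means[("treated", "after")] - means[("treated", "before")]
    change_control = means[("control", "after")] - means[("control", "before")]
    # The control group's change proxies for the treated group's counterfactual trend
    return change_treated - change_control

print(did_estimate(means))  # 3.0
```

A before/after comparison of the treated group alone would report 6.0 and wrongly attribute the common trend to the policy.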

3. Fixed Effects (FE)

  • Controls for unobserved characteristics that do not change over time.

  • Used in panel data (e.g., state-by-year analysis).
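One common way to implement unit fixed effects is the within transformation. Subtract each unit's own average from its $X$ and $Y$, then run OLS on the demeaned data; this removes anything constant within a unit. The tiny panel below is hypothetical and noise-free by construction. The true slope is 1.5, but states with higher $X$ also have higher fixed levels, so pooled OLS overstates the slope, while the within estimator recovers 1.5.

```python
from collections import defaultdict

def ols_slope(xs, ys):
    # Simple-regression OLS slope: Cov(x, y) / Var(x)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def demean_by_unit(units, values):
    """Subtract each unit's mean from its observations (within transformation)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for u, v in zip(units, values):
        totals[u] += v
        counts[u] += 1
    return [v - totals[u] / counts[u] for u, v in zip(units, values)]

# Hypothetical panel: 3 states observed for 3 years
state_effect = {"A": 0.0, "B": 10.0, "C": 20.0}  # unobserved, time-invariant
base_x = {"A": 1.0, "B": 5.0, "C": 9.0}          # X is higher where the effect is higher
units, xs, ys = [], [], []
for state in state_effect:
    for year in range(3):
        x = base_x[state] + year
        units.append(state)
        xs.append(x)
        ys.append(state_effect[state] + 1.5 * x)  # true slope = 1.5

pooled = ols_slope(xs, ys)
within = ols_slope(demean_by_unit(units, xs), demean_by_unit(units, ys))
print(f"pooled OLS:             {pooled:.2f}")
print(f"fixed effects (within): {within:.2f}")
```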

4. Instrumental Variables (IV)

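  • Used when $X$ is endogenous (correlated with the error term).

  • A valid instrument $Z$ must be relevant (correlated with $X$) and exogenous (uncorrelated with the error term, so it affects $Y$ only through $X$).

  • Example: Using distance to school as an instrument for education.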
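With one instrument $Z$ and one endogenous regressor, the simplest IV estimator is $\hat\beta_{IV} = \widehat{\operatorname{Cov}}(Z, Y) / \widehat{\operatorname{Cov}}(Z, X)$. The simulation below reuses the education and ability story under assumed parameters. Distance is generated independently of ability (exogenous) and lowers education (relevant), and the true effect of education on income is 2.0. OLS should be pushed upward by the omitted ability, while the IV estimate should land near 2.0.

```python
import random

def cov(xs, ys):
    # Covariance (dividing by n; the scale cancels in the ratios below)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

random.seed(6)
n = 100_000
ability = [random.gauss(0, 1) for _ in range(n)]    # unobserved confounder
distance = [random.gauss(0, 1) for _ in range(n)]   # instrument, independent of ability
# Assumed model: distance lowers education; ability raises education and income
educ = [-0.5 * z + a + random.gauss(0, 1) for z, a in zip(distance, ability)]
income = [2.0 * e + 3.0 * a + random.gauss(0, 1) for e, a in zip(educ, ability)]

ols = cov(educ, income) / cov(educ, educ)
iv = cov(distance, income) / cov(distance, educ)
print(f"OLS: {ols:.2f}")
print(f"IV:  {iv:.2f}")
```

A weak instrument (a small covariance between $Z$ and $X$) makes the denominator close to zero and the IV estimate very noisy.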

5. Regression Discontinuity (RD)

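  • Uses a cutoff rule (e.g., students above a certain GPA get scholarships) and compares units just above and just below the cutoff, which are assumed to be similar in everything except treatment.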
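A sharp RD estimate can be sketched as two local linear regressions, one on each side of the cutoff. The effect is the jump between their fitted values at the cutoff. Everything below is assumed for illustration:

  • GPA is uniform on [2, 4], and the scholarship cutoff is 3.0.

  • The outcome rises smoothly with GPA, and scholarship recipients get an extra 5.0.

  • The bandwidth of 0.3 GPA points is an arbitrary choice; in practice, bandwidth selection matters and is usually data-driven.

```python
import random

def ols_fit(xs, ys):
    """Return (intercept, slope) for a simple OLS fit."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def rd_estimate(running, outcome, cutoff, bandwidth):
    # Center the running variable so each intercept is the fitted value at the cutoff
    left = [(r - cutoff, y) for r, y in zip(running, outcome)
            if cutoff - bandwidth <= r < cutoff]
    right = [(r - cutoff, y) for r, y in zip(running, outcome)
             if cutoff <= r <= cutoff + bandwidth]
    intercept_left, _ = ols_fit([r for r, _ in left], [y for _, y in left])
    intercept_right, _ = ols_fit([r for r, _ in right], [y for _, y in right])
    return intercept_right - intercept_left

random.seed(7)
n = 20_000
cutoff = 3.0
gpa = [random.uniform(2.0, 4.0) for _ in range(n)]
# Assumed outcome: smooth in GPA, plus a jump of 5.0 at or above the cutoff
outcome = [10.0 * g + 5.0 * (g >= cutoff) + random.gauss(0, 2) for g in gpa]

effect = rd_estimate(gpa, outcome, cutoff, bandwidth=0.3)
print(f"estimated jump at the cutoff: {effect:.2f}")
```

A simple difference in mean outcomes within the same window would also pick up the smooth GPA trend. Fitting a line on each side removes that trend, so only the jump at the cutoff remains.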

