Covariance
Measures how two variables change together. Formula: Cov(X,Y) = Σ(xᵢ−x̄)(yᵢ−ȳ)/n. Positive → both move in same direction. Negative → they move opposite. Zero → no linear relationship.
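The formula above can be sketched in plain Python (a minimal illustration; the function name is my own):

```python
def covariance(xs, ys):
    """Population covariance: Cov(X, Y) = sum((x - x_bar) * (y - y_bar)) / n."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n

print(covariance([1, 2, 3, 4], [2, 4, 6, 8]))  # 2.5: both series move together
print(covariance([1, 2, 3], [3, 2, 1]) < 0)    # True: they move in opposite directions
```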
Correlation
Standardized covariance bounded between [−1, 1]. Formula: ρ = Cov(X,Y)/(σ_X · σ_Y). ρ=1 is perfect positive, ρ=−1 is perfect negative. Measures strength and direction of a linear relationship. DOES NOT imply causation.
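The standardization can be checked directly (a sketch; the data are made up):

```python
import math

def correlation(xs, ys):
    """rho = Cov(X, Y) / (sigma_X * sigma_Y); always in [-1, 1]."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

print(correlation([1, 2, 3], [2, 4, 6]))  # ~1.0: perfect positive linear relationship
print(correlation([1, 2, 3], [6, 4, 2]))  # ~-1.0: perfect negative
```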
Variance
A measure of the spread of data points around the mean. Formula: σ² = Σ[xᵢ − E(X)]² · P(xᵢ). Always ≥ 0. Squaring prevents positive/negative deviations from canceling and penalizes larger deviations more heavily.
Central Limit Theorem
As sample size grows, the sampling distribution of the sample mean approaches a normal distribution regardless of the population's shape. Formula: X̄ ~ N(μ, σ²/n) as n → ∞. Justifies using normal-based inference in econometrics.
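A quick simulation illustrates the theorem (a sketch using only the standard library; the skewed Exponential(1) population and the sample size are arbitrary choices):

```python
import math
import random
import statistics

random.seed(3)

# Skewed population: Exponential(1), with mean = 1 and sd = 1
def draw():
    return random.expovariate(1.0)

n, reps = 50, 3000
z = [(statistics.fmean(draw() for _ in range(n)) - 1) / (1 / math.sqrt(n))
     for _ in range(reps)]

# If X-bar ~ N(mu, sigma^2 / n), about 95% of standardized means land in [-1.96, 1.96]
share = sum(abs(v) <= 1.96 for v in z) / reps
print(round(share, 2))  # close to 0.95 despite the skewed population
```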
Law of Large Numbers
As sample size grows, the sample mean converges to the true population mean. Guarantees that larger samples produce more accurate estimates. Different from CLT: LLN is about convergence of the value; CLT is about the shape of the distribution.
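Convergence of the value (as opposed to the shape) can be seen with a die-rolling sketch (the die is an arbitrary choice):

```python
import random

random.seed(0)

def sample_mean(n):
    """Mean of n fair-die rolls; the true population mean is 3.5."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

# The sample mean drifts toward 3.5 as n grows; the exact values depend on
# the seed, so only the convergence pattern matters here.
for n in (10, 1_000, 100_000):
    print(n, sample_mean(n))
```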
Standard Deviation
Measures the spread of a variable X in its original units. Formula: σ = √Var(X). Describes how dispersed a distribution is around its mean.
Standard Error
Measures the spread of an estimator across repeated samples. Formula: SE(X̄) = σ/√n. As sample size increases, SE decreases. Used to build confidence intervals and t-statistics.
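The shrinking-with-n behavior can be checked by simulation (a sketch; the N(0,1) population, sample size, and repetition count are arbitrary choices):

```python
import math
import random
import statistics

random.seed(42)

def se_of_mean(sigma, n):
    """SE(X-bar) = sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# Draw many samples of size n; the SD of their means should sit near sigma/sqrt(n)
n, reps = 25, 2000
means = [statistics.fmean(random.gauss(0, 1) for _ in range(n)) for _ in range(reps)]
print(se_of_mean(1, n))                   # 0.2, the theoretical standard error
print(round(statistics.stdev(means), 2))  # simulated spread of the estimator
```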
Selection Bias
The difference in baseline (untreated) outcomes between treatment and control groups. Formula: Selection Bias = E[Y(0)|X=1] − E[Y(0)|X=0]. Arises when treatment is not randomly assigned. Means the observed difference in means is NOT equal to the ATT.
Omitted Variable Bias
Bias that occurs when a relevant variable X₂ is omitted from a regression of Y on X₁. Two conditions are required: (1) X₂ is correlated with the included X₁, AND (2) X₂ affects Y. Formula: β̂₁ = β₁ + β₂·Cov(X₁,X₂)/Var(X₁). Direction of bias = sign(β₂) × sign(Cov(X₁,X₂)).
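The bias formula can be verified numerically (a minimal sketch; for clarity the omitted variable is made identical to X₁, so Cov(X₁,X₂)/Var(X₁) = 1):

```python
def slope(xs, ys):
    """Simple-regression OLS slope: Cov(X, Y) / Var(X)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    var = sum((x - x_bar) ** 2 for x in xs) / n
    return cov / var

# True model: Y = 2*X1 + 3*X2, with the omitted X2 equal to X1
x1 = [1, 2, 3, 4, 5]
x2 = list(x1)
y = [2 * a + 3 * b for a, b in zip(x1, x2)]

# The short regression of Y on X1 alone picks up beta1 + beta2 * 1 = 5, not 2
print(slope(x1, y))  # 5.0
```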
P-value
The probability of observing a test statistic as extreme as (or more extreme than) the one computed, assuming the null hypothesis H₀ is true. p < 0.05 → reject H₀ at the 5% level; p < 0.01 → reject at the 1% level. Does NOT measure the probability that H₀ is true.
Z-score
Standardizes any normal variable to the standard normal N(0,1). Formula: Z = (X − μ)/σ. Measures how many standard deviations an observation is from the mean. |Z| > 1.96 → in the extreme 5% of the distribution.
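In code the standardization is one line (illustrative values only):

```python
def z_score(x, mu, sigma):
    """Z = (X - mu) / sigma: distance from the mean in standard-deviation units."""
    return (x - mu) / sigma

z = z_score(130, 100, 15)
print(z, abs(z) > 1.96)  # 2.0 True: in the extreme 5% of the distribution
```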
T-stat
Measures how many standard errors a coefficient is from its value under the null hypothesis. Formula: t = (β̂ − β_H₀)/SE(β̂), which reduces to t = β̂/SE(β̂) when testing H₀: β = 0. |t| > 1.96 → significant at the 5% level; |t| > 2.58 → significant at the 1% level. Rule of thumb: if |β̂/SE| ≥ 2, the coefficient is statistically significant.
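The computation is mechanical (a sketch with illustrative numbers):

```python
def t_stat(beta_hat, se, beta_null=0.0):
    """t = (beta_hat - beta_null) / SE(beta_hat)."""
    return (beta_hat - beta_null) / se

t = t_stat(1.5, 0.5)
print(t, abs(t) > 1.96, abs(t) > 2.58)  # 3.0 True True: significant at 5% and 1%
```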
Conditional Expectation
The expected value of Y given a specific value of X. Written as E[Y | X = x]. OLS estimates this as a linear function. The key OLS assumption is E[ε | X] = 0 (zero conditional mean). If violated, estimates are biased.
Functional Form
The mathematical specification of how X relates to Y. Level-Level: 1 unit ↑ in X → β change in Y. Log-Level: 1 unit ↑ in X → β×100% change in Y. Level-Log: 1% ↑ in X → β/100 unit change in Y. Log-Log: 1% ↑ in X → β% change in Y (elasticity).
Residual (ûᵢ)
The difference between the actual value of Y and the value predicted by the regression. Formula: ûᵢ = Yᵢ − Ŷᵢ. OLS minimizes the sum of squared residuals. By construction, Σûᵢ = 0 when the regression includes an intercept. The residual is the estimated counterpart of the unobservable true error term.
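That the residuals sum to zero (with an intercept) can be checked directly (a sketch with made-up data):

```python
def ols(xs, ys):
    """Simple OLS with intercept: beta1 = Cov(X,Y)/Var(X), beta0 = y_bar - beta1*x_bar."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    beta1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

xs, ys = [1, 2, 3, 4], [2, 3, 5, 4]
b0, b1 = ols(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(abs(sum(residuals)) < 1e-9)  # True: residuals sum to zero by construction
```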
Intercept
The expected value of Y when all independent variables equal zero. Formula: β₀ = E[Y | X = 0]. With a dummy variable, it represents the expected Y for the baseline group. Interpretation is not always meaningful depending on context.
Reverse Causality
Occurs when the true direction of causality is opposite to what's assumed — Y actually causes X rather than X causing Y. Violates E[ε|X] = 0, creating endogeneity and biasing OLS estimates. Example: Does police presence reduce crime, or does crime attract more police?
Multicollinearity
When two or more independent variables are highly correlated. OLS can still be computed (unless perfect) but standard errors are inflated, making it harder to achieve statistical significance. Does NOT cause bias — only reduces efficiency. Perfect multicollinearity makes OLS impossible to compute.
Heteroskedasticity
When the variance of the error term is not constant across values of X: Var(εᵢ|Xᵢ) = σᵢ² (varies). OLS coefficient estimates remain unbiased but standard errors are wrong, invalidating t-stats and hypothesis tests. Fix: use heteroskedasticity-robust standard errors.
Homoskedasticity
When the variance of the error term is constant across all values of X: Var(εᵢ|Xᵢ) = σ² (constant). One of the classical OLS assumptions. When satisfied, OLS is BLUE. In practice, always use robust standard errors to guard against violations.
Hypothesis Testing
A procedure to test claims about population parameters using sample data. Null hypothesis H₀ (e.g., β=0) vs. alternative H₁ (e.g., β≠0). We either "reject H₀" or "fail to reject H₀" — never "accept H₀." Rejecting H₀ at 5% means the result would occur less than 5% of the time if H₀ were true.
Normal/Gaussian Curve
The most important distribution in econometrics. Bell-shaped, symmetric around the mean, fully determined by μ and σ. Empirical rule: 68% of data within 1 SD, 95% within 2 SDs, 99.7% within 3 SDs.
Slope Coefficient
The expected change in Y associated with a one-unit increase in X. Formula: β̂₁ = Cov(X,Y)/Var(X). This algebraic identity always holds regardless of bias. With controls, interpretation becomes "holding other variables constant." Causal interpretation requires OLS assumptions to hold.
Randomization
Random assignment of units to treatment and control, making groups comparable on all characteristics. Eliminates selection bias because treatment is independent of potential outcomes: Xᵢ ⊥ (Yᵢ(1), Yᵢ(0)). Allows the simple difference in means to estimate the ATE. Gold standard for causal inference.
ATT (Average Treatment Effect on the Treated)
The expected causal effect of treatment, averaged only over those who were actually treated. Formula: ATT = E[Yᵢ(1) − Yᵢ(0) | Xᵢ = 1]. Observed difference in means = ATT + Selection Bias. ATT = ATE only under random assignment.
ATE (Average Treatment Effect)
The expected causal effect of treatment averaged over the entire population, treated and untreated alike. Formula: ATE = E[Yᵢ(1) − Yᵢ(0)]. Not directly observable because we never see both potential outcomes for the same person. Equals the ATT when treatment is randomly assigned.
R²
The fraction of variation in Y explained by the regression. Formula: R² = 1 − SSR/TSS. R² = 0.75 means 75% of the variation in Y is explained by the model. Never decreases when a regressor is added, even an irrelevant one. Does not prove causality or indicate that the model is correctly specified.
Adjusted R²
A modified R² that penalizes adding irrelevant variables. Formula: R̄² = 1 − [(n−1)/(n−k−1)]·SSR/TSS. Always ≤ R². Can be negative. Unlike R², can decrease when a new variable adds little explanatory power. Use this to compare models with different numbers of regressors.
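The penalty is easy to compute by hand (a sketch using the equivalent form R̄² = 1 − (1 − R²)(n−1)/(n−k−1); the numbers are illustrative):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Same fit (R^2 = 0.75), more regressors -> bigger penalty
print(round(adjusted_r2(0.75, 30, 5), 4))   # 0.6979
print(round(adjusted_r2(0.75, 30, 15), 4))  # 0.4821
```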
Mis-specification Bias
A type of OVB where the wrong functional form is used (e.g., fitting a line when the true relationship is quadratic). Since X and X² are always correlated, omitting X² biases the coefficient on X. Formula: β̂₁ = β₁ + β₂·Cov(X₁,X₁²)/Var(X₁).
Measurement Error Bias
Bias from measuring an independent variable with error. With classical measurement error, X_observed = X_true + e, the result is attenuation bias — the coefficient is biased toward zero. If β₁ > 0, the estimate will be smaller than the true value. Non-classical measurement error can bias in any direction.
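Attenuation shows up in a small simulation (a sketch; the variances of X and of the measurement error are both set to 1, so the attenuation factor is 1/2):

```python
import random

random.seed(7)

def slope(xs, ys):
    """Simple-regression OLS slope: Cov(X, Y) / Var(X)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
            / sum((x - x_bar) ** 2 for x in xs))

true_beta = 2.0
x_true = [random.gauss(0, 1) for _ in range(20_000)]
y = [true_beta * x for x in x_true]               # noiseless outcome
x_obs = [x + random.gauss(0, 1) for x in x_true]  # classical error, variance 1

# plim beta_hat = beta * Var(X) / (Var(X) + Var(e)) = 2 * 1/2 = 1
print(round(slope(x_true, y), 2))  # 2.0: no measurement error, no bias
print(round(slope(x_obs, y), 2))   # about 1: biased toward zero
```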
OLS Assumptions
Conditions for OLS to be the Best Linear Unbiased Estimator (BLUE): (1) Linearity in parameters, (2) Independence of errors across observations, (3) Homoskedasticity (constant error variance), (4) No perfect multicollinearity, (5) No endogeneity: E[ε|X] = 0. Violating (5) biases the estimates; violating (2) or (3) leaves them unbiased but makes standard inference invalid or inefficient.
Confidence Interval
A range constructed from sample data that quantifies uncertainty about a parameter. 95% CI: β̂ ± 1.96·SE(β̂), often approximated as ±2·SE. Correct interpretation: if we repeated sampling many times, 95% of the constructed intervals would contain the true parameter. The true parameter is fixed; the interval is random.
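Building a 95% interval is mechanical (a sketch; 1.96 is the normal critical value behind the ±2·SE rule of thumb, and the numbers are illustrative):

```python
def ci_95(beta_hat, se):
    """95% CI: beta_hat +/- 1.96 * SE(beta_hat)."""
    return (beta_hat - 1.96 * se, beta_hat + 1.96 * se)

lo, hi = ci_95(0.5, 0.1)
print(round(lo, 3), round(hi, 3))  # 0.304 0.696
# A different sample would give a different interval: the interval is random,
# the true parameter is fixed.
```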