Covariance
Measures how two variables change together. Formula: Cov(X,Y) = Σ(xᵢ−x̄)(yᵢ−ȳ)/n. Positive → both move in same direction. Negative → they move opposite. Zero → no linear relationship.
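The formula above can be sketched in plain Python (a minimal illustration; the function name is my own):

```python
def covariance(xs, ys):
    """Population covariance: Cov(X, Y) = sum((x - x_bar) * (y - y_bar)) / n."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n

print(covariance([1, 2, 3, 4], [2, 4, 6, 8]))  # 2.5: both series move together
print(covariance([1, 2, 3], [3, 2, 1]) < 0)    # True: they move in opposite directions
```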
Correlation
Standardized covariance bounded between [−1, 1]. Formula: ρ = Cov(X,Y)/(σ_X · σ_Y). ρ=1 is perfect positive, ρ=−1 is perfect negative. Measures strength and direction of a linear relationship. DOES NOT imply causation.
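The standardization can be checked directly (a sketch; the data are made up):

```python
import math

def correlation(xs, ys):
    """rho = Cov(X, Y) / (sigma_X * sigma_Y); always in [-1, 1]."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

print(correlation([1, 2, 3], [2, 4, 6]))  # ~1.0: perfect positive linear relationship
print(correlation([1, 2, 3], [6, 4, 2]))  # ~-1.0: perfect negative
```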
Variance
A measure of the spread of data points around the mean. Formula: σ² = Σ[xᵢ − E(X)]² · P(xᵢ). Always ≥ 0. Squaring prevents positive/negative deviations from canceling and penalizes larger deviations more heavily.
Central Limit Theorem
As sample size grows, the sampling distribution of the sample mean approaches a normal distribution regardless of the population's shape. Formula: X̄ ~ N(μ, σ²/n) as n → ∞. Justifies using normal-based inference in econometrics.
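A quick simulation illustrates the theorem (a sketch using only the standard library; the skewed Exponential(1) population and the sample size are arbitrary choices):

```python
import math
import random
import statistics

random.seed(3)

# Skewed population: Exponential(1), with mean = 1 and sd = 1
def draw():
    return random.expovariate(1.0)

n, reps = 50, 3000
z = [(statistics.fmean(draw() for _ in range(n)) - 1) / (1 / math.sqrt(n))
     for _ in range(reps)]

# If X-bar ~ N(mu, sigma^2 / n), about 95% of standardized means land in [-1.96, 1.96]
share = sum(abs(v) <= 1.96 for v in z) / reps
print(round(share, 2))  # close to 0.95 despite the skewed population
```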
Law of Large Numbers
As sample size grows, the sample mean converges to the true population mean. Guarantees that larger samples produce more accurate estimates. Different from CLT: LLN is about convergence of the value; CLT is about the shape of the distribution.
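Convergence of the value (as opposed to the shape) can be seen with a die-rolling sketch (the die is an arbitrary choice):

```python
import random

random.seed(0)

def sample_mean(n):
    """Mean of n fair-die rolls; the true population mean is 3.5."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

# The sample mean drifts toward 3.5 as n grows; the exact values depend on
# the seed, so only the convergence pattern matters here.
for n in (10, 1_000, 100_000):
    print(n, sample_mean(n))
```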
Standard Deviation
Measures the spread of a variable X in its original units. Formula: σ = √Var(X). Describes how dispersed a distribution is around its mean.
Standard Error
Measures the spread of an estimator across repeated samples. Formula: SE(X̄) = σ/√n. As sample size increases, SE decreases. Used to build confidence intervals and t-statistics.
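The shrinking-with-n behavior can be checked by simulation (a sketch; the N(0,1) population, sample size, and repetition count are arbitrary choices):

```python
import math
import random
import statistics

random.seed(42)

def se_of_mean(sigma, n):
    """SE(X-bar) = sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# Draw many samples of size n; the SD of their means should sit near sigma/sqrt(n)
n, reps = 25, 2000
means = [statistics.fmean(random.gauss(0, 1) for _ in range(n)) for _ in range(reps)]
print(se_of_mean(1, n))                   # 0.2, the theoretical standard error
print(round(statistics.stdev(means), 2))  # simulated spread of the estimator
```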
Selection Bias
The difference in baseline (untreated) outcomes between treatment and control groups. Formula: Selection Bias = E[Y(0)|X=1] − E[Y(0)|X=0]. Arises when treatment is not randomly assigned. Means the observed difference in means is NOT equal to the ATT.
Omitted Variable Bias
Bias that occurs when a relevant variable X₂ is omitted from a regression of Y on X₁. Two conditions are required: (1) X₂ is correlated with the included X₁, AND (2) X₂ affects Y. Formula: β̂₁ = β₁ + β₂·Cov(X₁,X₂)/Var(X₁). Direction of bias = sign(β₂) × sign(Cov(X₁,X₂)).
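The bias formula can be verified numerically (a minimal sketch; for clarity the omitted variable is made identical to X₁, so Cov(X₁,X₂)/Var(X₁) = 1):

```python
def slope(xs, ys):
    """Simple-regression OLS slope: Cov(X, Y) / Var(X)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    var = sum((x - x_bar) ** 2 for x in xs) / n
    return cov / var

# True model: Y = 2*X1 + 3*X2, with the omitted X2 equal to X1
x1 = [1, 2, 3, 4, 5]
x2 = list(x1)
y = [2 * a + 3 * b for a, b in zip(x1, x2)]

# The short regression of Y on X1 alone picks up beta1 + beta2 * 1 = 5, not 2
print(slope(x1, y))  # 5.0
```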
P-value
The probability of observing a test statistic as extreme as (or more extreme than) the one computed, assuming the null hypothesis H₀ is true. p < 0.05 → reject H₀ at the 5% level; p < 0.01 → reject at the 1% level. Does NOT measure the probability that H₀ is true.
Z-score
Standardizes any normal variable to the standard normal N(0,1). Formula: Z = (X − μ)/σ. Measures how many standard deviations an observation is from the mean. |Z| > 1.96 → in the extreme 5% of the distribution.
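In code the standardization is one line (illustrative values only):

```python
def z_score(x, mu, sigma):
    """Z = (X - mu) / sigma: distance from the mean in standard-deviation units."""
    return (x - mu) / sigma

z = z_score(130, 100, 15)
print(z, abs(z) > 1.96)  # 2.0 True: in the extreme 5% of the distribution
```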
T-stat
Measures how many standard errors a coefficient is from its value under the null hypothesis. Formula: t = (β̂ − β_H₀)/SE(β̂), which reduces to t = β̂/SE(β̂) when testing H₀: β = 0. |t| > 1.96 → significant at the 5% level; |t| > 2.58 → significant at the 1% level. Rule of thumb: if |β̂/SE| ≥ 2, the coefficient is statistically significant.
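The computation is mechanical (a sketch with illustrative numbers):

```python
def t_stat(beta_hat, se, beta_null=0.0):
    """t = (beta_hat - beta_null) / SE(beta_hat)."""
    return (beta_hat - beta_null) / se

t = t_stat(1.5, 0.5)
print(t, abs(t) > 1.96, abs(t) > 2.58)  # 3.0 True True: significant at 5% and 1%
```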
Conditional Expectation
The expected value of Y given a specific value of X. Written as E[Y | X = x]. OLS estimates this as a linear function. The key OLS assumption is E[ε | X] = 0 (zero conditional mean). If violated, estimates are biased.
Functional Form
The mathematical specification of how X relates to Y. Level-Level: 1 unit ↑ in X → β change in Y. Log-Level: 1 unit ↑ in X → β×100% change in Y. Level-Log: 1% ↑ in X → β/100 unit change in Y. Log-Log: 1% ↑ in X → β% change in Y (elasticity).
Residual (ûᵢ)
The difference between the actual value of Y and the value predicted by the regression. Formula: ûᵢ = Yᵢ − Ŷᵢ. OLS minimizes the sum of squared residuals. By construction, Σûᵢ = 0 when the regression includes an intercept. The residual is the estimated counterpart of the unobservable true error term.
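That the residuals sum to zero (with an intercept) can be checked directly (a sketch with made-up data):

```python
def ols(xs, ys):
    """Simple OLS with intercept: beta1 = Cov(X,Y)/Var(X), beta0 = y_bar - beta1*x_bar."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    beta1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

xs, ys = [1, 2, 3, 4], [2, 3, 5, 4]
b0, b1 = ols(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(abs(sum(residuals)) < 1e-9)  # True: residuals sum to zero by construction
```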
Intercept
The expected value of Y when all independent variables equal zero. Formula: β₀ = E[Y | X = 0]. With a dummy variable, it represents the expected Y for the baseline group. Interpretation is not always meaningful depending on context.
Reverse Causality
Occurs when the true direction of causality is opposite to what's assumed — Y actually causes X rather than X causing Y. Violates E[ε|X] = 0, creating endogeneity and biasing OLS estimates. Example: Does police presence reduce crime, or does crime attract more police?
Multicollinearity
When two or more independent variables are highly correlated. OLS can still be computed (unless perfect) but standard errors are inflated, making it harder to achieve statistical significance. Does NOT cause bias — only reduces efficiency. Perfect multicollinearity makes OLS impossible to compute.
Heteroskedasticity
When the variance of the error term is not constant across values of X: Var(εᵢ|Xᵢ) = σᵢ² (varies). OLS coefficient estimates remain unbiased but standard errors are wrong, invalidating t-stats and hypothesis tests. Fix: use heteroskedasticity-robust standard errors.
Homoskedasticity
When the variance of the error term is constant across all values of X: Var(εᵢ|Xᵢ) = σ² (constant). One of the classical OLS assumptions. When satisfied, OLS is BLUE. In practice, always use robust standard errors to guard against violations.
Hypothesis Testing
A procedure to test claims about population parameters using sample data. Null hypothesis H₀ (e.g., β=0) vs. alternative H₁ (e.g., β≠0). We either "reject H₀" or "fail to reject H₀" — never "accept H₀." Rejecting H₀ at 5% means the result would occur less than 5% of the time if H₀ were true.
Normal/Gaussian Curve
The most important distribution in econometrics. Bell-shaped, symmetric around the mean, fully determined by μ and σ. Empirical rule: 68% of data within 1 SD, 95% within 2 SDs, 99.7% within 3 SDs.
Slope Coefficient
The expected change in Y associated with a one-unit increase in X. Formula: β̂₁ = Cov(X,Y)/Var(X). This algebraic identity always holds regardless of bias. With controls, interpretation becomes "holding other variables constant." Causal interpretation requires OLS assumptions to hold.
Randomization
Random assignment of units to treatment and control, making groups comparable on all characteristics. Eliminates selection bias because treatment is independent of potential outcomes: Xᵢ ⊥ (Yᵢ(1), Yᵢ(0)). Allows the simple difference in means to estimate the ATE. Gold standard for causal inference.
ATT (Average Treatment Effect on the Treated)
The expected causal effect of treatment, averaged only over those who were actually treated. Formula: ATT = E[Yᵢ(1) − Yᵢ(0) | Xᵢ = 1]. Observed difference in means = ATT + Selection Bias. ATT = ATE only under random assignment.
ATE (Average Treatment Effect)
The expected causal effect of treatment averaged over the entire population, treated and untreated alike. Formula: ATE = E[Yᵢ(1) − Yᵢ(0)]. Not directly observable because we never see both potential outcomes for the same person. Equals the ATT when treatment is randomly assigned.
R²
The fraction of variation in Y explained by the regression. Formula: R² = 1 − SSR/TSS. R² = 0.75 means 75% of the variation in Y is explained by the model. Never decreases when a regressor is added, even an irrelevant one. Does not prove causality or indicate that the model is correctly specified.
Adjusted R²
A modified R² that penalizes adding irrelevant variables. Formula: R̄² = 1 − [(n−1)/(n−k−1)]·SSR/TSS. Always ≤ R². Can be negative. Unlike R², can decrease when a new variable adds little explanatory power. Use this to compare models with different numbers of regressors.
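The penalty is easy to compute by hand (a sketch using the equivalent form R̄² = 1 − (1 − R²)(n−1)/(n−k−1); the numbers are illustrative):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Same fit (R^2 = 0.75), more regressors -> bigger penalty
print(round(adjusted_r2(0.75, 30, 5), 4))   # 0.6979
print(round(adjusted_r2(0.75, 30, 15), 4))  # 0.4821
```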
Mis-specification Bias
A type of OVB where the wrong functional form is used (e.g., fitting a line when the true relationship is quadratic). Since X and X² are always correlated, omitting X² biases the coefficient on X. Formula: β̂₁ = β₁ + β₂·Cov(X₁,X₁²)/Var(X₁).
Measurement Error Bias
Bias from measuring an independent variable with error. With classical measurement error, X_observed = X_true + e, the result is attenuation bias — the coefficient is biased toward zero. If β₁ > 0, the estimate will be smaller than the true value. Non-classical measurement error can bias in any direction.
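Attenuation shows up in a small simulation (a sketch; the variances of X and of the measurement error are both set to 1, so the attenuation factor is 1/2):

```python
import random

random.seed(7)

def slope(xs, ys):
    """Simple-regression OLS slope: Cov(X, Y) / Var(X)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
            / sum((x - x_bar) ** 2 for x in xs))

true_beta = 2.0
x_true = [random.gauss(0, 1) for _ in range(20_000)]
y = [true_beta * x for x in x_true]               # noiseless outcome
x_obs = [x + random.gauss(0, 1) for x in x_true]  # classical error, variance 1

# plim beta_hat = beta * Var(X) / (Var(X) + Var(e)) = 2 * 1/2 = 1
print(round(slope(x_true, y), 2))  # 2.0: no measurement error, no bias
print(round(slope(x_obs, y), 2))   # about 1: biased toward zero
```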
OLS Assumptions
Conditions for OLS to be the Best Linear Unbiased Estimator (BLUE): (1) Linearity in parameters, (2) Independence of errors across observations, (3) Homoskedasticity (constant error variance), (4) No perfect multicollinearity, (5) No endogeneity: E[ε|X] = 0. Violating (5) biases the estimates; violating (2) or (3) leaves them unbiased but makes standard inference invalid or inefficient.
Confidence Interval
A range constructed from sample data that quantifies uncertainty about a parameter. 95% CI: β̂ ± 1.96·SE(β̂), often approximated as ±2·SE. Correct interpretation: if we repeated sampling many times, 95% of the constructed intervals would contain the true parameter. The true parameter is fixed; the interval is random.
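Building a 95% interval is mechanical (a sketch; 1.96 is the normal critical value behind the ±2·SE rule of thumb, and the numbers are illustrative):

```python
def ci_95(beta_hat, se):
    """95% CI: beta_hat +/- 1.96 * SE(beta_hat)."""
    return (beta_hat - 1.96 * se, beta_hat + 1.96 * se)

lo, hi = ci_95(0.5, 0.1)
print(round(lo, 3), round(hi, 3))  # 0.304 0.696
# A different sample would give a different interval: the interval is random,
# the true parameter is fixed.
```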