Vocabulary flashcards for key terms and concepts in econometrics, designed to help students prep for exams.
Law of Iterated Expectations (LIE)
The expected value of X equals the expectation of the conditional expectation of X given Y: E[X] = E[E[X \mid Y]].
Law of Total Variance (LTV)
var(Y) = E[var(Y \mid X)] + var(E[Y \mid X]).
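A minimal simulation check of the decomposition, assuming a simple made-up data-generating process (Y = 2X + e with standard normal X and e, so E[Y \mid X] = 2X and var(Y \mid X) = 1):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = rng.standard_normal(n)
y = 2 * x + rng.standard_normal(n)

# Here var(Y) = E[var(Y|X)] + var(E[Y|X]) = 1 + var(2X) = 1 + 4 = 5.
print(np.var(y))             # left-hand side, ~5.0
print(1 + np.var(2 * x))     # right-hand side, ~5.0
```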
Linearity of Expectations (LOE)
E[aX + bY] = aE[X] + bE[Y] for any real constants a and b, whether or not X and Y are independent.
Variance of a Sum (VOS)
var(X + Y) = var(X) + var(Y) + 2Cov(X, Y); the covariance term drops out when X and Y are uncorrelated.
Jensen’s Inequality
E[g(X)] \geq g(E[X]) for a convex function g.
Chebyshev’s Inequality
For a random variable X with mean \mu and variance \sigma^2, P(|X - \mu| \geq k\sigma) \leq 1/k^2 for any k > 0.
Weak Law of Large Numbers (WLLN)
The sample mean converges in probability to the expected value as sample size increases: \bar{X}_n \xrightarrow{p} E[X].
Central Limit Theorem (CLT)
As sample size increases, the standardized sample mean converges in distribution to a normal distribution: \sqrt{n}(\bar{X}_n - \mu) \xrightarrow{d} N(0, \sigma^2).
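A small simulation sketch of the WLLN and CLT together, using an arbitrarily chosen skewed population (exponential with \mu = \sigma = 1) to emphasize that normality of the data is not required:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 1.0, 1.0, 500, 20_000

# reps independent samples of size n from a skewed population
means = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)

print(means.mean())                    # WLLN: close to mu = 1
z = np.sqrt(n) * (means - mu) / sigma  # CLT: approximately N(0, 1)
print(z.std())                         # close to 1
print((np.abs(z) < 1.96).mean())       # close to 0.95
```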
Slutsky’s Theorem
If X_n converges in distribution to X and A_n converges in probability to a constant a, then A_n X_n converges in distribution to aX and A_n + X_n converges in distribution to a + X.
Delta Method
An approximation technique for the asymptotic distribution of smooth functions of asymptotically normal estimators: if \sqrt{n}(X_n - \theta) \xrightarrow{d} N(0, \sigma^2) and g is differentiable at \theta, then \sqrt{n}(g(X_n) - g(\theta)) \xrightarrow{d} N(0, g'(\theta)^2 \sigma^2).
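A numeric illustration of the Delta Method with g(x) = e^x applied to a sample mean; the function, sample size, and seed are illustrative choices, not part of the original card:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n, reps = 0.5, 1.0, 1_000, 5_000

means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

# Delta method prediction: var(g(X_bar)) ~ g'(mu)^2 * sigma^2 / n, g'(x) = e^x
print(np.exp(means).var())              # simulated variance of g(X_bar)
print(np.exp(mu) ** 2 * sigma**2 / n)   # asymptotic approximation
```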
Frisch-Waugh-Lovell Theorem (FWL)
The coefficient on a predictor in a multiple regression equals the coefficient from a bivariate regression of the residualized outcome on the residualized predictor, where both are first residualized on the remaining regressors.
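A minimal numpy sketch of FWL on a made-up two-regressor model: the coefficient on x1 from the full regression equals the slope from regressing residualized y on residualized x1 (both residualized on a constant and x2):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000
x1 = rng.standard_normal(n)
x2 = 0.5 * x1 + rng.standard_normal(n)          # correlated regressors
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.standard_normal(n)

X = np.column_stack([np.ones(n), x1, x2])
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

Z = np.column_stack([np.ones(n), x2])           # the "other" regressors
resid = lambda v: v - Z @ np.linalg.lstsq(Z, v, rcond=None)[0]
beta_fwl = resid(x1) @ resid(y) / (resid(x1) @ resid(x1))

print(beta_full[1], beta_fwl)                   # equal up to rounding
```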
Positive Definite (PD) Matrix
An n × n symmetric matrix M is positive definite if x'Mx > 0 for all non-zero x.
Positive Semi-Definite (PSD) Matrix
An n × n symmetric matrix M is positive semi-definite if x'Mx \geq 0 for all x.
Big O Notation (Op)
X_n = O_p(a_n) means X_n/a_n is stochastically bounded: for every \epsilon > 0 there is a finite M such that P(|X_n/a_n| > M) < \epsilon for all sufficiently large n.
Little o Notation (op)
X_n = o_p(a_n) means X_n/a_n converges in probability to zero; in particular, o_p(1) denotes a term that vanishes in probability.
Expectation Notation
Used to denote the average value of a random variable, often written as E[X].
Asymptotic Equivalence Lemma
If X_n - Y_n \xrightarrow{p} 0 and Y_n \xrightarrow{d} Y, then X_n \xrightarrow{d} Y: two sequences of estimators whose difference vanishes in probability share the same limiting distribution.
Variance Definition
var(X) = E[X^2] - (E[X])^2. It measures the dispersion of a random variable around its mean, quantifying the degree of variation or spread in its values.
Covariance Definition
Cov(X, Y) = E[XY] - E[X]E[Y]. It measures the degree to which two random variables change together, indicating the direction of their relationship.
Regression Coefficient
A value representing the change in the dependent variable associated with a one-unit change in the independent variable, holding the other regressors fixed.
Ordinary Least Squares (OLS)
A method for estimating the parameters of a linear regression model by minimizing the sum of squared residuals: \min_{\beta} \sum_{i=1}^n (y_i - x_i'\beta)^2.
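A bare-bones sketch of the OLS minimizer via the normal equations X'X\beta = X'y; np.linalg.solve is used instead of an explicit inverse for numerical stability:

```python
import numpy as np

def ols(X, y):
    """OLS coefficients minimizing the sum of squared residuals."""
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(4)
n = 200
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = X @ np.array([1.0, 3.0]) + rng.standard_normal(n)
print(ols(X, y))   # roughly [1.0, 3.0]
```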
Linear Projection
The best linear predictor of a random variable Y given regressors X, with coefficient \beta = E[XX']^{-1}E[XY]; geometrically, the orthogonal projection of Y onto the space spanned by X.
Residuals
The difference between the observed values and the predicted values from a regression model.
Idempotent Matrix
A matrix P such that P^2 = P.
Algebra of Limits
Rules for combining limits: the limit of a sum, product, or quotient equals the sum, product, or quotient of the limits (provided the limits exist and denominators are non-zero); the same algebra carries over to probability limits.
Independence of Random Variables
Two random variables X and Y are independent if their joint distribution factors into the product of the marginals: P(X \le x, Y \le y) = P(X \le x)P(Y \le y) for all x and y.
Symmetric Matrix
A matrix that is equal to its transpose: A = A'.
Matrix Multiplication
For conformable matrices, the product is defined by (AB)_{ij} = \sum_k A_{ik}B_{kj}: each entry is the inner product of a row of A with a column of B.
Eigenvalues
Scalars \lambda satisfying Mv = \lambda v for some non-zero vector v; they describe the factor by which the eigenvector is stretched or compressed under the linear transformation.
Eigenvectors
Non-zero vectors v satisfying Mv = \lambda v, i.e., vectors that change only by a scalar factor under the linear transformation.
Statistical Significance
Indicates that an observed effect is unlikely to have arisen by chance under the null hypothesis.
Homoscedasticity
Assumption that the variance of the errors is constant across observations: var(e_i \mid x_i) = \sigma^2 for all i.
Multicollinearity
A situation in which two or more independent variables in a regression model are highly correlated.
Outlier
An observation that lies an abnormal distance from other values in a dataset.
Statistical Power
The probability of rejecting the null hypothesis when it is false, equal to 1 - \beta, where \beta is the Type II error rate.
Confidence Interval
A range of values that is likely to contain the population parameter with a specified level of confidence.
Hypothesis Testing
A method for testing a claim or hypothesis about a parameter in a population.
P-value
The probability of observing the test statistic or something more extreme under the null hypothesis.
Null Hypothesis
The hypothesis that there is no significant difference or effect.
Alternative Hypothesis
The hypothesis that there is a significant difference or effect.
Significance Level (α)
The probability of rejecting the null hypothesis when it is true, denoted as \alpha.
Type I Error
Rejecting the null hypothesis when it is actually true, with probability P(\text{Type I Error}) = \alpha.
Type II Error
Failing to reject the null hypothesis when it is actually false, with probability P(\text{Type II Error}) = \beta.
Statistical Model
A representation of a process that generates data; often includes assumptions about the data.
Sample Space
The set of all possible outcomes of a random variable.
Random Variable
A variable that can take on different values, each with a certain probability.
Discrete Random Variable
A random variable that can take on a countable number of values.
Continuous Random Variable
A random variable that can take any value in a continuum within a given range, described by a probability density function.
Probability Distribution
A function that describes the likelihood of obtaining the possible values that a random variable can take.
Bayes’ Theorem
A mathematical formula for determining conditional probabilities: P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B)}.
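A worked instance of the formula with made-up numbers, the classic diagnostic-test example (1% base rate, 99% sensitivity, 95% specificity):

```python
# P(disease | positive test) via Bayes' theorem
p_d = 0.01                 # prior: base rate of the disease
p_pos_d = 0.99             # sensitivity: P(positive | disease)
p_pos_h = 0.05             # false-positive rate: P(positive | healthy)

p_pos = p_pos_d * p_d + p_pos_h * (1 - p_d)   # total probability of a positive
print(p_pos_d * p_d / p_pos)                  # ~0.167: most positives are false
```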
Central Tendency
A statistical measure that identifies a single value as representative of an entire distribution.
Dispersion
The measure of how far a set of numbers is spread out from their average value.
Quantile
A cut point that divides a distribution into intervals of equal probability (e.g., quartiles divide it into four parts, percentiles into one hundred).
Sample Mean
The average of a subset of the population: \bar{X} = \frac{1}{n}\sum_{i=1}^n X_i.
Population Mean
The average of all possible observations from the entire population: \mu = E[X].
Standard Deviation
A measure of the amount of variation or dispersion of a set of values, calculated as the square root of the variance: \sigma = \sqrt{var(X)}.
Variance
The square of the standard deviation.
Confidence Level
The percentage of times that a confidence interval would include the population parameter if a study were to be repeated.
Bootstrap Method
A resampling method used to estimate the distribution of a statistic.
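A minimal nonparametric bootstrap sketch, here estimating the standard error of a sample median; the data-generating process and replication count are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(5)
data = rng.exponential(scale=2.0, size=300)

B = 5_000
idx = rng.integers(0, data.size, size=(B, data.size))  # resample with replacement
boot_medians = np.median(data[idx], axis=1)
print(boot_medians.std(ddof=1))    # bootstrap standard error of the median
```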
Maximum Likelihood Estimation (MLE)
A method of estimating parameters of a statistical model by maximizing the likelihood function, L(\theta \mid x) = P(x \mid \theta).
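A small sketch of MLE for an exponential rate parameter over a grid, where the numeric maximizer can be checked against the closed-form answer 1/\bar{x}:

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.exponential(scale=2.0, size=1_000)      # true rate lambda = 0.5

lam = np.linspace(0.01, 2.0, 10_000)
# exponential log-likelihood: n*log(lambda) - lambda * sum(x)
loglik = x.size * np.log(lam) - lam * x.sum()
print(lam[np.argmax(loglik)])   # numeric MLE
print(1 / x.mean())             # closed-form MLE; the two should agree
```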
Bayesian Statistics
A statistical paradigm that uses Bayes' theorem to update the probability of a hypothesis as more evidence becomes available.
Normal Distribution
A symmetric probability distribution characterized by its bell-shaped curve, with probability density function: f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}.
Logistic Regression
A statistical model used to predict binary outcomes by modeling P(Y = 1 \mid x) = \frac{1}{1 + e^{-x'\beta}}.
Multivariate Regression
A regression analysis involving two or more independent variables (also called multiple regression; strictly, "multivariate" refers to models with multiple dependent variables).
Panel Data
Data that involves observations over multiple time periods for the same subjects.
Cross-Sectional Data
Data collected at a single point in time across multiple subjects.
Time Series Data
Data collected sequentially over time on the same subject.
Causal Inference
The process of drawing a conclusion about a causal connection based on the conditions of the occurrence of an effect.
Treatment Effect
The impact of a treatment on an outcome in a statistical study.
Confounding Variable
A variable that influences both the independent variable and the dependent variable.
Instrumental Variable
A variable used in regression analysis that is correlated with the endogenous explanatory variable but uncorrelated with the error term.
Endogeneity
A situation in which an explanatory variable is correlated with the error term.
Exogeneity
A situation in which an explanatory variable is uncorrelated with the error term.
Statistical Inference
The process of using data analysis to deduce properties of an underlying probability distribution.
F-Test
A statistical test used to determine if there are significant differences between the means of two or more groups, often using the F-statistic: F = \frac{\text{variance between groups}}{\text{variance within groups}}.
T-Test
A statistical test used to compare the means of two groups, often using the t-statistic: t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}.
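A hand-rolled pooled two-sample t-statistic matching the formula above; in practice one would call a library routine such as scipy.stats.ttest_ind:

```python
import numpy as np

rng = np.random.default_rng(7)
a = rng.normal(0.0, 1.0, size=40)
b = rng.normal(0.5, 1.0, size=50)

n1, n2 = a.size, b.size
sp = np.sqrt(((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1))
             / (n1 + n2 - 2))                     # pooled standard deviation
t = (a.mean() - b.mean()) / (sp * np.sqrt(1 / n1 + 1 / n2))
print(t)   # refer to a t distribution with n1 + n2 - 2 degrees of freedom
```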
Chi-Squared Test
A statistical test used to determine if there is a significant association between two categorical variables: \chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}.
ANOVA (Analysis of Variance)
A statistical method used to compare the means of three or more groups.
Regression Discontinuity Design
A quasi-experimental design in which treatment is assigned based on whether a running variable crosses a cutoff; comparing units just above and below the cutoff emulates random assignment.
Difference in Differences
A statistical technique used to estimate causal relationships by comparing the differences in outcomes over time between a treatment group and a control group.
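A 2×2 difference-in-differences sketch on simulated group means, with a built-in treatment effect of 2.0 and a common pre-to-post trend of 1.0 shared by both groups:

```python
import numpy as np

rng = np.random.default_rng(8)
n, effect, trend = 5_000, 2.0, 1.0

treat_pre = rng.normal(3.0, 1.0, n)
treat_post = rng.normal(3.0 + trend + effect, 1.0, n)
ctrl_pre = rng.normal(5.0, 1.0, n)
ctrl_post = rng.normal(5.0 + trend, 1.0, n)

did = ((treat_post.mean() - treat_pre.mean())
       - (ctrl_post.mean() - ctrl_pre.mean()))
print(did)   # ~2.0: the common trend differences out
```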
Propensity Score Matching
A statistical technique used to control for confounding by matching treated and untreated subjects with similar characteristics.
Hazard Function
A function that describes the instantaneous risk of an event occurring at a given time, defined as h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}.
Survival Analysis
A branch of statistics for analyzing the expected duration until one or more events happen.
Risks and Uncertainty
Assessment of uncertainty that affects potential outcomes of a decision.
Random Sampling
A technique used to select individuals from a population in such a way that every individual has an equal chance of being selected.
Sampling Bias
A bias that occurs when the sample is not representative of the population from which it was drawn.
How does the Central Limit Theorem (CLT) relate to statistical inference?
The CLT provides the theoretical basis for constructing confidence intervals and performing hypothesis tests for population parameters, particularly means, as it ensures that the sample mean's distribution approximates a normal distribution for large sample sizes.
What is the relationship between Endogeneity and Instrumental Variables?
Instrumental Variables are a method specifically designed to address Endogeneity in regression models by using an instrument that is correlated with the endogenous explanatory variable but uncorrelated with the error term, allowing for consistent estimation of causal effects.
How does the Frisch-Waugh-Lovell (FWL) Theorem relate to Ordinary Least Squares (OLS) regression?
The FWL theorem shows that OLS coefficients in a multiple regression can be understood as the result of a two-step process: first, residualizing the dependent and independent variables with respect to other regressors, and then regressing these residuals. This highlights the partial effect of each predictor in OLS.
What is the foundational concept behind the Law of Iterated Expectations (LIE)?
The LIE is fundamentally based on conditional expectation, articulating that the expected value of a random variable can be obtained by taking the expectation of its conditional expectation given another random variable: E[X] = E[E[X \mid Y]].
How is Bayes' Theorem integral to Bayesian Statistics?
Bayes' Theorem serves as the core principle of Bayesian Statistics, providing the mathematical framework for updating the probability of a hypothesis (prior distribution) based on new observed data (likelihood) to obtain a revised probability (posterior distribution).
Projection Matrix (P)
A symmetric and idempotent matrix that transforms a vector into its orthogonal projection onto a subspace. For a matrix X with linearly independent columns, the projection matrix onto the column space of X is given by P = X(X'X)^{-1}X'.
Orthogonality Principle (in OLS)
In Ordinary Least Squares (OLS) regression, the residuals are orthogonal to the columns of the design matrix X of the predictors: X'e = 0 in the sample, and E[x_i e_i] = 0 in the population.
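A numpy sketch verifying this card and the projection-matrix card above on random data: P = X(X'X)^{-1}X' is symmetric and idempotent, and the residuals satisfy X'e = 0:

```python
import numpy as np

rng = np.random.default_rng(9)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
y = rng.standard_normal(n)

P = X @ np.linalg.solve(X.T @ X, X.T)   # projection (hat) matrix
print(np.allclose(P, P.T))              # symmetric
print(np.allclose(P @ P, P))            # idempotent: P^2 = P

e = y - P @ y                           # OLS residuals
print(np.allclose(X.T @ e, 0))          # orthogonality: X'e = 0
```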