discrete distribution
one of a countable number of events
continuous distribution
one of an uncountably infinite set
data-generating process
the stochastic mechanism that generates the observed sample
likelihood
L(θ|D) = f(D|θ)
f is a joint density for continuous data
f is a joint mass function for discrete data
DGP interpretation
parameters are fixed and data are random
likelihood analysis interpretation
observed data are fixed and we compare candidate parameter values
maximum likelihood estimator
choose parameters that maximize the plausibility of the observed sample under the assumed probability model
joint likelihood for continuous variables
L(θ|D) = ∏_{i=1}^{N} f(Di|θ)
joint likelihood for discrete variables
L(θ|D) = ∏_{i=1}^{N} P[Di = di|θ]
steps in maximum likelihood estimation
gather data (independent draws)
construct the likelihood using the appropriate formulation
maximize the likelihood (typically via log-likelihood)
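A minimal sketch of these three steps, assuming a normal model and simulated data; every name below (neg_log_likelihood, mu_hat, and so on) is illustrative rather than from the original source:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=500)  # step 1: independent draws

def neg_log_likelihood(params, y):
    mu, log_sigma = params              # log-parameterize sigma to keep it positive
    sigma = np.exp(log_sigma)
    # step 2: joint log-likelihood = sum of individual normal log densities
    ll = -0.5 * np.log(2 * np.pi) - np.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2
    return -ll.sum()                    # step 3: maximize ll by minimizing -ll

result = minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(data,))
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat)                # should be close to 2.0 and 1.5
```

Minimizing the negative log-likelihood is the standard numerical trick: optimizers minimize by convention, and the log turns the product of densities into a sum.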
monotonic (strictly increasing) transformations preserve
the location of extreme points (but not their values)
key properties of log-likelihood
products become sums
exponents become coefficients
for a local extreme value: df/dx = 0
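A short worked case, assuming i.i.d. Bernoulli(p) draws d1, …, dN with S = Σ di successes, that shows all three properties at once:

```latex
L(p \mid D) = \prod_{i=1}^{N} p^{d_i}(1-p)^{1-d_i}
\quad\Longrightarrow\quad
\ln L(p \mid D) = S \ln p + (N - S)\ln(1 - p)
```

The product became a sum and the exponents became coefficients; setting d(ln L)/dp = S/p − (N − S)/(1 − p) = 0 gives phat = S/N, the sample frequency.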
under correct specification and standard regularity conditions, many familiar OLS properties are
special cases of general MLE results
consistency
asymptotic normality
asymptotic efficiency within regular parametric models
properties of maximum likelihood estimation (MLE)
consistency
the MLE converges in probability to the true parameter value as the sample size grows
asymptotic efficiency
the MLE attains the Cramér–Rao lower bound asymptotically, the lowest variance achievable among unbiased estimators in large samples
pseudo-true parameter under misspecification
even under misspecification, the MLE converges to a well-defined pseudo-true parameter: the value that minimizes the Kullback–Leibler divergence between the assumed model and the true DGP
asymptotic normality
allows for hypothesis testing and confidence intervals using normal approximations
transformation invariance
if θhat is the MLE of θ, then g(θhat) is the MLE of g(θ): natural plug-in estimators for transformed parameters
functional invariance interpretation
once we estimate the primitive parameters, many economically interesting objects are just plug-in transformations of θhat
OLS applies directly only when
the model is linear in parameters and has an additive error term
MLE lets us estimate models that are
nonlinear in parameters, nonlinear in probabilities, or based on non-Gaussian outcomes
hypothesis testing with MLE (single parameters)
use the large-sample normal approximation for θhat
this justifies estimated standard errors, test statistics, and confidence intervals
results are asymptotic approximations, not exact finite-sample properties
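A minimal sketch of such a single-parameter test, with purely illustrative numbers standing in for a real fit:

```python
import numpy as np
from scipy.stats import norm

theta_hat, se_hat = 0.8, 0.2        # assumed MLE and estimated standard error
z = (theta_hat - 0.0) / se_hat      # test H0: theta = 0
p_value = 2 * norm.sf(abs(z))       # two-sided p-value from the normal approximation
ci = (theta_hat - 1.96 * se_hat, theta_hat + 1.96 * se_hat)  # asymptotic 95% CI
```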
functions of parameters
use plug-in estimates together with delta method or bootstrap
transformation invariance tells us the natural plug-in estimator
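A minimal delta-method sketch under the same caveat: theta_hat and its variance are assumed inputs, and the transformation g(θ) = exp(θ) is chosen only for illustration:

```python
import numpy as np

theta_hat, var_hat = 0.8, 0.04      # assumed MLE and its estimated variance
g_hat = np.exp(theta_hat)           # plug-in estimate, by transformation invariance
grad = np.exp(theta_hat)            # g'(theta) for g(theta) = exp(theta)
se_g = np.sqrt(grad**2 * var_hat)   # delta method: Var[g] ≈ g'(theta)^2 * Var[theta_hat]
ci = (g_hat - 1.96 * se_g, g_hat + 1.96 * se_g)
```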
multiple hypothesis tests
joint restrictions can be tested with the likelihood-ratio, Wald, or score (LM) statistics, all built from the same likelihood
pseudo R²
compares simple benchmark with richer model using likelihoods
formula: 1 − [ln(Lu)/ln(L0)], where Lu = maximized likelihood of the unrestricted (richer) model and L0 = maximized likelihood of the intercept-only model
if the true model is simple
L0 = Lu, so pseudo R² = 0
if the richer model fits better
Lu > L0, pseudo R² rises above 0
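A minimal computation, assuming the two maximized log-likelihoods are already in hand (the numbers are made up):

```python
ll_unrestricted = -420.5                     # ln(Lu), richer model
ll_null = -510.2                             # ln(L0), intercept-only model
pseudo_r2 = 1 - ll_unrestricted / ll_null    # McFadden's pseudo R²
print(pseudo_r2)                             # ≈ 0.18
```

Because log-likelihoods of discrete outcomes are negative, a better-fitting richer model makes ln(Lu)/ln(L0) smaller than 1, pushing the pseudo R² above 0.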
two types of binary outcomes
unconditional probability and conditional probability
linear probability model (LPM)
model: yi = xi’β + ui where yi ∈ {0,1}
fitted conditional mean: xi’βhat
core limitation: linearity can generate predicted values outside [0,1]
error structure problems in a LPM
errors cannot be normal: given xi, ui takes only the two values −xi’β and 1 − xi’β
the error distribution (and variance) depends on the predicted probability, so the errors are heteroskedastic
they cannot be treated as i.i.d. normally-distributed disturbances
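A minimal simulation, assuming a logistic true model, that exhibits the core limitation directly: OLS on a binary outcome produces fitted "probabilities" outside [0,1]:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1000)
p_true = 1 / (1 + np.exp(-2 * x))                  # true (logistic) probabilities
y = rng.binomial(1, p_true)                        # binary outcomes
X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS fit = LPM
fitted = X @ beta_hat
print(fitted.min(), fitted.max())                  # typically below 0 and above 1
```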
a generalized linear model connects a linear index to the conditional mean through
a link function
in the LPM, the link is
the identity: pi = xi’β
generally we pick a monotone function g such that g(pi) = xi’β
logit
g(p) = ln (p/(1-p))
probit
g(p) = Φ^{-1}(p), the inverse of the standard normal CDF
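A minimal sketch of both links and their inverses, assuming scipy is available; the inverse links are what map the linear index back into (0, 1):

```python
import numpy as np
from scipy.stats import norm

def logit_link(p):
    return np.log(p / (1 - p))   # g(p) = ln(p/(1-p))

def probit_link(p):
    return norm.ppf(p)           # g(p) = Phi^{-1}(p)

def inv_logit(z):
    return 1 / (1 + np.exp(-z))  # logistic CDF

def inv_probit(z):
    return norm.cdf(z)           # standard normal CDF
```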
latent index
define an unobserved latent variable: yi* = xi’β + ui
not directly observed: the latent index exists only in the model
interpretation: represents net utility, net benefit, or underlying propensity to choose
threshold rule: the binary outcome records whether this latent variable crosses a threshold
threshold mapping rule
yi = 1 if yi* ≥ 0, and yi = 0 otherwise
properties of the logit model
easy to compute numerically (the choice probabilities have a closed-form expression)
easy to extend beyond the binary case (e.g., to multinomial logit)
because only the ratio β/σu is identified in the latent-index model, normalizing the error scale is essential
logit model
assume the error ui follows a standard logistic distribution. the logistic CDF is:
F(z) = 1/(1 + exp(-z))
probit model
assume the error ui follows a standard normal distribution. the standard normal CDF is:
F(z) = ∫_{-∞}^{z} (1/√(2π)) exp(-x²/2) dx = Φ(z)
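A minimal end-to-end sketch for the logit case, assuming simulated data: the logistic CDF above supplies the choice probability, and the MLE maximizes the resulting log-likelihood:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(2000), rng.normal(size=2000)])
beta_true = np.array([-0.5, 1.2])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

def neg_ll(beta):
    z = X @ beta
    # sum of y*ln F(z) + (1-y)*ln(1-F(z)) simplifies to y*z - ln(1 + e^z)
    return -(y * z - np.log1p(np.exp(z))).sum()

beta_hat = minimize(neg_ll, x0=np.zeros(2)).x
print(beta_hat)   # should be close to (-0.5, 1.2)
```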
outside option
the alternative of purchasing nothing has normalized mean utility of 0
multiple inside goods
allows multiple products indexed by j in addition to the outside option
normal shocks
this gives multinomial probit, which is flexible but usually requires simulation
type 1 extreme value shocks
this gives a closed-form multinomial logit choice probability
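A minimal sketch of the closed-form probabilities, assuming three inside goods with illustrative mean utilities and the outside option normalized to 0:

```python
import numpy as np

v = np.array([0.4, -0.1, 1.0])   # mean utilities of inside goods j = 1, 2, 3
denom = 1 + np.exp(v).sum()      # the 1 is exp(0), the normalized outside option
p_inside = np.exp(v) / denom     # P[choose j] = exp(v_j) / (1 + sum_k exp(v_k))
p_outside = 1 / denom            # probabilities sum to one
```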
poisson regression
we usually specify the conditional mean as
ln(E[yi | xi, θ]) = ln(λi) = xi’θ
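A minimal sketch, assuming simulated data, of estimating θ by maximizing the Poisson log-likelihood implied by this conditional mean:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(1000), rng.normal(size=1000)])
theta_true = np.array([0.2, 0.5])
y = rng.poisson(np.exp(X @ theta_true))    # counts with lambda_i = exp(x_i' theta)

def neg_ll(theta):
    lam = np.exp(X @ theta)
    # Poisson log-likelihood: sum of y*ln(lambda) - lambda - ln(y!)
    return -(y * np.log(lam) - lam - gammaln(y + 1)).sum()

theta_hat = minimize(neg_ll, x0=np.zeros(2)).x   # close to (0.2, 0.5)
```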
in the linear fixed-effects model, we remove the individual effect by
demeaning or differencing
the within estimator is consistent for β under standard strict-exogeneity conditions
but the estimated λi values themselves are based on only T observations per person
with small T, those individual effects are noisy
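A minimal sketch of the demeaning route, assuming a balanced simulated panel stored as (N, T) arrays; names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 200, 5
alpha = rng.normal(size=(N, 1))              # individual effects
x = rng.normal(size=(N, T)) + alpha          # regressor correlated with the effect
y = alpha + 1.5 * x + rng.normal(size=(N, T))

x_dm = x - x.mean(axis=1, keepdims=True)     # within transformation removes alpha_i
y_dm = y - y.mean(axis=1, keepdims=True)
beta_within = (x_dm * y_dm).sum() / (x_dm ** 2).sum()
print(beta_within)                           # close to 1.5 despite the correlation
```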
for random effects we need
σ²v: the variance of the idiosyncratic individual-by-period shocks
σ²λ: the variance of the individual fixed effects in the population
T: number of time periods
for continuous random variables, a probability density function describes
how probability is distributed over the support
derivative of a CDF at a given point
for a discrete random variable, a probability mass function gives
the probability that a specific value is drawn
the support of a distribution
the range of possible values
censored distribution
the unit is observed, but the realized value is only known up to a threshold
truncated distribution
the unit is absent from the observed sample whenever its latent value falls outside the admissible range
tobit
censoring plus a continuous latent outcome
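A minimal simulation of the distinction, assuming a latent normal outcome and a threshold at zero:

```python
import numpy as np

rng = np.random.default_rng(5)
y_star = rng.normal(loc=0.5, size=10_000)     # latent continuous outcome

y_censored = np.maximum(y_star, 0)            # censored: every unit kept, values floored at 0
y_truncated = y_star[y_star > 0]              # truncated: below-threshold units vanish

print(len(y_censored), len(y_truncated))      # 10000 vs. roughly 6900
print(y_censored.mean(), y_truncated.mean())  # both exceed the latent mean of 0.5
```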
adverse selection
one side of the market has private information
Heckman two-step correction (heckit)
step 1: selection equation
di* = zi’λ + vi, with observed participation indicator di = 1[di* ≥ 0]
step 2: outcome equation
yi = xi’β + ui, estimated on the selected sample with the inverse Mills ratio from step 1 added as a correction regressor
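A minimal two-step sketch, assuming simulated data with correlated errors and using statsmodels; all variable names are illustrative:

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(6)
n = 5000
z = sm.add_constant(rng.normal(size=n))   # selection-equation regressors
x = sm.add_constant(rng.normal(size=n))   # outcome-equation regressors
v = rng.normal(size=n)
u = 0.7 * v + rng.normal(size=n)          # correlated errors create selection bias
d = (z @ np.array([0.3, 1.0]) + v >= 0)   # participation indicator
y = x @ np.array([1.0, 2.0]) + u          # outcome, observed only when d is True

# Step 1: probit for participation, then the inverse Mills ratio
probit = sm.Probit(d.astype(int), z).fit(disp=0)
index = z @ probit.params
mills = norm.pdf(index) / norm.cdf(index)

# Step 2: OLS on the selected sample, adding the Mills ratio as a control
ols = sm.OLS(y[d], np.column_stack([x[d], mills[d]])).fit()
print(ols.params)                         # first two entries close to (1.0, 2.0)
```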