Flashcards covering binary dependent variables, linear probability model limitations, probit and logit models, estimation and inference (including robust SEs), and the HMDA loan data example with interpretation of key variables such as pirat and black.
What does LPM stand for and what is the nature of the dependent variable Y in LPM?
LPM stands for Linear Probability Model. The dependent variable Y is binary, taking values 0 or 1.
In the Linear Probability Model, how is E[Y|X] defined and what does it represent?
Because Y is binary, E[Y|X] equals the probability Pr(Y=1|X), and the LPM models it as a linear function of the regressors: Pr(Y=1|X) = β0 + β1X1 + β2X2 + ….
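A quick numeric sketch of the LPM's main drawback (coefficients here are made up for illustration): because the fitted value is linear in X, it can fall outside [0, 1].

```python
# Hypothetical LPM coefficients, chosen only to illustrate the point:
# the linear fitted "probability" beta0 + beta1*X is unbounded in X.
def lpm_prob(x, beta0=-0.1, beta1=0.6):
    return beta0 + beta1 * x

print(round(lpm_prob(0.5), 2))  # 0.2  -> a valid probability
print(round(lpm_prob(2.0), 2))  # 1.1  -> exceeds 1, not a valid probability
```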
Why is Var(u|X) a concern in the LPM, and what is its typical form?
Var(u|X) = p(X)[1 − p(X)], which depends on X. The errors are therefore heteroskedastic by construction, so the usual homoskedasticity-only standard errors are invalid; heteroskedasticity-robust standard errors should be used.
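A small check of the variance formula: p(1 − p) changes with p, so it cannot be constant whenever p(X) varies across observations.

```python
# Bernoulli error variance in the LPM: Var(u|X) = p(X) * (1 - p(X)).
def lpm_error_var(p):
    return p * (1 - p)

for p in (0.1, 0.5, 0.9):
    print(p, lpm_error_var(p))
# The variance peaks at p = 0.5 (value 0.25) and shrinks toward 0 as
# p approaches 0 or 1 -- heteroskedasticity by construction.
```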
What are probit and logit models used for in binary outcome analysis?
They are non-linear models that estimate Pr(Y=1|X) via a function of a linear index: Φ(β0+β'X) for probit and F(β0+β'X) for logit.
What functions are Φ and F in probit and logit models?
Φ is the standard normal CDF (probit); F is the logistic CDF (logit); both map the latent index to a probability between 0 and 1.
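Both CDFs can be written with only the standard library (the notes use R; this Python sketch is just to make the two link functions concrete, with Φ expressed via `math.erf`):

```python
import math

def probit_cdf(z):
    # Standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit_cdf(z):
    # Logistic CDF: F(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + math.exp(-z))

# Both map any real-valued index into (0, 1) and equal 0.5 at z = 0.
print(probit_cdf(0.0), logit_cdf(0.0))   # 0.5 0.5
print(probit_cdf(-3.0), logit_cdf(-3.0)) # small but strictly positive
```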
How do you estimate probit/logit models in R, and how do you obtain robust standard errors?
Use glm() with family = binomial(link = "probit") or family = binomial(link = "logit"). For robust standard errors, use coeftest() (from the lmtest package) with a heteroskedasticity-robust covariance matrix, e.g. vcov. = vcovHC(model, type = "HC1") from the sandwich package.
Why are coefficients in probit/logit not directly interpreted as changes in probability, and what should you use instead?
Because the relationship is non-linear; coefficients reflect changes in the latent index, not direct probability changes. Use marginal effects or predicted probabilities to interpret probability changes.
How do you compute a predicted probability for a given X in a probit or logit model?
Use predict(model, newdata = data.frame(…), type = 'response') to obtain Pr(Y=1|X).
How do you compute the difference in predicted probabilities when a covariate changes from x1 to x2?
Compute p̂ at x1 and p̂ at x2 using predict with newdata, then take the difference p̂(x2) − p̂(x1).
In the probit manual example, with β0 = -2.19, β1 = 2.97 and X = 0.4, what is Pr(Y=1|X)?
Pr(Y=1|X) = Φ(-2.19 + 2.97·0.4) = Φ(-1.002) ≈ Φ(-1) ≈ 0.159.
Continuing the probit example, what is Pr(Y=1|X) for X = 0.5 and what is the difference when X goes from 0.4 to 0.5?
Pr(Y=1|X=0.5) = Φ(-0.705) ≈ 0.24; difference from X=0.4 is about 0.081 (8.1 percentage points).
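The worked probit example can be reproduced numerically. Using the coefficients from the notes (β0 = −2.19, β1 = 2.97), the exact indices are −1.002 and −0.705; evaluating Φ exactly gives about 0.158 and 0.240, which the cards round (via Φ(−1)) to 0.159 and a difference of roughly 0.081:

```python
import math

def probit_cdf(z):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

b0, b1 = -2.19, 2.97  # coefficients from the notes' worked example

p_04 = probit_cdf(b0 + b1 * 0.4)  # Phi(-1.002) ~ 0.158
p_05 = probit_cdf(b0 + b1 * 0.5)  # Phi(-0.705) ~ 0.240

# Change in predicted probability as X moves from 0.4 to 0.5:
# roughly 0.082, i.e. about 8 percentage points.
print(round(p_04, 3), round(p_05, 3), round(p_05 - p_04, 3))
```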
What is the purpose of adding the 'black' variable in the HMDA probit model, as shown in the notes?
To test whether race (black vs. non-black) affects loan denial probabilities, controlling for other covariates.
What is the practical takeaway regarding probit vs. logit curves in these notes?
Both produce S-shaped curves; they yield similar predicted probabilities; choice of link rarely changes conclusions, and results should be interpreted via predicted probabilities or marginal effects.
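One way to see why the two links rarely change conclusions: a logistic curve whose index is rescaled by roughly 1.7 (a standard rule of thumb, not from the notes) tracks the normal CDF closely, so fitted probit and logit probabilities tend to agree once the coefficients adjust in estimation.

```python
import math

def probit_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

# Largest pointwise gap between Phi(z) and a 1.7-rescaled logistic
# curve over the index range [-3, 3]:
max_gap = max(abs(probit_cdf(z) - logit_cdf(1.7 * z))
              for z in [i / 10 for i in range(-30, 31)])
print(max_gap)  # stays well under 0.02 across the whole range
```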
What estimation method is highlighted for probit/logit models besides OLS?
Maximum Likelihood Estimation (MLE). OLS applies to the LPM, but probit and logit are non-linear models and are estimated by maximizing the Bernoulli likelihood instead.
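A minimal sketch of what MLE maximizes, using made-up toy data and a single-coefficient probit without an intercept (real software maximizes the same Bernoulli log-likelihood numerically over all coefficients):

```python
import math

def probit_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Toy data, invented for illustration: low x tends to give y = 0,
# high x tends to give y = 1.
x = [0.2, 0.4, 0.6, 0.8]
y = [0, 0, 1, 1]

def log_lik(beta):
    # Bernoulli log-likelihood: sum of y*log(p) + (1-y)*log(1-p),
    # with p = Phi(beta * x) under the probit link.
    ll = 0.0
    for xi, yi in zip(x, y):
        p = probit_cdf(beta * xi)
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return ll

# Crude grid search for the maximizer (MLE software uses proper
# numerical optimization instead).
best = max((b / 10 for b in range(-50, 51)), key=log_lik)
print(best, log_lik(best))
```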
What is a key limitation of the LPM when heteroskedasticity is present?
Standard hypothesis tests are invalid because Var(u|X) is not constant; robust standard errors are needed.
How can probability outcomes be extended beyond binary in these models, as hinted in the notes?
Y can be extended to multiple categories (e.g., Y ∈ {0,1,2}) which would require multinomial or ordered probit/logit models.
What key issues threaten internal validity listed in the notes (Page 12)?
Omitted variables, misspecified functional form, measurement error, sample selection bias, and simultaneity.