In-Depth Notes on Logit Analyses and Logistic Regression
Logit Analyses Overview
- Logit analyses model the relationship between one or more independent variables and a dichotomous dependent variable.
- Logit studies are typically organized like linear regression studies: descriptive statistics come first, followed by the multivariate results.
Logistic Regression
- Binomial logistic regression is a model for dichotomous dependent variables, typically coded 0 and 1.
- Examples of dichotomous dependent variables in political science: whether a person voted, whether a respondent approves of the President, whether a country has an independent judiciary.
Dichotomous Variables
- Variables with only two possible outcomes (e.g., yes/no); also called binary variables.
- Examples:
  - Did you vote? (Yes=1, No=0)
  - Do you approve of the President? (Yes=1, No=0)
Dichotomous Variable Conventions
- The absence of the outcome or action is typically coded 0 (e.g., did not vote, does not have an independent judiciary).
- The presence of the outcome or action is coded 1 (e.g., voted, has an independent judiciary).
When to Use Logistic Regression
- Use logistic regression to estimate how independent variables affect the probability of an outcome (e.g., a decision or an action).
- Examples of outcomes include voting, participation in protests, and war occurrence.
Interpretation of Logistic Results
- Logistic regression coefficients cannot be read like linear regression coefficients: each one describes the change in the log-odds of the outcome for a one-unit change in an independent variable, not the change in the outcome itself.
- The sign of a coefficient shows whether an increase in the independent variable raises (positive) or lowers (negative) the odds of the event.
- Statistical significance can be evaluated using familiar measures (e.g., standard errors, z-statistics, p-values).
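As a rough illustration of reading sign, odds, and significance from logit output, the sketch below converts coefficients to odds ratios and computes a two-sided p-value from a z-statistic. The variable names, coefficients, and standard errors are hypothetical, invented only for this example.

```python
import math

# Hypothetical logit output: (coefficient on the log-odds scale, standard error).
# These numbers are made up for illustration, not taken from any study.
results = {
    "education_years": (0.25, 0.08),
    "distance_to_polls_km": (-0.10, 0.07),
}

def odds_ratio(beta):
    """Multiplicative change in the odds for a one-unit increase in the predictor."""
    return math.exp(beta)

def two_sided_p_value(beta, std_error):
    """p-value for H0: beta = 0, using the normal approximation for z = beta / se."""
    z = beta / std_error
    return math.erfc(abs(z) / math.sqrt(2.0))

for name, (beta, se) in results.items():
    direction = "raises" if beta > 0 else "lowers"
    print(f"{name}: {direction} the odds, "
          f"odds ratio={odds_ratio(beta):.3f}, p={two_sided_p_value(beta, se):.3f}")
```

An odds ratio above 1 corresponds to a positive coefficient and one below 1 to a negative coefficient; neither is a change in probability.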
Problems with Linear Regression for Dichotomous Variables
- Linear regression fits a straight line, so it can predict values below 0 or above 1 for a dichotomous outcome, ignoring the binary nature of the dependent variable.
- Read as probabilities, such predictions are nonsensical (e.g., a predicted probability of 1.2 or -0.1 that a person voted).
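To see how a straight-line fit to a 0/1 outcome can leave the 0 to 1 range, here is a minimal sketch that fits ordinary least squares by hand. The data (years of education and whether a person voted) are hypothetical and exist only for this example.

```python
# Hypothetical data: x = years of education, y = voted (1) or did not vote (0).
x = [8, 10, 12, 12, 14, 16, 16, 18, 20, 22]
y = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

def fit_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

intercept, slope = fit_ols(x, y)

def linear_prediction(value):
    """Predicted 'probability' from the straight line; not bounded to [0, 1]."""
    return intercept + slope * value

for value in (4, 14, 26):
    print(f"education={value:>2}  predicted={linear_prediction(value):.3f}")
```

With these made-up data, the prediction for 4 years of education is negative and the prediction for 26 years exceeds 1, while values in the middle of the range look like ordinary probabilities.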
Solution through S-shaped Curve
- Logistic regression fits an S-shaped (sigmoid) curve instead of a straight line, which typically fits dichotomous data better across the full range of the independent variables.
- The curve flattens as it approaches 0 and 1, so predicted probabilities always stay within this range.
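The S-shaped curve is the logistic function, which maps any value of the linear predictor z to a number between 0 and 1. A minimal sketch follows; the name `logistic` is a choice made for this example.

```python
import math

def logistic(z):
    """Map a linear predictor z to a probability; written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)

# The curve is steepest around z = 0 and flattens toward 0 and 1 at the extremes.
for z in (-6, -2, 0, 2, 6):
    print(f"z={z:+d}  probability={logistic(z):.4f}")
```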
Logit and Probit Models
- Both methods use an S-shaped curve to model dichotomous dependent variables: logit uses the logistic distribution function, probit the standard normal distribution function.
- Logit models may be preferred in political science due to their interpretability in terms of odds ratios.
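The two S-curves can be compared directly. The sketch below evaluates the logistic and standard normal distribution functions at the same values of the linear predictor z; the function names are chosen for this example. Because the two curves are on different scales, logit and probit coefficients from the same data are not directly comparable.

```python
import math

def logit_probability(z):
    """Logistic distribution function: the S-curve behind the logit model."""
    return 1.0 / (1.0 + math.exp(-z))

def probit_probability(z):
    """Standard normal distribution function: the S-curve behind the probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for z in (-3, -1, 0, 1, 3):
    print(f"z={z:+d}  logit={logit_probability(z):.3f}  probit={probit_probability(z):.3f}")
```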
Maximum Likelihood Estimation
- Both logit and probit are estimated by maximum likelihood: the procedure searches for the parameter values that make the observed data most likely under the model.
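To make the idea concrete, here is a minimal sketch of maximum likelihood for a logit model with one predictor, using Newton-Raphson updates written out by hand. The data are hypothetical, and statistical software uses more careful versions of this procedure; this is an illustration, not a replacement for a library routine.

```python
import math

# Hypothetical data: x = years of education, y = voted (1) or did not vote (0).
x = [8, 10, 12, 12, 14, 16, 16, 18, 20, 22]
y = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1):
    """Log of the probability of the observed 0/1 outcomes given the parameters."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = logistic(b0 + b1 * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total

def fit_logit(iterations=25):
    """Newton-Raphson: repeatedly move the parameters uphill on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = logistic(b0 + b1 * xi)
            w = p * (1 - p)
            g0 += yi - p            # gradient with respect to the intercept
            g1 += (yi - p) * xi     # gradient with respect to the slope
            h00 += w                # entries of the (negated) Hessian
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

b0, b1 = fit_logit()
print(f"intercept={b0:.3f}  slope={b1:.3f}  log-likelihood={log_likelihood(b0, b1):.3f}")
```

The fitted parameters give a higher log-likelihood than the starting values of zero, which is what "maximizing compatibility with the observed data" means in practice.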
Sample Size Considerations
- Maximum likelihood estimates can be unreliable in small samples.
- As a rule of thumb, aim for at least 100 observations.
Advanced Modeling Beyond Logit and Probit
- Multinomial logit/probit handles dependent variables with more than two unordered categories; ordered logit/probit extends the same approach to ordinal dependent variables.
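For multinomial logit, each category gets its own linear score and the scores are converted into probabilities that sum to 1. A minimal sketch, with hypothetical scores for three unordered vote choices and the first category treated as the baseline (score fixed at 0):

```python
import math

def multinomial_probabilities(scores):
    """Turn one linear score per category into probabilities that sum to 1."""
    highest = max(scores)  # subtracting the maximum keeps exp() from overflowing
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for one voter: baseline party, party B, party C.
scores = [0.0, 0.8, -0.4]
probs = multinomial_probabilities(scores)
print([round(p, 3) for p in probs])
```

With only two categories and the baseline score at 0, this reduces to the ordinary logistic curve used in binomial logit.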
Conducting Logistic Regression
- The dependent variable must be dichotomous, coded 0 and 1.
- If necessary, recode the values before the analysis and check the recoding with frequency tables.
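Recoding and checking can be sketched as follows. The raw survey responses and the `RECODE` mapping are hypothetical; anything that is not a clear yes or no is left as missing (`None`) rather than forced into 0 or 1.

```python
from collections import Counter

# Hypothetical raw answers to "Did you vote?"
raw_responses = ["Yes", "No", "yes", "No ", "Yes", "Refused", "no"]

RECODE = {"yes": 1, "no": 0}

def recode(value):
    """Return 1 for yes, 0 for no, and None for any other answer."""
    return RECODE.get(value.strip().lower())

coded = [recode(value) for value in raw_responses]

# Frequency tables before and after recoding, to confirm nothing was miscoded.
print("raw:  ", Counter(raw_responses))
print("coded:", Counter(coded))
```

Comparing the two tables shows whether every yes and no landed in the intended category and how many cases became missing.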
Understanding Marginal Effects
- Marginal effects show how the predicted probability of the outcome changes when one independent variable changes and the others are held constant.
- They are useful for turning a fitted model into substantive statements, e.g., how much an independent variable changes the probability of voting or of holding an attitude.
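A marginal effect can be computed from a fitted model in two common ways: as the slope of the predicted probability at a given point, or as the difference in predicted probability between two values of a variable. The sketch below shows both; the coefficients and variable names are hypothetical and are not estimates from any data.

```python
import math

# Hypothetical fitted logit model for voting; the numbers are made up for illustration.
INTERCEPT = -4.0
BETA_EDUCATION = 0.25   # per year of education
BETA_AGE = 0.02         # per year of age

def predicted_probability(education, age):
    z = INTERCEPT + BETA_EDUCATION * education + BETA_AGE * age
    return 1.0 / (1.0 + math.exp(-z))

def marginal_effect_education(education, age):
    """Slope of the predicted probability with respect to education, age held constant."""
    p = predicted_probability(education, age)
    return BETA_EDUCATION * p * (1.0 - p)

def discrete_change_education(start, end, age):
    """Change in predicted probability when education moves from start to end."""
    return predicted_probability(end, age) - predicted_probability(start, age)

print(f"slope at 12 years, age 50: {marginal_effect_education(12, 50):.4f}")
print(f"slope at 20 years, age 50: {marginal_effect_education(20, 50):.4f}")
print(f"12 -> 16 years, age 50:    {discrete_change_education(12, 16, 50):.4f}")
```

Because the curve is S-shaped, the same coefficient produces a different marginal effect depending on where the other variables are held, which is why marginal effects are reported at stated values or averaged over the sample.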
Takeaways
- Logistic regression plays the same role for dichotomous outcomes that linear regression plays for continuous ones, but its coefficients require a different interpretation.
- The direction (positive/negative) and statistical significance of a relationship can be read directly from the coefficients, which can then be converted into more interpretable marginal effects.