In Depth Notes on Logistic Regression Analysis
Introduction to Logistic Regression
- Logistic Regression is a supervised machine learning algorithm used for binary classification, determining the relationship between a binary dependent variable and independent variables.
- Goal: Identify a well-fitting model that describes the relationship between a binary dependent variable (Y) and a set of independent variables.
- Y = 1 (event/success) or Y = 0 (no event/failure)
Basic Concepts
- Dependent Variable: A binary outcome (e.g., success/failure).
- Independent Variables: Variables that explain the dependent variable.
- Example Scenario: Examine the relationship between device type and blood pressure measurements to classify whether a reading comes from a GP surgery or a home device.
Logistic Regression Basics
- Key Ideas:
- Probability: Instead of predicting 0 or 1, we predict the probability of an event happening (e.g., P(Y=1)).
- Odds: The ratio of the probability that the event occurs to the probability that it does not.
- Computation: Odds (O) = P/(1 - P)
- Log Odds (Logit): To map probability from the bounded interval (0, 1) onto the whole real line (-∞, +∞), we use log odds:
- Logit(P) = log(P/(1-P))
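The odds and logit transformations above can be sketched as small helper functions (a minimal illustration; the function names are ours, not from any particular library):

```python
import math

def odds(p):
    """Convert a probability p in (0, 1) to odds P / (1 - P)."""
    return p / (1 - p)

def logit(p):
    """Log odds: maps a probability in (0, 1) onto (-inf, +inf)."""
    return math.log(odds(p))

# A probability of 0.8 gives odds of 4 (4-to-1 in favour of the event),
# while p = 0.5 gives odds of 1 and a logit of 0.
```

Note that the logit is symmetric around p = 0.5: probabilities below 0.5 map to negative log odds, probabilities above 0.5 to positive ones.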
Modeling with Logistic Regression
- Logistic Regression Equation:
- logit(P(X)) = β0 + β1x1 + β2x2 + … + βpxp
- P(X) = e^(β0 + β1x1 + β2x2 + … + βpxp) / (1 + e^(β0 + β1x1 + β2x2 + … + βpxp))
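The second equation (the inverse logit, or sigmoid) turns the linear predictor back into a probability. A minimal sketch, with hypothetical coefficient and feature values:

```python
import math

def predict_prob(beta0, betas, xs):
    """P(Y=1 | x) from the logistic regression equation:
    the sigmoid of the linear predictor beta0 + sum(beta_i * x_i)."""
    z = beta0 + sum(b * x for b, x in zip(betas, xs))
    return math.exp(z) / (1 + math.exp(z))

# With all coefficients zero the model is indifferent: P(Y=1) = 0.5.
# A positive linear predictor pushes the probability above 0.5.
```

Because the sigmoid output always lies strictly between 0 and 1, the model yields valid probabilities for any values of the predictors.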
- Maximum Likelihood Estimation (MLE): Used to estimate the parameters. It finds the parameter values that maximize the likelihood of the observed data.
- Log-Likelihood Function: Logarithm of the likelihood function simplifies the estimation task, allowing optimization of parameters.
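The log-likelihood being maximized can be written out directly: for binary y, each observation contributes y·log(p) + (1−y)·log(1−p). A minimal sketch (an illustration of the objective, not a fitting routine):

```python
import math

def log_likelihood(beta0, betas, X, y):
    """Log-likelihood of binary outcomes y given candidate parameters.

    X is a list of feature vectors, y a list of 0/1 outcomes; each
    observation contributes y*log(p) + (1 - y)*log(1 - p), where p is
    the model's predicted probability for that observation."""
    ll = 0.0
    for xs, yi in zip(X, y):
        z = beta0 + sum(b * x for b, x in zip(betas, xs))
        p = 1 / (1 + math.exp(-z))
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return ll
```

MLE searches (typically via Newton-Raphson or gradient-based optimizers) for the parameters that make this quantity as large as possible; an intercept-only model with β0 = 0 assigns p = 0.5 to every observation, contributing log(0.5) each.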
Interpretation of Outcomes
- Coefficients (β):
- β0: Log odds of the outcome when all independent variables are zero (the baseline).
- βi: Change in the log odds for a one-unit increase in xi, holding the other variables fixed.
- e^(βi): Odds ratio indicating how changes in the independent variable influence the odds of the outcome.
- Example: For a one-week increase in gestational age, the odds of normal birth weight are multiplied by e^β, the odds ratio for that coefficient.
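The odds-ratio interpretation above can be made concrete with a hypothetical coefficient (the value 0.35 is illustrative, not taken from any fitted model):

```python
import math

# Hypothetical fitted coefficient for gestational age (per week)
beta_gest = 0.35

# The odds ratio: how the odds of the outcome multiply
# for each one-week increase in gestational age.
odds_ratio = math.exp(beta_gest)

# e^0.35 is roughly 1.42, i.e. each extra week multiplies the odds
# of normal birth weight by about 1.42 (a ~42% increase in the odds),
# holding the other predictors fixed.
```

Note the distinction: the coefficient acts additively on the log-odds scale, but multiplicatively on the odds scale.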
Goodness of Fit in Logistic Regression
- Goodness-of-Fit Tests: Assess how well models predict outcomes compared to observed data. Includes:
- Chi-Square Tests, Deviance, Hosmer-Lemeshow Tests.
- Pseudo R-Squared (e.g., McFadden's R²): Quantifies how much the fitted logistic model improves on the null (intercept-only) model, analogous to R² in linear regression.
- Classification Performance Metrics:
- Accuracy, Sensitivity, Specificity, Precision, F1 Score, and ROC Curve analysis for evaluating model performance.
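The classification metrics listed above all follow from the four confusion-matrix counts. A minimal sketch (function name and return format are our own choices):

```python
def classification_metrics(tp, fp, tn, fn):
    """Common classification metrics from confusion-matrix counts:
    true positives, false positives, true negatives, false negatives."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # recall / true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}
```

The ROC curve extends this picture by plotting sensitivity against 1 − specificity as the classification threshold on the predicted probability is varied.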
Conclusion and Further Reading
- Logistic regression is a powerful tool for predicting binary outcomes. It balances interpretability and predictive ability effectively.
- Recommended Reading: Washington et al. (2020). Statistical and Econometric Methods for Transportation Data Analysis.
- Final Note: Understanding logistic regression involves conceptualizing how probability transforms into odds and log odds, making it a robust method for classification tasks.