In Depth Notes on Logistic Regression Analysis

Introduction to Logistic Regression

  • Logistic Regression is a supervised machine learning algorithm used for binary classification, determining the relationship between a binary dependent variable and independent variables.
  • Goal: Identify a well-fitting model that describes the relationship between a binary dependent variable (Y) and a set of independent variables.
    • Y = 1 (event/success) or Y = 0 (no event/failure)

Basic Concepts

  • Dependent Variable: A binary outcome (e.g., success/failure).
  • Independent Variables: Variables that explain the dependent variable.
  • Example Scenario: Use blood pressure measurements to classify whether a reading comes from a GP device or a home device, i.e., model the relationship between the measurements and the device type.

Logistic Regression Basics

  • Key Ideas:
    • Probability: Instead of predicting 0 or 1, we predict the probability of an event happening (e.g., P(Y=1)).
    • Odds: The ratio of the probability that the event occurs to the probability that it does not.
    • Computation: Odds (O) = P/(1 - P)
    • Log Odds (Logit): To map probabilities from the bounded interval (0, 1) onto (-∞, +∞), we take the logarithm of the odds:
    • Logit(P) = log(P/(1 - P))
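These two transformations can be sketched in a few lines (a minimal illustration, not tied to any particular dataset):

```python
import math

def odds(p):
    """Odds O = P / (1 - P)."""
    return p / (1 - p)

def logit(p):
    """Log odds: maps P in (0, 1) onto (-inf, +inf)."""
    return math.log(odds(p))

print(odds(0.8))    # ≈ 4: an 80% chance corresponds to odds of 4 to 1
print(logit(0.5))   # 0.0: a 50% chance has log odds of exactly 0
```

Note how probabilities above 0.5 give positive log odds and probabilities below 0.5 give negative log odds, which is what makes the logit a natural scale for a linear model.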

Modeling with Logistic Regression

  • Logistic Regression Equation:
    • logit(P(X)) = β0 + β1x1 + β2x2 + … + βpxp
    • P(X) = e^(β0 + β1x1 + β2x2 + … + βpxp) / (1 + e^(β0 + β1x1 + β2x2 + … + βpxp))
  • Maximum Likelihood Estimation (MLE): Used to estimate the parameters. It finds the parameter values that maximize the likelihood of the observed data.
  • Log-Likelihood Function: Logarithm of the likelihood function simplifies the estimation task, allowing optimization of parameters.
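A minimal sketch of MLE for this model, using synthetic data with made-up coefficients (−1 and 2 are illustrative choices, not values from the notes). Since the MLE has no closed form, the log-likelihood is maximized by simple gradient ascent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data generated from known, illustrative coefficients
n = 200
x = rng.normal(size=(n, 1))
X = np.hstack([np.ones((n, 1)), x])            # column of 1s for the intercept β0
beta_true = np.array([-1.0, 2.0])
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ beta_true))))

def log_likelihood(beta):
    """Sum of y·log(P) + (1−y)·log(1−P), simplified to y·z − log(1+e^z)."""
    z = X @ beta
    return np.sum(y * z - np.log1p(np.exp(z)))

# Gradient ascent on the log-likelihood
beta = np.zeros(2)
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X @ beta)))          # P(X) from the logistic equation
    beta += 0.1 * X.T @ (y - p) / n            # gradient of the mean log-likelihood

print(beta)  # estimates should land near the true values (-1, 2)
```

In practice a library routine (e.g., iteratively reweighted least squares in standard statistics packages) does this optimization, but the objective being maximized is exactly this log-likelihood.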

Interpretation of Outcomes

  • Coefficients (β):
    • β0: Intercept; the log odds of the event when all independent variables equal zero.
    • βi: Change in the log odds of the event for a one-unit increase in xi, holding the other variables fixed.
    • e^(βi): Odds ratio; the multiplicative change in the odds of the outcome for a one-unit increase in xi.
  • Example: If β is the coefficient for gestational age, then e^β gives the factor by which the odds of normal birth weight are multiplied for each one-week increase in gestational age.
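The gestational-age example above can be made concrete; the coefficient value here is hypothetical, chosen only to show the arithmetic:

```python
import math

# Hypothetical fitted coefficient for gestational age in weeks (illustrative value)
beta_gest_age = 0.35

odds_ratio = math.exp(beta_gest_age)
print(round(odds_ratio, 3))  # ≈ 1.419: each additional week multiplies the odds
                             # of normal birth weight by about 1.42 (a 42% increase)
```

This is why fitted logistic models are usually reported as odds ratios rather than raw coefficients: "42% higher odds per week" is far easier to communicate than a log-odds increment of 0.35.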

Goodness of Fit in Logistic Regression

  • Goodness-of-Fit Tests: Assess how well models predict outcomes compared to observed data. Includes:
    • Chi-Square Tests, Deviance, Hosmer-Lemeshow Tests.
  • Pseudo R-Squared: Summarizes how much better the fitted model describes the data than an intercept-only model; McFadden's R² is a common choice.
  • Classification Performance Metrics:
    • Accuracy, Sensitivity, Specificity, Precision, F1 Score, and ROC Curve analysis for evaluating model performance.
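Most of these metrics can be computed directly from a confusion matrix. The labels and predictions below are made-up values for illustration:

```python
# Illustrative true labels and model predictions (1 = event, 0 = no event)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

# Confusion-matrix cells
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives

accuracy    = (tp + tn) / len(y_true)        # fraction of all predictions correct
sensitivity = tp / (tp + fn)                 # recall for the event class
specificity = tn / (tn + fp)                 # recall for the non-event class
precision   = tp / (tp + fp)                 # correctness of positive predictions
f1          = 2 * precision * sensitivity / (precision + sensitivity)

print(accuracy, sensitivity, specificity, precision, f1)
```

The ROC curve extends this idea by sweeping the classification threshold over the predicted probabilities and plotting sensitivity against 1 − specificity at each cutoff.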

Conclusion and Further Reading

  • Logistic regression is a powerful tool for predicting binary outcomes. It balances interpretability and predictive ability effectively.
  • Recommended Reading: Washington et al. (2020). Statistical and Econometric Methods for Transportation Data Analysis.
  • Final Note: Understanding logistic regression involves conceptualizing how probability transforms into odds and log odds, making it a robust method for classification tasks.