Logistic Regression is a supervised machine learning algorithm used for binary classification, determining the relationship between a binary dependent variable and independent variables.
Goal: Identify a well-fitting model that describes the relationship between a binary dependent variable (Y) and a set of independent variables.
Y = 1 (event/success) or Y = 0 (no event/failure)
Basic Concepts
Dependent Variable: A binary outcome (e.g., success/failure).
Independent Variables: Variables that explain the dependent variable.
Example Scenario: Examine the relationship between device type and blood pressure measures to classify whether a reading is from a GP or a home device.
Logistic Regression Basics
Key Ideas:
Probability: Instead of predicting 0 or 1, we predict the probability of an event happening (e.g., P(Y=1)).
Odds: Odds indicate the likelihood of an event occurring as a function of probability.
Computation: Odds (O) = P/(1 - P)
Log Odds (Logit): To remove boundaries from probability (0-1) to (-∞, +∞), we use log odds:
Pseudo R-Squared: Measures how much the logistic model explains variability in the data, such as McFadden's R².
Classification Performance Metrics:
Accuracy, Sensitivity, Specificity, Precision, F1 Score, and ROC Curve analysis for evaluating model performance.
Conclusion and Further Reading
Logistic regression is a powerful tool for predicting binary outcomes. It balances interpretability and predictive ability effectively.
Recommended Reading: Washington et al. (2020). Statistical and Econometric Methods for Transportation Data Analysis.
Final Note: Understanding logistic regression involves conceptualizing how probability transforms into odds and log odds, making it a robust method for classification tasks.