Lecture 11
Binary Logistic Regression
Predict a binary dependent variable based on one or multiple quantitative or categorical independent variables.
Dependent variable
A variable that is being tested and measured in an experiment.
Independent variable
A variable that is manipulated to observe its effect on the dependent variable.
Continuous variable
A variable that can take an infinite number of values within a given range.
Categorical variable
A variable that can take on one of a limited, fixed number of possible values.
Linear regression
A statistical method for modeling the relationship between a dependent variable and one or more independent variables using a linear equation.
Logistic regression
A statistical method for predicting the outcome of a binary dependent variable based on one or more independent variables.
ANCOVA
Analysis of Covariance, a blend of ANOVA and regression that evaluates whether population means of a dependent variable differ across levels of a categorical independent variable while statistically controlling for the effects of other continuous variables.
χ²-tests
Chi-squared tests, statistical tests used to determine if there is a significant association between categorical variables.
t-test
A statistical test used to compare the means of two groups.
ANOVA
Analysis of Variance, a statistical method used to compare the means of three or more samples.
Contingency tables
A type of table in a matrix format that displays the frequency distribution of the variables.
Dichotomous dependent variables
Dependent variables that have two possible outcomes, such as pass/fail or alive/dead.
Research question
A question that a research project sets out to answer.
Sample size
The number of subjects included in a study, in this case, 75 students.
Raw Data
The initial data collected that has not been processed or analyzed.
Prediction errors
The difference between the predicted values and the actual values.
Linear Regression Equation
𝑌′ = 𝑏0 + 𝑏1𝑋, where 𝑌′ is the predicted value, 𝑏0 is the intercept, and 𝑏1 is the slope.
Assumption of normality
The assumption that the errors in a regression analysis are normally distributed.
Trend analysis
The practice of collecting information and attempting to spot a pattern, or trend, in the data.
Prediction Errors
With a dichotomous dependent variable, the prediction errors e are not normally distributed, so this assumption of linear regression analysis is violated.
Probability Range
Probabilities lie between 0 and 1.
Positive Relationship
The more hours someone studies, the higher the probability that they will obtain the bachelor's degree.
Negative Relationship
The more hours someone studies, the lower the probability that they will obtain the bachelor's degree.
Logistic Function
A mathematical function used to describe the relationship between hours studied and the probability of obtaining a bachelor's degree.
Logistic Function Formula
P(Y = 1 | X) = e^(b0 + b1X) / (1 + e^(b0 + b1X))
Base of Natural Logarithm
𝑒 ≈ 2.718282.
S-shaped Function
A non-linear function that takes values between 0 and 1.
Logistic Coefficients
𝑏0 and 𝑏1 are the logistic (regression) coefficients.
Example Logistic Function
Example with b0 = −1 and b1 = 1.5 → P(Y = 1 | X) = e^(−1 + 1.5X) / (1 + e^(−1 + 1.5X)).
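A minimal Python sketch of this example function, using the coefficients b0 = −1 and b1 = 1.5 given above. The function name logistic_probability and the X values evaluated are illustrative choices, not part of the lecture.

```python
import math

def logistic_probability(x, b0=-1.0, b1=1.5):
    """Return P(Y = 1 | X = x) for the logistic function with coefficients b0 and b1."""
    z = b0 + b1 * x
    return math.exp(z) / (1 + math.exp(z))

# Illustrative X values (assumed, not from the lecture).
# Every result stays strictly between 0 and 1 and rises with X because b1 > 0.
for x in [-2, 0, 1, 2, 4]:
    print(x, round(logistic_probability(x), 3))
```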
Parameter b1
Determines the steepness of the function and whether the function is increasing (𝑏1 > 0) or decreasing (𝑏1 < 0).
Using Logistic Function
The logistic function is non-linear, so linear regression techniques cannot be applied to it directly.
Transforming Probabilities
Using some simple (mathematical) steps, we can turn a non-linear logistic regression function into a linear function.
Transforming to Odds
First, we transform probabilities to odds.
Proportion of Obtaining Bachelor's
For each value of the variable hours studied, we could calculate the proportion of students who obtained the bachelor's degree.
Logistic Function with b0 = 0
Logistic function: P(Y = 1 | X) = e^(b0 + b1X) / (1 + e^(b0 + b1X)) = e^(b1X) / (1 + e^(b1X)).
Logistic Function Examples
Examples of logistic functions with different 𝑏1 and 𝑏0 = 0.
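To make the role of b1 concrete when b0 = 0, the sketch below evaluates the function for a few slopes. The b1 values are hypothetical, picked only for illustration. With b0 = 0 the probability at X = 0 is 0.5 whatever the slope, and the sign of b1 decides whether the curve rises or falls from there.

```python
import math

def logistic_probability(x, b1, b0=0.0):
    """Return P(Y = 1 | X = x) for the logistic function with coefficients b0 and b1."""
    z = b0 + b1 * x
    return math.exp(z) / (1 + math.exp(z))

# Hypothetical slopes (assumed for illustration, not taken from the lecture)
for b1 in [0.5, 1.0, 2.0, -1.0]:
    at_zero = logistic_probability(0, b1)
    at_one = logistic_probability(1, b1)
    print(f"b1 = {b1:+.1f}: P(X=0) = {at_zero:.3f}, P(X=1) = {at_one:.3f}")
```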
Logit
logit[Y = 1 | X] = ln(odds[Y = 1 | X])
Natural logarithm
ln is the natural logarithm (logarithm with the base e)
Logistic model
logit[Y = 1 | X] = b0 + b1X
From probability to odds
odds = P / (1 - P)
From odds to probability
P = odds / (1 + odds)
From odds to logit
logit = ln(odds)
From probability to logit
logit = ln(P / (1 - P))
From logit to odds
odds = e^logit
From logit to probability
P = e^logit / (1 + e^logit)
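The six conversions above can be written as small Python helpers. This is a sketch: the function names and the starting probability of 0.8 are illustrative assumptions, and the helpers only work for 0 < P < 1 and odds > 0.

```python
import math

def prob_to_odds(p):
    return p / (1 - p)

def odds_to_prob(odds):
    return odds / (1 + odds)

def odds_to_logit(odds):
    return math.log(odds)  # math.log is the natural logarithm (base e)

def prob_to_logit(p):
    return math.log(p / (1 - p))

def logit_to_odds(logit):
    return math.exp(logit)

def logit_to_prob(logit):
    return math.exp(logit) / (1 + math.exp(logit))

p = 0.8  # illustrative probability, not from the lecture
odds = prob_to_odds(p)
logit = odds_to_logit(odds)
print(round(odds, 3), round(logit, 3))
# Converting back recovers the starting probability (up to rounding error)
print(round(odds_to_prob(odds), 3), round(logit_to_prob(logit), 3))
```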
Interpretation of Odds
How many times larger is the probability of Y=1 than of Y=0 given X?
Example of Odds Calculation
P(Y=1|X) = 0.10 → odds = 0.10/(1 − 0.10) = 0.10/0.90 = 0.111
Relationship between scales
If the probability increases, the odds and logit also increase, and vice versa.
Probability and Odds Table
Probability (P) | Odds | Logit
Probability 0.5
P = 0.5 → Odds = 1 → Logit = 0
Probability less than 0.5
0 < P < 0.5 → 0 < odds < 1 → Negative Logit
Probability greater than 0.5
0.5 < P < 1 → Odds > 1 → Positive Logit
Logit Interpretation
The logit is a linear function of X.
Graph of Odds
The corresponding graph for b0 = -1 and b1 = 1.5 shows Odds[Y=1|X].
Graph of Logit
The corresponding graph for b0 = -1 and b1 = 1.5 shows Logit[Y=1|X].
Exam Question Example
What are the odds of guessing the correct answer to an exam question with four answer categories?
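One way to work this out, assuming that exactly one of the four categories is correct and that the guess is uniformly random (both are assumptions, the question does not state them): the probability of a correct guess is then 1/4, and the odds follow from odds = P / (1 − P).

```python
from fractions import Fraction

n_categories = 4  # assumption: exactly one of the four categories is correct

p_correct = Fraction(1, n_categories)  # probability of a correct random guess
odds_correct = p_correct / (1 - p_correct)  # odds = P / (1 - P)

print(p_correct)     # 1/4
print(odds_correct)  # 1/3
```

Under these assumptions the odds are 1 to 3: a wrong guess is three times as likely as a correct one.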
Odds
odds = 𝑃/(1 − 𝑃)
Probability from Odds
𝑃 = odds/(1 + odds)
Logit from Odds
logit = ln(odds)
Logit from Probability
logit = ln(𝑃/(1 − 𝑃))
Odds from Logit
odds = 𝑒^logit
Probability from Logit
𝑃 = 𝑒^logit/(1 + 𝑒^logit)
Logistic Regression Analysis
A statistical method for predicting the outcome of a binary dependent variable based on one or more independent variables.
Independent Variable
Number of hours studied
Dependent Variable
Obtaining (𝑌= 1) or not obtaining (𝑌= 0) the Bachelor's degree in three years
Logit Equation
Logit[obtaining bachelor's degree] = −4.9 + 0.294 ∙ Hours
Effect of Hours on Logit
If the number of hours studied increases, the logit, and thus the probability, of obtaining the Bachelor's degree also increases.
Numerical Example for Logit
Logit = −4.9 + 0.294 ∙ Hours
Effect of Hours on Odds
odds = 𝑒^logit = 𝑒^(−4.9 + 0.294 ∙ Hours)
Odds Increase Factor
When the number of hours studied increases by one, the odds increase by a factor of 𝑒^0.294 ≈ 1.342.
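A short Python check of this factor, using the coefficients −4.9 and 0.294 from the logit equation. The helper name odds and the hour values compared are illustrative assumptions. The ratio of the odds for any two hour values that differ by one equals e^0.294, whatever the starting point.

```python
import math

B0, B1 = -4.9, 0.294  # coefficients from the logit equation

def odds(hours):
    """Odds of obtaining the degree: e raised to the logit."""
    return math.exp(B0 + B1 * hours)

factor = math.exp(B1)
print(round(factor, 3))  # 1.342

# The ratio is the same for every one-hour increase (illustrative hour values)
print(round(odds(11) / odds(10), 3))
print(round(odds(21) / odds(20), 3))
```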
Wald Test Hypotheses
H0: 𝐵 = 0 against H1: 𝐵 ≠ 0
Significant Effect of Hours
There is a significant effect of Hours on the probability of obtaining the Bachelor's degree.
Logit Equation for Blood Pressure
Logit(BloodPressure) = -4.2 + 0.07*Age
Probability Calculation
What is the probability that a 40-year-old person has high blood pressure?
Age Probability Threshold
At what age is the probability that someone has high blood pressure 0.5?
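Both blood pressure questions can be answered from the logit equation Logit = −4.2 + 0.07 ∙ Age. The sketch below is one way to do so; the function names are illustrative. A probability of 0.5 corresponds to a logit of 0, so the second question reduces to solving −4.2 + 0.07 ∙ Age = 0.

```python
import math

B0, B1 = -4.2, 0.07  # coefficients from the logit equation for blood pressure

def probability_high_bp(age):
    """Convert the logit for a given age into a probability."""
    logit = B0 + B1 * age
    return math.exp(logit) / (1 + math.exp(logit))

def age_at_probability(p):
    """Solve B0 + B1 * age = ln(p / (1 - p)) for age."""
    return (math.log(p / (1 - p)) - B0) / B1

print(round(probability_high_bp(40), 3))
print(round(age_at_probability(0.5), 1))
```

For a 40-year-old the logit is −1.4, which gives a probability of roughly 0.198. The probability reaches 0.5 at age 60.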
SPSS Output Example
B: .294, S.E.: .067, Wald: 18.994, df: 1, Sig.: .000, Exp(B): 1.342
Constant in SPSS Output
Constant: -4.900, S.E.: 1.157, Wald: 17.923, df: 1, Sig.: .000, Exp(B): .007
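The Wald and Exp(B) columns can be approximately reproduced from B and S.E. The sketch below assumes the usual definition Wald = (B / S.E.)², compared against a chi-square distribution with 1 degree of freedom. Because it starts from the rounded values printed above, the results come out close to, but not exactly equal to, the printed Wald values. The function name wald_test is illustrative.

```python
import math

def wald_test(b, se):
    """Return the Wald statistic (b / se)^2 and its p-value on 1 degree of freedom."""
    wald = (b / se) ** 2
    # For a chi-square variable with df = 1: P(X > w) = erfc(sqrt(w / 2))
    p_value = math.erfc(math.sqrt(wald / 2))
    return wald, p_value

# Rounded B and S.E. values as printed in the output
for name, b, se in [("Hours", 0.294, 0.067), ("Constant", -4.900, 1.157)]:
    wald, p = wald_test(b, se)
    print(f"{name}: Wald = {wald:.3f}, p = {p:.6f}, Exp(B) = {math.exp(b):.3f}")
```

Both p-values fall well below .001, which is consistent with the Sig. value of .000 in the output.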
Logit for 15 Hours
Logit = -0.490, Odds = .613, Probability = .380
Logit for 20 Hours
Logit = 0.980, Odds = 2.664, Probability = .727
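The values for 15 and 20 hours can be recomputed from the fitted equation Logit = −4.9 + 0.294 ∙ Hours by chaining the conversions logit → odds → probability. A minimal sketch follows; the function name predict is an illustrative choice.

```python
import math

B0, B1 = -4.9, 0.294  # coefficients from the fitted logit equation

def predict(hours):
    """Return (logit, odds, probability) of obtaining the degree for a number of hours."""
    logit = B0 + B1 * hours
    odds = math.exp(logit)
    probability = odds / (1 + odds)
    return logit, odds, probability

for hours in (15, 20):
    logit, odds, probability = predict(hours)
    print(f"Hours = {hours}: logit = {logit:.3f}, odds = {odds:.3f}, probability = {probability:.3f}")
```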