9/24: SOCI 252 - Linear Regression with Binary Outcomes

Review of Previous Lecture

Predicting Final Exam Grade (Non-Binary Outcome):
- Model used: predicted final exam grade $= -6.01 + 0.97 \times \text{midterm}$.
- If midterm grade is $80$: Predicted final exam grade is $0.97 \times 80 - 6.01 = 77.6 - 6.01 = 71.59$. Unit of measurement is points.
- If midterm grade is $90$: Predicted final exam grade is $0.97 \times 90 - 6.01 = 87.3 - 6.01 = 81.29$. Unit of measurement is points.
- Predicted Change: A $10$ point difference on the midterm ( $90 - 80$ ) leads to an expected difference of $9.7$ points on the final exam ( $81.29 - 71.59$ or $0.97 \times 10$ ).
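
The review arithmetic can be reproduced in R. This is a minimal sketch that hard-codes the coefficients stated above; the helper name `predict_final` is a hypothetical name chosen for illustration, not something from the lecture.

```r
# Coefficients taken from the fitted line stated in these notes.
intercept <- -6.01
slope <- 0.97

# Hypothetical helper: predicted final exam grade (in points) for a midterm score.
predict_final <- function(midterm) {
  intercept + slope * midterm
}

pred_80 <- predict_final(80)    # 71.59 points
pred_90 <- predict_final(90)    # 81.29 points

# The expected difference for a 10-point midterm gap equals slope * 10.
pred_diff <- pred_90 - pred_80  # 9.7 points
print(c(pred_80, pred_90, pred_diff))
```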

Predicting Binary Outcomes: Earning an 'A'

Dataset Overview:
- Same dataset as before, but the outcome variable changes.
- midterm and final grades are non-binary numeric.
- grade_a (whether a student got an 'A' or 'A-') is a binary variable ( $0$ for no 'A', $1$ for 'A').
- tail() function: Used to view the last six observations, similar to head() for the first six.
- Interpretation of the last observation involves describing the characteristics of that specific student across all variables.
Predictor and Outcome: We are using midterm as the predictor (x) and grade_a as the outcome (y).
Visualizing Binary Outcomes:
- A histogram for midterm grades shows the shape of the distribution (skewed or centered, with any outliers).
- A histogram for grade_a (binary) only shows two bars, representing the count of students who got an 'A' ( $1$ ) versus those who didn't ( $0$ ). It is less useful for showing variation but indicates which outcome is more common (e.g., not getting an 'A' is more common in this dataset).
Interpreting the Mean for Binary Outcomes:
- The mean of a binary variable (coded $0$ or $1$ ) represents the proportion of observations that have the characteristic represented by $1$ .
- For grade_a, the mean is $0.368$ (about 37%).
- Interpretation: Approximately 37% of students in this class ended up with an 'A' or 'A-'. The remaining 63% did not.
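
A short R sketch of these steps follows. The data frame is a made-up toy example (eight hypothetical students), not the class dataset, so its mean differs from the $0.368$ reported above; only the column names midterm, final, and grade_a come from the notes.

```r
# Hypothetical toy data standing in for the class dataset (values are made up).
data <- data.frame(
  midterm = c(45, 60, 70, 75, 80, 85, 90, 95),
  final   = c(40, 58, 65, 70, 79, 80, 88, 94),
  grade_a = c(0, 0, 0, 0, 1, 0, 1, 1)
)

# tail() shows the last six rows; head() shows the first six.
print(tail(data))

# Counts behind the two bars of a histogram for the binary outcome.
counts <- table(data$grade_a)
print(counts)

# The mean of a 0/1 variable is the proportion of 1s.
prop_a <- mean(data$grade_a)
print(prop_a)  # 3 of 8 students, so 0.375
```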

Units of Measurement for Binary Outcomes (Crucial Distinction)

When the outcome variable (y) is binary ( $0$ or $1$ ), the units of measurement change significantly compared to non-binary outcomes.
Unit for Average of y: Percentage (e.g., 37% of students got an 'A').
Unit for Intercept ($\boldsymbol{\hat{\alpha}}$): Percentage.
Unit for Predicted y ($\boldsymbol{\hat{y}}$): Percentage (representing probability).
Unit for Change in y ($\boldsymbol{\Delta y}$): Percentage points.
- This is distinct from percent because subtracting one percentage from another yields percentage points. For example, a change from 4% to 2% is a $2$ percentage point decrease, not a 2% decrease. A 2% decrease from 4% would remove only 2% of 4%, i.e., $0.08$ percentage points, leaving 3.92%. In relative terms, the drop from 4% to 2% is a 50% decrease.
Unit for Slope ($\boldsymbol{\hat{\beta}}$): Percentage points (per one-unit increase in x).
Unit for Change in $\boldsymbol{\hat{y}}$ ($\boldsymbol{\Delta \hat{y}}$): Percentage points.
- Recap: Anytime we discuss a change in percentage, it must be expressed in percentage points to avoid confusion with a percentage of a percentage.
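
The distinction can be checked numerically for the 4% to 2% example. This is an illustrative sketch; the variable names are hypothetical.

```r
# Two proportions, as in the 4% -> 2% example.
p_before <- 0.04
p_after  <- 0.02

# Difference of two percentages: measured in percentage points.
change_pp <- (p_after - p_before) * 100               # -2 percentage points

# Relative change: measured in percent of the starting value.
change_rel <- (p_after - p_before) / p_before * 100   # -50 percent

# A literal "2% decrease" from 4% would remove only 2% of 4%.
two_pct_of_start <- 0.02 * p_before * 100             # 0.08 percentage points
print(c(change_pp, change_rel, two_pct_of_start))
```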

Scatterplot for Binary Outcomes

Appearance: With a binary outcome ( $0$ or $1$ ) on the y-axis and a continuous predictor (midterm) on the x-axis, the scatterplot looks like two horizontal rows of dots at $y=0$ and $y=1$. It does not form a 'cloud' that a single straight line easily fits.
Interpretation of Dots: Each dot represents an individual student's midterm score and whether they got an 'A'.
Relationship (Positive/Negative):
- The relationship appears positive: dots at $y=1$ (got an 'A') are denser at higher midterm grades, while dots at $y=0$ (did not get an 'A') are denser at lower midterm grades.
- Students who did not get an 'A' ( $y=0$ ) show a much wider range of midterm scores, including many lower scores.
- Students who got an 'A' ( $y=1$ ) are typically concentrated in higher midterm score ranges, with fewer low scores.
Strength of Relationship: The relationship is stronger when the two groups ( $y=0$ and $y=1$ ) are more distinct in their x-axis ranges. If their ranges overlap significantly (e.g., looking like parallel lines with lots of overlap across x-values), the relationship is weak. In this case, it's moderately strong because the 'A' group is fairly distinct from the 'no A' group in terms of typical midterm scores.
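
A sketch of how such a plot might be drawn in R, using made-up values rather than the class dataset. The `tapply()` call is one possible way to compare the midterm ranges of the two groups; it is an illustration, not a step from the lecture.

```r
# Hypothetical toy data (made-up values, not the class dataset).
midterm <- c(45, 60, 70, 75, 80, 85, 90, 95)
grade_a <- c(0, 0, 0, 0, 1, 0, 1, 1)

# Scatterplot: the points fall on two horizontal rows at y = 0 and y = 1.
plot(midterm, grade_a,
     xlab = "Midterm grade (points)",
     ylab = "Got an A or A- (1 = yes, 0 = no)")

# Range of midterm scores within each outcome group. Less overlap
# between the two ranges suggests a stronger relationship.
ranges <- tapply(midterm, grade_a, range)
print(ranges)
```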

Correlation ( $r$ )

Calculation: The correlation between grade_a and midterm is $0.64$ .
Interpretation: A positive correlation of $0.64$ indicates a moderate-to-strong positive linear relationship. It does not mean that 64% of the variation is explained: the share of the variation in whether a student gets an 'A' or 'A-' that midterm grades explain is $r^2 = 0.64^2 \approx 0.41$, or about 41%. A correlation of $0.64$ is still fairly high, particularly given the large range of midterm scores for students not getting an 'A' and the smaller range for those who did.
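
In R, the correlation is computed with `cor()`. The sketch below uses made-up values, so the result differs from the $0.64$ found in the class dataset.

```r
# Hypothetical toy data (made-up values, not the class dataset).
midterm <- c(45, 60, 70, 75, 80, 85, 90, 95)
grade_a <- c(0, 0, 0, 0, 1, 0, 1, 1)

# Pearson correlation between the predictor and the binary outcome.
r <- cor(midterm, grade_a)
print(r)  # about 0.67 for these made-up values

# The order of the two arguments does not matter.
r_swapped <- cor(grade_a, midterm)
```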

Fitting the Linear Model

Method: Using the lm() function (lowercase; R is case-sensitive), just as with non-binary outcomes: lm(data$grade_a ~ data$midterm).
Results:
- Intercept ($\boldsymbol{\hat{\alpha}}$): $-1.34$
- Slope ($\boldsymbol{\hat{\beta}}$): $0.02$
Fitted Line Formula: estimated grade_a $= -1.34 + 0.02 \times \text{midterm}$
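
A minimal sketch of the fit in R, using made-up data, so the coefficients will not match the $-1.34$ and $0.02$ above. It uses the `data =` argument of `lm()`, a common alternative to writing `data$` on each variable; the names `alpha_hat` and `beta_hat` are hypothetical.

```r
# Hypothetical toy data (made-up values, not the class dataset).
data <- data.frame(
  midterm = c(45, 60, 70, 75, 80, 85, 90, 95),
  grade_a = c(0, 0, 0, 0, 1, 0, 1, 1)
)

# Fit a linear model: outcome on the left of ~, predictor on the right.
fit <- lm(grade_a ~ midterm, data = data)

# Extract the estimated intercept and slope by name.
alpha_hat <- coef(fit)[["(Intercept)"]]
beta_hat  <- coef(fit)[["midterm"]]
print(c(alpha_hat = alpha_hat, beta_hat = beta_hat))
```

As with the class data, the toy fit gives a negative intercept and a small positive slope.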

Interpreting Coefficients for Binary Outcomes

Intercept ($\boldsymbol{\hat{\alpha}} = -1.34$, i.e., -134%):
- Mathematical interpretation: If a student scored $0$ on the midterm, the predicted probability of them earning an 'A' is -134%.
- Sensibility: This is nonsensical because probabilities cannot be negative. This occurs because $x=0$ (a midterm score of zero) is outside the observed range of midterm grades in the dataset (the minimum observed was closer to $40$ ).
- Unit: Percentage.
- Despite being nonsensical in this context, the intercept is still mathematically necessary for the model and useful for calculations within the observed data range.
Slope ($\boldsymbol{\hat{\beta}} = 0.02$, i.e., $2$ percentage points):
- Interpretation: For every one-point increase in a student's midterm exam grade, the predicted probability of them earning an 'A' or 'A-' in the class increases by $2$ percentage points, on average.
- Unit: Percentage points.
- Practical example: A student who scores one point higher on the midterm has a predicted probability of getting an 'A' that is $2$ percentage points higher. A student who scores one point lower has a predicted probability that is $2$ percentage points lower.

Practical Predictions for Binary Outcomes

Predicting Probability of 'A' for a Midterm of $80$ :
- $\text{Predicted Grade A} = -1.34 + (0.02 \times 80) = -1.34 + 1.6 = 0.26$
- Interpretation: A student who scored $80$ on the midterm has a 26% probability of earning an 'A' or 'A-' in the class. (Roughly $1$ in $4$ chance).
Predicting Probability of 'A' for a Midterm of $90$ :
- $\text{Predicted Grade A} = -1.34 + (0.02 \times 90) = -1.34 + 1.8 = 0.46$
- Interpretation: A student who scored $90$ on the midterm has a 46% probability of earning an 'A' or 'A-' in the class. (Roughly $1$ in $2$ chance).
Predicting Change: An increase in midterm score of $10$ points ( $90 - 80$ ) is associated with an increase in the predicted probability of earning an 'A' or 'A-' by $20$ percentage points on average ( $0.02 \times 10 = 0.20$ ).
- This is a substantial increase, but it's important to note that the midterm is only one factor contributing to a final 'A' grade.
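
These predictions can be reproduced in R from the fitted coefficients stated in the notes. The helper name `predict_prob_a` is hypothetical. The last line evaluates the line at a midterm of $0$ to show that a straight line can return values outside the $0$ to $1$ range.

```r
# Coefficients taken from the fitted line in these notes.
alpha_hat <- -1.34
beta_hat  <- 0.02

# Hypothetical helper: predicted probability of an 'A' (as a proportion).
predict_prob_a <- function(midterm) {
  alpha_hat + beta_hat * midterm
}

p_80 <- predict_prob_a(80)        # 0.26, i.e., 26%
p_90 <- predict_prob_a(90)        # 0.46, i.e., 46%

# Difference between two predicted percentages, in percentage points.
change_pp <- (p_90 - p_80) * 100  # 20 percentage points

# Outside the observed midterm range the line gives an impossible value.
p_0 <- predict_prob_a(0)          # -1.34, i.e., -134%
print(c(p_80, p_90, change_pp, p_0))
```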

R-squared ($\boldsymbol{r^2}$) for Model Fit

Definition: R-squared is a measure of how well the regression model fits the observed data. It represents the proportion of the variance in the dependent variable (y) that is predictable from the independent variable (x).
Not related to R software: The name is similar but it's a statistical concept, not tied to the R programming language.
Formula: For simple linear regression, $r^2 = (\text{correlation})^2$.
Interpretation of Values:
- $r^2 = 1$: Indicates a perfect fit. The model explains 100% of the variation in y. There is no error between predicted y and actual y (e.g., when correlation = $1$ or $-1$ ).
- $r^2 = 0$: Indicates no fit. The model explains 0% of the variation in y; its predictions are no better than always guessing the mean of y (e.g., when correlation = $0$ ).
Calculating $r^2$ for the 'Grade A' Model:
- Correlation was $0.64$ .
- $r^2 = (0.64)^2 = 0.4096 \approx 0.41$
- Interpretation: The model using midterm grades explains about 41% of the variation in whether a student gets an 'A' or 'A-'. This means midterm grades are a meaningful predictor, but they do not explain the majority of the variation, implying other factors are also very important.
Comparing $r^2$ with Non-Binary Outcome Model:
- The previous model (midterm predicting final exam grade) had an $r^2 \approx 0.51$.
- Important Caveat: You should only compare R-squared values to other models using the same outcome variable. Different outcomes have different inherent levels of predictability, so comparing $r^2$ across different outcome types can be misleading.
Missing Data: When the data contain missing values, tell R to exclude them from the calculation; this does not permanently remove them from the dataset. For mean(), specify na.rm = TRUE. For cor(), the corresponding option is use = "complete.obs".
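
The sketch below checks, on made-up data with one missing midterm score, that squaring the correlation gives the same $r^2$ that `summary()` reports for the fitted model. The variable names are hypothetical, and the value differs from the $0.41$ found in the class dataset.

```r
# Hypothetical toy data with one missing midterm score (values are made up).
data <- data.frame(
  midterm = c(45, 60, 70, 75, 80, 85, 90, 95, NA),
  grade_a = c(0, 0, 0, 0, 1, 0, 1, 1, 1)
)

# mean() skips missing values when na.rm = TRUE.
mean_midterm <- mean(data$midterm, na.rm = TRUE)

# cor() has no na.rm argument; use = "complete.obs" drops incomplete rows.
r <- cor(data$midterm, data$grade_a, use = "complete.obs")
r2_from_cor <- r^2

# lm() drops rows with missing values under R's default na.action setting.
fit <- lm(grade_a ~ midterm, data = data)
r2_from_lm <- summary(fit)$r.squared

print(c(r2_from_cor = r2_from_cor, r2_from_lm = r2_from_lm))

# The dataset itself is unchanged: the row with NA is still there.
n_rows <- nrow(data)
```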

Conclusion

For predicting outcomes using linear models, aim for predictor (x) variables that are highly correlated with the outcome (y).
A correlation farther from zero gives a higher $r^2$ (since $r^2$ is the correlation squared), indicating a better model fit and more reliable predictions from a simple linear model.
Next lecture will focus on causal effects using observational studies.
