Categorical Predictors in Regression Analyses
Incorporating Categorical Variables
- Traditionally, regression analyses focus on continuous variables for both predictors and outcomes.
- Including categorical variables in regression is common, but it requires careful coding.
- The numbers assigned to categories carry no quantitative meaning: coding one group 1 and another 0 does not mean the first group has "more" of anything (1 > 0 is meaningless here). The numeric assignments are arbitrary labels.
- Proper coding allows comparisons among categories.
- Categorical variables can be entered as control variables (background factors) in a regression analysis.
Dummy Coding
- Categorical variables are coded using 0s and 1s (dummy coding).
- Typically, 1 indicates the presence of an attribute.
- For a two-category variable (e.g., gender: male/female), one category gets 1, the other gets 0.
- The choice of which category gets 1 or 0 is arbitrary, but it determines how the coefficient is interpreted: the category coded 0 becomes the reference.
Example: Aggression and Gender
- Revisit the aggression example from the hierarchical regression lecture, now adding gender.
- Prior research shows males tend to be more aggressive than females.
- Gender can be a background factor to control for in regression.
- Assess the effect of exposure to video game violence on aggression after controlling for gender.
- Coding: male = 1, female = 0.
- The zero category (female) serves as the referent.
- The goal is to evaluate how much more aggressive males are compared to females.
- Slope interpretation: the slope is the change in y (aggression) for a one-unit change in x. For a 0/1 variable, the only one-unit change is moving from female (0) to male (1).
- The models that include gender are statistically significant and account for more of the variability in aggressive behavior.
- The change in R^2 is statistically significant: background factors such as gender account for a meaningful share of the variability in aggression.
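The R^2 change test for an added block can be sketched as follows. This is an illustration with simulated data (the effect sizes 0.7 and 0.4 are arbitrary assumptions), using NumPy's least-squares solver and the standard F-change formula F = (change in R^2 / df1) / ((1 - R^2 of the fuller model) / df2), where df1 is the number of predictors added and df2 is the residual degrees of freedom of the fuller model. The function names `r_squared` and `r2_change_f` are hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an OLS fit of y on X (X must already contain an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

def r2_change_f(X_reduced, X_full, y):
    """Change in R^2 and its F statistic when a block of predictors is added."""
    n = len(y)
    r2_red = r_squared(X_reduced, y)
    r2_full = r_squared(X_full, y)
    df1 = X_full.shape[1] - X_reduced.shape[1]  # predictors added in the new block
    df2 = n - X_full.shape[1]                   # residual df (intercept counted in X)
    f = ((r2_full - r2_red) / df1) / ((1.0 - r2_full) / df2)
    return r2_full - r2_red, f, df1, df2

# Simulated, hypothetical data.
rng = np.random.default_rng(0)
n = 200
male = rng.integers(0, 2, size=n)          # 0/1 gender dummy
violence = rng.normal(size=n)              # video game violence exposure
aggression = 0.7 * male + 0.4 * violence + rng.normal(size=n)

ones = np.ones(n)
X_step1 = np.column_stack([ones, male])            # block 1: gender only
X_step2 = np.column_stack([ones, male, violence])  # block 2: add video game violence
delta_r2, f, df1, df2 = r2_change_f(X_step1, X_step2, aggression)
print(f"R^2 change = {delta_r2:.3f}, F({df1}, {df2}) = {f:.2f}")
```

If SciPy is available, `scipy.stats.f.sf(f, df1, df2)` gives the p-value for the change, which SPSS reports as "Sig. F Change".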
Interpreting Regression Weights
- With gender in the model, a one-unit increase in gender (female to male) is associated with a 0.738-point increase in aggressive behavior.
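- The standardized (beta) weight for gender is 0.258, which expresses the effect in standard-deviation units.
- Strictly, a beta weight is the change in y, in standard deviations, per one-standard-deviation change in x. For a 0/1 predictor, the male-female difference in standard deviations of aggression is the unstandardized weight divided by the standard deviation of aggression, so it equals 0.258 only if the gender variable's standard deviation is 1.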
- Video game violence remains a statistically meaningful predictor after controlling for gender.
Dummy Coding with Multiple Categories
- Dummy coding extends to variables with more than two categories.
- Need to choose a baseline (reference) category for comparison.
- k-1 dummy variables are needed, where k is the number of categories.
- Members of the reference category receive zeros on all dummy variables.
- For other categories, members get a one in their respective dummy variable, zero otherwise.
- The set of dummy variables is entered together in the same block of the regression analysis.
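One standard reason for using k-1 dummies, shown here as a hedged illustration with hypothetical group labels "A", "B", and "C": when the model has an intercept, a full set of k dummies always sums to the intercept column, so one column is redundant (perfect collinearity) and the coefficients cannot be uniquely estimated. Dropping the reference category's dummy restores full column rank. The helper `design` is a hypothetical name.

```python
import numpy as np

# Hypothetical group labels for six cases, two per category.
groups = ["A", "B", "C", "A", "B", "C"]

def design(values, dummy_for):
    """Intercept column plus one 0/1 dummy column per label in `dummy_for`."""
    columns = [np.ones(len(values))]
    for label in dummy_for:
        columns.append(np.array([1.0 if v == label else 0.0 for v in values]))
    return np.column_stack(columns)

X_k = design(groups, ["A", "B", "C"])  # intercept + k = 3 dummies
X_k1 = design(groups, ["B", "C"])      # intercept + k - 1 = 2 dummies ("A" is the reference)

print(X_k.shape[1], np.linalg.matrix_rank(X_k))    # 4 3 -> one column is redundant
print(X_k1.shape[1], np.linalg.matrix_rank(X_k1))  # 3 3 -> full column rank
```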
Example: Party Affiliation and Tax Fairness
- Data from Pew Research Center (March 2019 political survey).
- Categories: Republican, Democrat, Independent.
- Dependent variable: perceived fairness of the federal tax system (higher scores = greater fairness).
- Dummy variable creation in SPSS:
  - A single variable records each respondent's party affiliation.
  - The dummy variables are created from this variable.
  - With three categories, 3 - 1 = 2 dummy variables are needed.
  - The dummy variables are labeled Republican and Independent, with Democrat as the reference category:
    - Republican: 1 for Republicans, 0 for everyone else.
    - Independent: 1 for Independents, 0 for everyone else.
  - Democrats receive 0 on both dummies, so they are the implicit reference category.
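The same coding rule can be sketched in Python. The text labels below are an assumption for illustration; the survey file may store party affiliation as numeric codes instead, and the names `PARTIES` and `party_dummies` are hypothetical.

```python
# Hypothetical text labels; the real survey file may use numeric codes.
PARTIES = {"Democrat", "Republican", "Independent"}

def party_dummies(party):
    """Return (republican, independent) codes, with Democrat as the (0, 0) reference."""
    if party not in PARTIES:
        # Refuse to guess: an unrecognized or missing answer would otherwise
        # be silently coded (0, 0) and counted as a Democrat.
        raise ValueError(f"unexpected party label: {party!r}")
    republican = 1 if party == "Republican" else 0
    independent = 1 if party == "Independent" else 0
    return republican, independent

responses = ["Democrat", "Republican", "Independent", "Republican"]
print([party_dummies(p) for p in responses])  # [(0, 0), (1, 0), (0, 1), (1, 0)]
```

The explicit check matters because the reference category is defined by zeros on every dummy, so any response that falls through the rules would otherwise be absorbed into it.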
Regression Analysis with Dummy Variables
- Enter all dummy variables (Republican, Independent) into the regression in the same block.
- Each category is compared against Democrats.
- The Republican coefficient is estimated while controlling for the Independent dummy, and the Independent coefficient while controlling for the Republican dummy.
- Democrat becomes the reference category for both.
- The intercept represents the baseline (Democrats') fairness perception, and each dummy coefficient shows the shift from that baseline.
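A minimal sketch of this regression, using NumPy's least-squares solver on hypothetical fairness ratings (not the Pew data). With only the two dummies in the model, the fitted intercept reproduces the Democrat mean, and each dummy's weight is that party's mean minus the Democrat mean.

```python
import numpy as np

# Hypothetical fairness ratings, three respondents per party (not the Pew data).
party = np.array(["Democrat"] * 3 + ["Republican"] * 3 + ["Independent"] * 3)
fairness = np.array([1.0, 2.0, 3.0, 2.5, 3.0, 3.5, 2.0, 2.5, 3.0])

# Both dummies enter together; Democrat is the reference (0 on both).
republican = (party == "Republican").astype(float)
independent = (party == "Independent").astype(float)
X = np.column_stack([np.ones(len(party)), republican, independent])

(b0, b_rep, b_ind), *_ = np.linalg.lstsq(X, fairness, rcond=None)

# Group means, for comparison with the fitted coefficients.
for name in ["Democrat", "Republican", "Independent"]:
    print(name, fairness[party == name].mean())
print("intercept", round(b0, 3), "Republican", round(b_rep, 3), "Independent", round(b_ind, 3))
```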
Interpreting Coefficients
- Focus on regression coefficients to understand the effect of each category relative to the reference.
- Republicans perceive the tax system as fairer than Democrats do (0.675 points higher on the fairness scale).
- Independents also perceive the tax system as fairer than Democrats do (0.2 points higher on the fairness scale).
- Dummy coding is useful for including categorical variables in regression.
Dummy Variables and Intercept
- When only dummy variables are in the model, the intercept equals the mean of the reference group.
- The mean of each other category is the intercept plus that category's unstandardized regression weight.
- Reasoning: the intercept is the predicted value of y when every predictor equals zero. A respondent who scores zero on both the Republican and Independent dummies is a Democrat.
- Example: If the intercept is 2.010 (mean for Democrats) and the regression coefficient for Republicans is 0.675, then 2.010 + 0.675 = 2.685, which is the mean for Republicans.
- Each dummy coefficient is the predicted change in y when that dummy moves from zero to one; with only dummies in the model, this equals the difference between that category's mean and the reference mean.
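The same logic can be written out as a worked equation. Here R and I are shorthand, introduced for this derivation, for the Republican and Independent dummies, and the numbers are the intercept and Republican weight given above; no value is filled in for Independents beyond the general form.

```latex
\begin{aligned}
\hat{y} &= b_0 + b_R R + b_I I \\
\hat{y}_{\text{Democrat}} &= b_0 + b_R(0) + b_I(0) = b_0 = 2.010 \\
\hat{y}_{\text{Republican}} &= b_0 + b_R(1) + b_I(0) = 2.010 + 0.675 = 2.685 \\
\hat{y}_{\text{Independent}} &= b_0 + b_R(0) + b_I(1) = b_0 + b_I
\end{aligned}
```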
Conclusion
- Including categorical variables in a regression analysis is a useful tool: they can serve as controls or as the focus of hypotheses about the categories themselves.
- Dummy variables serve as a bridge to analysis of variance.