Categorical Predictors Explanation

Categorical Predictors in Regression Analyses

Incorporating Categorical Variables

  • Traditionally, regression analyses focus on continuous variables for both predictors and outcomes.
  • Including categorical variables in regression is common, but they require careful handling.
  • Numbers assigned to categories are arbitrary labels and carry no mathematical meaning (e.g., coding one group as 1 and another as 0 does not make the first group "greater").
  • Proper coding allows comparisons among categories.
  • Categorical variables can control for factors within a regression analysis.

Dummy Coding

  • Categorical variables are coded using 0s and 1s (dummy coding).
  • Typically, 1 indicates the presence of an attribute.
  • For a two-category variable (e.g., gender: male/female), one category gets 1, the other gets 0.
  • The choice of which category gets 1 or 0 is arbitrary but impacts interpretation.
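
The coding step above can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the helper name dummy_code and the four observations are made up. It shows that swapping which category receives the 1 only flips the column, which is why the choice affects interpretation but not the underlying information.

```python
def dummy_code(values, coded_as_one):
    """Return 1 for observations in `coded_as_one` and 0 for everything else."""
    return [1 if v == coded_as_one else 0 for v in values]


# Made-up observations, for illustration only.
gender = ["male", "female", "female", "male"]

male = dummy_code(gender, coded_as_one="male")      # female is the referent
female = dummy_code(gender, coded_as_one="female")  # male is the referent

print(male)    # [1, 0, 0, 1]
print(female)  # [0, 1, 1, 0]
```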

Example: Aggression and Gender

  • Revisit the aggression example from the hierarchical regression lecture, now including gender.
  • Prior research shows males tend to be more aggressive than females.
  • Gender can be a background factor to control for in regression.
  • Assess video game violence exposure effects on aggression after controlling for gender.
  • Coding: male = 1, female = 0.
  • The zero category (female) serves as the referent.
  • The goal is to evaluate how much more aggressive males are compared to females.
  • Slope interpretation: for each one-unit change in x (female to male), observe the change in y (aggression).
  • The models remain statistically significant when gender is included, and they account for more variability in aggressive behavior.
  • The change in R^2 is statistically meaningful: background factors such as gender account for variability in aggression.
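
The slope interpretation above can be checked on a small example. The sketch below assumes a model in which the dummy variable is the only predictor; in that special case the slope equals the difference between the two group means and the intercept equals the mean of the referent group. The data and the helper simple_ols are invented for illustration. In the lecture's model, which also contains video game violence, the gender weight is a difference adjusted for that predictor rather than a raw mean difference.

```python
from statistics import mean


def simple_ols(x, y):
    """Return (intercept, slope) for a one-predictor least-squares fit."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up scores, for illustration only: male = 1, female = 0.
male = [0, 0, 0, 0, 1, 1, 1, 1]
aggression = [2.0, 3.0, 3.0, 4.0, 3.0, 4.0, 4.0, 5.0]

intercept, slope = simple_ols(male, aggression)
female_mean = mean(a for m, a in zip(male, aggression) if m == 0)
male_mean = mean(a for m, a in zip(male, aggression) if m == 1)

print(intercept, female_mean)           # intercept vs. mean of the referent group
print(slope, male_mean - female_mean)   # slope vs. difference in group means
```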

Interpreting Regression Weights

  • With gender in the model, a one-unit increase in gender (female to male) is associated with a 0.738-unit increase in aggressive behavior.
  • The standardized (beta) weight can be more informative here.
  • Being male is associated with a 0.258 standard deviation increase in aggressive behavior.
  • Video game violence remains a statistically meaningful predictor after controlling for gender.
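
For readers who want to see how an unstandardized weight relates to a beta weight, the sketch below applies the usual conversion: multiply the unstandardized weight by the predictor's standard deviation and divide by the outcome's standard deviation. The data are made up and the helper name standardized_beta is hypothetical; the lecture's values (0.738 and 0.258) come from its own data set, which is not reproduced here.

```python
from statistics import stdev


def standardized_beta(b, x, y):
    """Rescale an unstandardized slope b by sd(x) / sd(y)."""
    return b * stdev(x) / stdev(y)


# Made-up scores, for illustration only: male = 1, female = 0.
male = [0, 0, 0, 0, 1, 1, 1, 1]
aggression = [2.0, 3.0, 3.0, 4.0, 3.0, 4.0, 4.0, 5.0]

b = 1.0  # unstandardized slope for these made-up data (difference in group means)
beta = standardized_beta(b, male, aggression)
print(round(beta, 3))
```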

Dummy Coding with Multiple Categories

  • Dummy coding extends to variables with more than two categories.
  • A baseline (reference) category must be chosen for comparison.
  • k-1 dummy variables are needed, where k is the number of categories.
  • Members of the reference category receive zeros for all dummy variables.
  • For other categories, members get a one in their respective dummy variable, zero otherwise.
  • The set of dummy variables is entered together in the same block of the regression analysis.

Example: Party Affiliation and Tax Fairness

  • Data from Pew Research Center (March 2019 political survey).
  • Categories: Republican, Democrat, Independent.
  • Dependent variable: perceived fairness of the federal tax system (higher scores = greater fairness).
  • Dummy variable creation in SPSS:
    • A variable represents party affiliation.
    • Create dummy variables from this.
  • With three categories, 3 - 1 = 2 dummy variables are needed.
  • Dummy variables labeled Republican and Independent, with Democrat as the reference.
  • Republican: 1 for Republicans, 0 for others.
  • Independent: 1 for Independents, 0 for others.
  • Democrats are the implicit reference category.
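
The lecture builds these variables in SPSS. As a rough equivalent, the following Python sketch builds k-1 dummy columns from a single party variable. The helper dummy_code_multi and the five respondents are made up for illustration; only the category names and the choice of Democrat as the reference come from the example above.

```python
def dummy_code_multi(values, reference):
    """Build k-1 dummy columns, leaving `reference` as the all-zero category."""
    categories = sorted(set(values) - {reference})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


# Made-up respondents, for illustration only.
party = ["Democrat", "Republican", "Independent", "Republican", "Democrat"]

dummies = dummy_code_multi(party, reference="Democrat")
print(dummies["Republican"])   # [0, 1, 0, 1, 0]
print(dummies["Independent"])  # [0, 0, 1, 0, 0]
```

Democrats never receive their own column: they are identified by a zero on both dummy variables.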

Regression Analysis with Dummy Variables

  • Enter all dummy variables (Republican, Independent) into the regression in the same block.
  • Each category is compared against Democrats.
  • The Republican variable controls for Independents; the Independent variable controls for Republicans.
  • Democrat becomes the reference category for both.
  • The intercept represents the baseline (Democrat) fairness perception, and the dummy variables show shifts from that baseline.

Interpreting Coefficients

  • Focus on regression coefficients to understand the effect of each category relative to the reference.
  • Republicans perceive the tax system to be more fair than Democrats do (0.675 points higher in perceived fairness).
  • Independents also perceive the tax system to be more fair than Democrats do (0.2 points higher in perceived fairness).
  • Dummy coding is useful for including categorical variables in regression.

Dummy Variables and Intercept

  • When only dummy variables are in the model, the intercept equals the mean of the reference group.
  • The mean of each non-reference category is the intercept plus that category's unstandardized regression weight.
  • Reasoning: The intercept is the predicted value of y when every x is zero. A respondent with zeros on both the Republican and Independent dummy variables must be a Democrat, so the intercept is the Democrat mean.
  • Example: If the intercept is 2.010 (mean for Democrats) and the regression coefficient for Republicans is 0.675, then 2.010 + 0.675 = 2.685, which is the mean for Republicans.
  • The regression coefficients represent the amount of change when that variable moves from a value of zero to one.
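
The relationship between the intercept, the weights, and the group means can be verified numerically. The sketch below fits a least-squares model containing only the two dummy variables, using a small pure-Python solver for the normal equations. The ratings are invented (they are not the Pew data) and the function name ols is an assumption made for this illustration, so the fitted values will not match the 2.010 and 0.675 reported above.

```python
from statistics import mean


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [v - factor * p for v, p in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Made-up fairness ratings, for illustration only (not the Pew data).
party = ["Democrat"] * 4 + ["Republican"] * 4 + ["Independent"] * 4
fairness = [1, 2, 2, 3, 2, 3, 3, 4, 2, 2, 3, 3]

# Design matrix columns: intercept, Republican dummy, Independent dummy.
X = [[1, int(p == "Republican"), int(p == "Independent")] for p in party]
b0, b_rep, b_ind = ols(X, fairness)

group_mean = {g: mean(f for p, f in zip(party, fairness) if p == g)
              for g in set(party)}

print(b0, group_mean["Democrat"])             # intercept vs. Democrat mean
print(b0 + b_rep, group_mean["Republican"])   # intercept + weight vs. Republican mean
print(b0 + b_ind, group_mean["Independent"])  # intercept + weight vs. Independent mean
```

Each printed pair should agree up to floating-point rounding. This holds only because the model contains nothing but the dummy variables; once other predictors are added, the intercept and weights become adjusted values rather than raw group means.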

Conclusion

  • Including categorical variables in a regression analysis is a useful tool. Dummy coding can be used either to control for a categorical variable or to test hypotheses specifically about it.
  • Dummy variables serve as a bridge to analysis of variance.