PSCH 443 Multiple Regression 5 Dummy Coding

Understanding Categorical Predictors in Regression

Introduction to Categorical Variables

  • Objective: Incorporating categorical variables into regression analyses while keeping the outcome variable continuous.

  • Importance of mindful coding of categorical predictors to avoid misinterpretation of numerical values.

Coding Categorical Variables

  • Assign numeric values (0 or 1) to represent categories.

    • Example: Gender coding (Male = 1, Female = 0).

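    • Which category receives the 1 is arbitrary: swapping the codes flips the sign of the regression weight but leaves its magnitude and the model fit unchanged. Choose the coding that makes the weight easiest to interpret.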
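
A minimal sketch of the point above, using made-up aggression scores (not data from the lecture). The helper name simple_slope and all values are hypothetical; the sketch only illustrates that reversing the 0/1 coding reverses the sign of the slope.

```python
def simple_slope(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up scores for illustration only: male mean = 5, female mean = 3.
gender = ["male", "female", "male", "female", "male", "female"]
aggression = [4.0, 3.0, 5.0, 2.0, 6.0, 4.0]

# Two equally valid codings of the same variable.
male_is_1 = [1 if g == "male" else 0 for g in gender]
female_is_1 = [1 - v for v in male_is_1]

b0_m, b1_m = simple_slope(male_is_1, aggression)
b0_f, b1_f = simple_slope(female_is_1, aggression)

print("Male = 1:  ", b0_m, b1_m)   # intercept is the female mean
print("Female = 1:", b0_f, b1_f)   # intercept is the male mean
```

Under either coding the intercept is the mean of the group coded 0, and the slope is the same size with the opposite sign.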

Regression Analysis Example

  • Context: Aggression and gender as a background factor.

  • Research indicates males generally demonstrate higher levels of aggression.

  • Importance of controlling for gender in regression models.

    • Assess how video game violence affects aggression, controlling for gender.

Hierarchical Regression Steps

  • Enter gender variable (coded as Male = 1, Female = 0) in the first step of regression.

  • Female (coded 0) is the referent category, so the gender weight expresses the difference in aggressive behavior for males relative to females.

    • Allows examination of how being male relates to aggression compared with being female.

  • Results indicate that gender explains a statistically significant share of the variance in aggression.
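
The two-step logic can be written out as a sketch. The notes only state that gender is entered in the first step; placing video game violence in a second step, and the symbols used here, are assumptions for illustration. The primes mark that the weights are re-estimated when the second predictor is added.

```latex
\begin{align*}
\text{Step 1:}\quad \widehat{\text{Aggression}} &= b_0 + b_1\,\text{Gender} \\
\text{Step 2:}\quad \widehat{\text{Aggression}} &= b_0' + b_1'\,\text{Gender} + b_2'\,\text{Violence} \\
\Delta R^2 &= R^2_{\text{Step 2}} - R^2_{\text{Step 1}}
\end{align*}
```

Here the change in R-squared would be the share of variance in aggression that video game violence explains beyond gender.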

Interpreting Regression Weights

  • Understanding slope as rise over run: for a 0/1 predictor the run is a single unit, from one category to the other, so the slope is the predicted difference in the outcome between the two categories.

  • Hierarchical regression shows an increase of 0.738 in aggressive behavior moving from female to male.

  • Gender is statistically significant, linked to a 0.258 standard deviation increase in aggression.
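
A short worked version of the rise-over-run idea, assuming the coding Female = 0, Male = 1 and treating 0.738 as the unstandardized gender weight reported above. The symbols are illustrative, and the intercept is left symbolic because the notes do not report it.

```latex
\begin{align*}
\hat{Y} &= b_0 + b_1 X, \qquad X = 0 \text{ (female)},\; X = 1 \text{ (male)} \\
\hat{Y}_{\text{female}} &= b_0 \\
\hat{Y}_{\text{male}} &= b_0 + b_1 \\
\hat{Y}_{\text{male}} - \hat{Y}_{\text{female}} &= b_1 = 0.738
\end{align*}
```

When gender is the only predictor, this difference is the difference between the two group means; with other predictors in the model it is the difference adjusted for them.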

Dummy Coding Explained

  • Dummy coding allows representation of categorical variables with more than two categories.

  • Requires k - 1 dummy variables for k categories.

    • Each category is represented with a binary indicator (1 for membership, 0 for non-membership).
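
A minimal sketch of building k - 1 indicators by hand. The function name dummy_code and the sample values are hypothetical. One category is left out because a full set of k indicators would be redundant alongside the intercept; the omitted category becomes the reference.

```python
def dummy_code(values, reference):
    """Return {category: [0/1, ...]} for every category except the reference."""
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError(f"reference {reference!r} not found in the data")
    return {
        c: [1 if v == c else 0 for v in values]
        for c in categories
        if c != reference
    }


# Made-up sample with k = 3 categories, so k - 1 = 2 dummy variables.
party = ["Democrat", "Republican", "Independent", "Republican", "Democrat"]
dummies = dummy_code(party, reference="Democrat")

for name, column in dummies.items():
    print(name, column)
```

Members of the reference category are identified by having 0 on every dummy variable.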

Example of Party Affiliation Analysis

  • Using real data from Pew Research on political party affiliation and perceptions of federal tax system fairness.

  • Categories: Republican, Democrat, Independent.

  • Code two dummy variables for analysis: Independent (1 if member, 0 if not); Republican (1 if member, 0 if not); Democrat as the reference category.

Implementing Regression with Dummy Variables

  • Enter all dummy variables for a categorical predictor in the same step; leaving one out merges that group into the reference category and makes the remaining weights misleading.

  • Democrats serve as the implicit reference category against which other categories are evaluated.

  • Each regression weight is the difference in mean perceived fairness between that party and the reference category (Democrats).

Interpretation of Results

  • For Republicans: A 0.675 increase in perceived fairness relative to Democrats.

  • For Independents: A 0.2 increase in perceived fairness relative to Democrats.

  • Analysis shows both Republicans and Independents rate the tax system as fairer than Democrats do.
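
The reported weights can be placed in the regression equation as a sketch. The variable names are illustrative, and the intercept is left symbolic because the notes do not report it.

```latex
\begin{align*}
\widehat{\text{Fairness}} &= b_0 + 0.675\,\text{Republican} + 0.2\,\text{Independent} \\
\text{Democrat } (0, 0): \quad \widehat{\text{Fairness}} &= b_0 \\
\text{Republican } (1, 0): \quad \widehat{\text{Fairness}} &= b_0 + 0.675 \\
\text{Independent } (0, 1): \quad \widehat{\text{Fairness}} &= b_0 + 0.2
\end{align*}
```

Each weight is a contrast with Democrats; the equation by itself does not compare Republicans with Independents directly.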

Implications for Analysis

  • Dummy coding allows effective control and evaluation of categorical variables in regression.

  • Understanding the reference category's role is crucial for result interpretation.

  • Intercept term in a regression model with only dummy variables equals the mean of the reference group.

  • With only dummy variables in the model, each group's mean equals the intercept plus that group's regression weight, so the regression reproduces the average outcome of every group.
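
A minimal sketch that checks the intercept claim on made-up fairness ratings (not the Pew data). The ols helper is a hypothetical, bare-bones least-squares solver written for this illustration; in practice a statistics package would be used.

```python
def ols(columns, y):
    """Least-squares weights [b0, b1, ...] for y on the given predictor columns."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]  # add intercept
    p = len(rows[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    a = [
        [sum(r[j] * r[k] for r in rows) for k in range(p)]
        + [sum(r[j] * yi for r, yi in zip(rows, y))]
        for j in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda i: abs(a[i][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for i in range(p):
            if i != c:
                f = a[i][c] / a[c][c]
                a[i] = [v - f * w for v, w in zip(a[i], a[c])]
    return [a[j][p] / a[j][j] for j in range(p)]


# Made-up ratings for illustration only.
party = ["D", "D", "D", "R", "R", "R", "I", "I", "I"]
fairness = [2.0, 3.0, 4.0, 4.0, 5.0, 6.0, 3.0, 3.0, 4.5]

republican = [1 if p == "R" else 0 for p in party]
independent = [1 if p == "I" else 0 for p in party]

b0, b_rep, b_ind = ols([republican, independent], fairness)


def group_mean(label):
    scores = [f for p, f in zip(party, fairness) if p == label]
    return sum(scores) / len(scores)


# Intercept vs. reference-group mean, and weights vs. mean differences.
print(round(b0, 3), group_mean("D"))
print(round(b_rep, 3), group_mean("R") - group_mean("D"))
print(round(b_ind, 3), group_mean("I") - group_mean("D"))
```

In this toy data the intercept matches the Democrat mean and each weight matches that group's mean minus the Democrat mean, up to floating-point rounding.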

Conclusion

  • Including categorical variables in regression analysis is useful for controlling variables or testing specific hypotheses.

  • Sets up the later course discussion of analysis of variance (ANOVA).
