Traditionally, regression analyses focus on continuous variables for both predictors and outcomes.
It's common to include categorical variables in regression, but it requires careful handling.
Numbers assigned to categories carry no mathematical meaning: coding one category as 1 and another as 0 does not imply that the first is greater than the second. The numeric assignments are arbitrary.
Proper coding allows comparisons among categories.
Categorical variables can control for factors within a regression analysis.
Dummy Coding
Categorical variables are coded using 0s and 1s (dummy coding).
Typically, 1 indicates the presence of an attribute.
For a two-category variable (e.g., gender: male/female), one category gets 1, the other gets 0.
The choice of which category gets 1 or 0 is arbitrary but impacts interpretation.
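A minimal sketch of why the choice of the 1 category matters for interpretation. The scores below are hypothetical, and `dummy_slope` is an illustrative helper, not part of any statistics package. With a single 0/1 predictor, the least-squares slope equals the mean of the group coded 1 minus the mean of the group coded 0, so swapping the coding flips the sign of the slope.

```python
from statistics import mean

def dummy_slope(codes, outcome):
    """Least-squares slope of outcome on a single 0/1 predictor."""
    x_bar, y_bar = mean(codes), mean(outcome)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(codes, outcome))
    sxx = sum((x - x_bar) ** 2 for x in codes)
    return sxy / sxx

# Hypothetical data for illustration only.
gender = ["male", "female", "male", "female"]
outcome = [4.0, 2.0, 5.0, 3.0]

# Coding 1: male = 1, female = 0 (female is the referent).
male_coded = [1 if g == "male" else 0 for g in gender]
# Coding 2: female = 1, male = 0 (male is the referent).
female_coded = [1 if g == "female" else 0 for g in gender]

slope_male = dummy_slope(male_coded, outcome)
slope_female = dummy_slope(female_coded, outcome)

print(slope_male)    # mean(males) - mean(females)
print(slope_female)  # same size, opposite sign
```

Either coding fits the data equally well. Only the direction of the comparison changes.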
Example: Aggression and Gender
Revisit aggression example from hierarchical regression lecture, including gender.
Prior research shows males tend to be more aggressive than females.
Gender can be a background factor to control for in regression.
Assess video game violence exposure effects on aggression after controlling for gender.
Coding: male = 1, female = 0.
The zero category (female) serves as the referent.
The goal is to evaluate how much more aggressive males are compared to females.
Slope interpretation: for each one-unit change in x (female to male), observe the change in y (aggression).
The models remain statistically significant when gender is included, and they account for more variability in aggressive behavior.
The change in R^2 is statistically meaningful: background factors such as gender account for additional variability.
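A sketch of the standard F test for a change in R^2 between two nested regression models, which is the test typically reported for each block in a hierarchical regression. The function name and the numbers in the example are hypothetical and are not taken from the aggression analysis.

```python
def r2_change_f(r2_reduced, r2_full, n, k_reduced, k_full):
    """F statistic for the R^2 change between nested models.

    n is the sample size; k_reduced and k_full are the numbers of
    predictors in the smaller and larger model, respectively.
    """
    df_change = k_full - k_reduced
    df_error = n - k_full - 1
    numerator = (r2_full - r2_reduced) / df_change
    denominator = (1 - r2_full) / df_error
    return numerator / denominator, df_change, df_error

# Hypothetical values: adding one predictor raises R^2 from .10 to .20.
f_change, df1, df2 = r2_change_f(0.10, 0.20, n=103, k_reduced=1, k_full=2)
print(round(f_change, 2), df1, df2)
```

The resulting F is evaluated against an F distribution with `df1` and `df2` degrees of freedom.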
Interpreting Regression Weights
With gender in the model, a one-unit change in gender (female = 0 to male = 1) is associated with a 0.738 increase in predicted aggressive behavior, with the other predictors held constant.
Beta weight can be more informative here.
Being male is associated with a 0.258 standard deviation increase in aggressive behavior.
Video game violence remains a statistically meaningful predictor after controlling for gender.
Dummy Coding with Multiple Categories
Dummy coding extends to variables with more than two categories.
Need to choose a baseline (reference) category for comparison.
k-1 dummy variables are needed, where k is the number of categories.
Members of the reference category receive zeros for all dummy variables.
For other categories, members get a one in their respective dummy variable, zero otherwise.
The set of dummy variables is entered together in the same block of the regression analysis.
Example: Party Affiliation and Tax Fairness
Data from Pew Research Center (March 2019 political survey).
Categories: Republican, Democrat, Independent.
Dependent variable: perceived fairness of the federal tax system (higher scores = greater fairness).
Dummy variable creation in SPSS:
A variable represents party affiliation.
Create dummy variables from this.
With three categories, need 3-1 = 2 dummy variables.
Dummy variables labeled Republican and Independent, with Democrat as the reference.
Republican: 1 for Republicans, 0 for others.
Independent: 1 for Independents, 0 for others.
Democrats are the implicit reference category.
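The same coding can be sketched outside SPSS. The helper `dummy_code` below is illustrative, not a library function, and the five respondents are made up. It returns k-1 columns, one per non-reference category, and members of the reference category receive zeros everywhere.

```python
def dummy_code(values, reference):
    """Return k-1 dummy columns for a categorical variable with k categories."""
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError(f"reference {reference!r} not found in data")
    return {
        category: [1 if value == category else 0 for value in values]
        for category in categories
        if category != reference
    }

# Hypothetical respondents for illustration only.
party = ["Democrat", "Republican", "Independent", "Republican", "Democrat"]
dummies = dummy_code(party, reference="Democrat")

for name, column in dummies.items():
    print(name, column)
```

Democrats never get their own column. They are identified by a zero on both the Republican and the Independent variable.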
Regression Analysis with Dummy Variables
Enter all dummy variables (Republican, Independent) into the regression in the same block.
Each category is compared against Democrats.
The Republican variable controls for the Independent variable, and the Independent variable controls for the Republican variable.
Democrat becomes the reference category for both.
The intercept represents the baseline fairness perception (that of Democrats), and each dummy variable's coefficient shows the shift from that baseline.
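Written as an equation, the model with Democrat as the reference looks like this. The symbols b_0, b_1, and b_2 are notation assumed here for the intercept and the two unstandardized weights.

```latex
\begin{aligned}
\hat{y} &= b_0 + b_1\,\text{Republican} + b_2\,\text{Independent} \\
\text{Democrat } (0, 0):\quad \hat{y} &= b_0 \\
\text{Republican } (1, 0):\quad \hat{y} &= b_0 + b_1 \\
\text{Independent } (0, 1):\quad \hat{y} &= b_0 + b_2
\end{aligned}
```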
Interpreting Coefficients
Focus on regression coefficients to understand the effect of each category relative to the reference.
Republicans perceive the tax system to be more fair than Democrats do (0.675 higher on the fairness measure).
Independents also perceive the tax system to be more fair than Democrats do (0.2 higher on the fairness measure).
Dummy coding is useful for including categorical variables in regression.
Dummy Variables and Intercept
When only dummy variables are in the model, the intercept equals the mean of the reference group.
The mean of each non-reference category is the intercept plus that category's unstandardized regression weight.
Reasoning: The intercept is the predicted value of y when every predictor is zero. When both the Republican and Independent dummy variables are zero, the only respondents left are Democrats.
Example: If the intercept is 2.010 (mean for Democrats) and the regression coefficient for Republicans is 0.675, then 2.010 + 0.675 = 2.685, which is the mean for Republicans.
The regression coefficients represent the amount of change when that variable moves from a value of zero to one.
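A small check of this relationship on hypothetical ratings (not the Pew data). The sketch sets the intercept to the reference-group mean and each coefficient to a mean difference, then confirms these are the least-squares estimates by checking that the residuals are orthogonal to every column of the design matrix (the normal equations).

```python
from statistics import mean

# Hypothetical fairness ratings for illustration only.
scores = {
    "Democrat": [1.0, 2.0, 3.0],
    "Republican": [2.0, 3.0, 4.0],
    "Independent": [2.0, 2.0, 3.0],
}
reference = "Democrat"

group_means = {party: mean(vals) for party, vals in scores.items()}
intercept = group_means[reference]
coefs = {p: group_means[p] - intercept for p in scores if p != reference}

# Build rows of (constant, Republican, Independent, residual).
rows = []
for party, vals in scores.items():
    republican = 1 if party == "Republican" else 0
    independent = 1 if party == "Independent" else 0
    predicted = (intercept
                 + coefs["Republican"] * republican
                 + coefs["Independent"] * independent)
    for y in vals:
        rows.append((1, republican, independent, y - predicted))

# Least squares requires each design column to be orthogonal to the residuals.
normal_equations = [sum(r[col] * r[3] for r in rows) for col in range(3)]

print(intercept, coefs)
print(all(abs(v) < 1e-9 for v in normal_equations))
```

The same arithmetic underlies the example from the notes: 2.010 + 0.675 = 2.685 for Republicans.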
Conclusion
Including a categorical variable in a regression analysis is a useful tool. It can be used either to control for that variable or to test hypotheses specifically related to it.
Dummy variables serve as a bridge to analysis of variance.