Transforming Variables in R
Chapter 4 Transforming Variables
4.1 Get Ready
R Libraries: Load the following R libraries needed for this chapter:
descr
Hmisc
Data Files: Load the following data files:
states20.rda
anes20.rda
You may need to install the packages first if you have not done so already.
Useful Function: Use cut2 from the Hmisc package to collapse numeric variables into binned categories.
Working Directory: Ensure that datasets are in your working directory to avoid specifying file paths.
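The setup steps can be sketched roughly as follows. This is a minimal sketch, not the chapter's own script: it assumes the two data files sit in the current working directory under exactly the names listed above, and it leaves the install and library calls commented out so that nothing is downloaded without you asking for it.

```r
# Packages named in this chapter.
pkgs <- c("descr", "Hmisc")

# Which of them are not yet installed? requireNamespace() returns FALSE for those.
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
missing_pkgs

# Install only what is missing (uncomment to run; needs an internet connection).
# install.packages(missing_pkgs)
# library(descr); library(Hmisc)   # attach the packages once they are installed

# Confirm the working directory and whether the data files are in it.
getwd()
data_files <- c("states20.rda", "anes20.rda")
found <- file.exists(data_files)
found

# Load each file that is present; load() restores the objects saved inside it.
for (f in data_files[found]) load(f)
```

If found shows FALSE for a file, either move the file into the directory reported by getwd() or change the working directory with setwd().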
4.2 Introduction
Before exploring advanced graphing and statistical techniques, it is essential to transform the data into a more usable form in R.
Simplifying Names: Use intuitive variable names to make the dataset easy to work with.
Transformations: This chapter covers:
Renaming variables
Reordering categories
Combining variables
These transformations are vital for effective data analysis and will aid in future course assignments.
4.3 Data Transformations
Modifications to datasets are often necessary for effective analysis.
Modifications include adjusting single variables, multiple variables, or entire datasets.
Importance of Record Keeping:
Maintain script files with all commands used for transformations.
Script files serve as documentation for transformations for current and future projects.
4.4 Renaming and Relabeling
Changing Variable Names: Renaming cumbersome variables can simplify data analysis.
Example:
Original Variable Names:
V201314x (welfare programs), V201320x (aid to the poor)
New Variable Names:
anes20$welfare_spnd and anes20$poor_spnd.
Copying variables to new objects with meaningful names enhances comprehension.
For instance:
anes20$welfare_spnd <- anes20$V201314x
anes20$poor_spnd <- anes20$V201320x
Ensure new variables remain within the anes20 dataset for analysis integrity.
Verification: Always check variable frequencies to confirm correct transformations.
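A small sketch of the copy-and-verify pattern, using base R only. The data frame below is a made-up stand-in for anes20 (the real variables come from anes20.rda), and the category labels are hypothetical.

```r
# Toy stand-in for anes20; values and labels are invented for illustration.
anes20 <- data.frame(
  V201314x = factor(c("Increased", "Kept the same", "Decreased", "Increased")),
  V201320x = factor(c("Decreased", "Increased", "Increased", "Kept the same"))
)

# Copy each variable to a more readable name inside the same data frame.
anes20$welfare_spnd <- anes20$V201314x
anes20$poor_spnd <- anes20$V201320x

# Verify: the frequencies of the copy should match those of the original.
table(anes20$V201320x)
table(anes20$poor_spnd)
identical(anes20$poor_spnd, anes20$V201320x)
```

Because the new variables are created with the anes20$ prefix, they are stored as columns of the dataset rather than as free-standing objects.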
4.4.1 Changing Attributes
Variable Class: Categorical (non-numeric) variables such as these are stored in R as factors.
Ordered vs Unordered Factors: Some variables, like spending preference, have naturally ordered categories.
Confirm how R currently treats the variable with the commands class(anes20$poor_spnd) and levels(anes20$poor_spnd).
Changing a factor variable to ordered:
Use: anes20$poor_spnd <- ordered(anes20$poor_spnd).
Verify by checking the class and levels again.
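The check-convert-check sequence might look like this on a toy factor. The data frame and its three labels are hypothetical stand-ins for the real anes20 variable.

```r
# Toy stand-in: an unordered factor with levels listed in a meaningful order.
anes20 <- data.frame(
  poor_spnd = factor(
    c("Increased", "Kept the same", "Decreased", "Increased"),
    levels = c("Increased", "Kept the same", "Decreased")
  )
)

# Before: a plain (unordered) factor.
class(anes20$poor_spnd)
levels(anes20$poor_spnd)

# Convert to an ordered factor; the existing level order is kept.
anes20$poor_spnd <- ordered(anes20$poor_spnd)

# After: the class now includes "ordered", and the levels are unchanged.
class(anes20$poor_spnd)
levels(anes20$poor_spnd)
```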
Value Labels
Modify category labels when original labels are too long or unwieldy (as seen in graphs).
Use the levels() function to change labels permanently. The new labels replace the original categories position by position, so they must be listed in the same order as the existing levels.
Example:
levels(anes20$poor_spnd) <- c("Increase/Lot", "Increase", "Same", "Decrease", "Decrease/Lot").
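To see that relabeling changes only the labels and not the counts, here is a sketch on invented data. The long labels are assumed stand-ins for the original category names, not copied from the dataset.

```r
# Hypothetical long labels standing in for the original categories.
long_labels <- c("1. Increased a lot", "2. Increased a little", "3. Kept the same",
                 "4. Decreased a little", "5. Decreased a lot")

# Toy stand-in for anes20 with six made-up responses.
anes20 <- data.frame(
  poor_spnd = factor(long_labels[c(1, 3, 5, 2, 1, 4)],
                     levels = long_labels, ordered = TRUE)
)

before <- table(anes20$poor_spnd)

# Replace the labels position by position, in the order of the existing levels.
levels(anes20$poor_spnd) <- c("Increase/Lot", "Increase", "Same",
                              "Decrease", "Decrease/Lot")

after <- table(anes20$poor_spnd)

# The counts are unchanged; only the labels differ.
all(as.vector(before) == as.vector(after))
```

Running levels(anes20$poor_spnd) first and reading off the current order is a cheap safeguard against attaching a new label to the wrong category.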
Collapsing and Reordering Categories
Sometimes it is necessary to alter the number of categories within a variable.
Example with anes20$poor_spnd: Collapse five categories into three:
Create a new variable: anes20$poor_spnd.3 <- anes20$poor_spnd
Define new levels, repeating a label to merge the categories that share it: levels(anes20$poor_spnd.3) <- c("Increase", "Increase", "Keep Same", "Decrease", "Decrease").
Check Transformations: Use barplot to visualize category distributions post-transformation.
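A sketch of the collapse on invented data, with a cross-tabulation as the check. The seven responses below are made up; only the variable names and labels follow the text.

```r
# Toy stand-in for anes20 with the five short labels.
lab5 <- c("Increase/Lot", "Increase", "Same", "Decrease", "Decrease/Lot")
anes20 <- data.frame(
  poor_spnd = factor(lab5[c(1, 2, 3, 4, 5, 1, 2)], levels = lab5, ordered = TRUE)
)

# Work on a copy so the five-category original is preserved.
anes20$poor_spnd.3 <- anes20$poor_spnd

# Repeated labels merge the corresponding categories.
levels(anes20$poor_spnd.3) <- c("Increase", "Increase", "Keep Same",
                                "Decrease", "Decrease")

# Cross-tabulate old against new to confirm each category landed where intended.
table(anes20$poor_spnd, anes20$poor_spnd.3)
table(anes20$poor_spnd.3)
```

In an interactive session, barplot(table(anes20$poor_spnd.3)) gives the visual version of the same check.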
Reordering Categories
To ensure logical ordering of categories (e.g., from low to high), use the ordered function:
Example:
anes20$poor_spnd.3 <- ordered(anes20$poor_spnd.3, levels=c("Decrease", "Keep Same", "Increase")).
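The effect of reordering can be checked on a toy version of the three-category variable. The five responses are invented for illustration.

```r
# Toy stand-in: three categories stored from high to low.
anes20 <- data.frame(
  poor_spnd.3 = factor(
    c("Increase", "Keep Same", "Decrease", "Increase", "Increase"),
    levels = c("Increase", "Keep Same", "Decrease")
  )
)

# Reorder from low to high; the values stay the same, only the level order changes.
anes20$poor_spnd.3 <- ordered(anes20$poor_spnd.3,
                              levels = c("Decrease", "Keep Same", "Increase"))

levels(anes20$poor_spnd.3)
table(anes20$poor_spnd.3)

# With an ordered factor, comparisons follow the level order.
anes20$poor_spnd.3[1] > anes20$poor_spnd.3[3]
```

The level names passed to levels= must match the existing labels exactly; any value that matches none of them becomes NA.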
Combining Variables
It is often useful to combine separate variables into a single variable for analysis.
For instance, calculating a net party feeling thermometer by subtracting Republican ratings from Democratic ratings:
anes20$netpty_ft <- (anes20$dempty_ft - anes20$reppty_ft).
Positive values indicate a preference for Democrats, negative for Republicans, and zero for neutrality.
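A brief sketch of the subtraction with made-up ratings. The four rows are hypothetical respondents, one of whom has a missing Democratic rating.

```r
# Toy stand-in for anes20 with invented thermometer ratings.
anes20 <- data.frame(
  dempty_ft = c(85, 50, 15, NA),
  reppty_ft = c(30, 50, 70, 40)
)

# Net rating: Democratic minus Republican.
anes20$netpty_ft <- anes20$dempty_ft - anes20$reppty_ft

# Positive favors Democrats, negative favors Republicans, zero is neutral.
# A missing value in either rating yields a missing net rating.
anes20$netpty_ft
summary(anes20$netpty_ft)
```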
Transforming to Categories
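The net party feeling thermometer can be collapsed into three groups using cut2. With cut points at 0 and 1, the groups are values below 0, values from 0 up to but not including 1 (exactly 0 for whole-number ratings), and values of 1 or more.
Example:
anes20$netpty_ft.3 <- ordered(cut2(anes20$netpty_ft, c(0, 1))).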
Check Class: Ensure the new variable's class is correct after transformation.
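The same three-way split can be illustrated without Hmisc. The sketch below swaps in base R's cut() as a stand-in for cut2, so it is not the chapter's command. The group labels are hypothetical, and the scores are assumed to be whole numbers.

```r
# Invented net thermometer scores.
netpty_ft <- c(-100, -20, -1, 0, 0, 1, 35, 100)

# Base-R stand-in for cut2(netpty_ft, c(0, 1)):
# right = FALSE makes the intervals [-Inf, 0), [0, 1), [1, Inf).
netpty_ft.3 <- cut(
  netpty_ft,
  breaks = c(-Inf, 0, 1, Inf),
  right = FALSE,
  labels = c("Favors Republicans", "Neutral", "Favors Democrats"),
  ordered_result = TRUE
)

# Check the class and the distribution across the three groups.
class(netpty_ft.3)
table(netpty_ft.3)
```

Unlike this sketch, cut2 builds its labels from the interval endpoints, so the level names it produces will look different even though the grouping is the same.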
Saving Changes
To maintain data integrity, make sure transformations do not overwrite the original datasets.
It is recommended to save the data as a new file (e.g., anes20a), or to overwrite the original only if new variables were added.
Saving Command: save(object, file="<FilePath>/filename").
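A minimal save-and-reload round trip, written against a temporary directory so it can be run anywhere. The location, the .rda extension on the file name, and the toy data are assumptions made for illustration; substitute your own path.

```r
# Toy stand-in for the transformed dataset.
anes20 <- data.frame(netpty_ft = c(55, 0, -55))

# Save under a new file name so the original file is left untouched.
out_file <- file.path(tempdir(), "anes20a.rda")   # hypothetical location
save(anes20, file = out_file)

# Reload into a separate environment to confirm what the file contains.
check <- new.env()
loaded <- load(out_file, envir = check)
loaded                                # names of the objects restored from the file
identical(check$anes20, anes20)       # TRUE if the round trip preserved the data
```

Note that load() restores objects under the names they had when saved, so the object here comes back as anes20 even though the file is called anes20a.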
4.9 Exercises
Create a new variable combining responses from various gun control measures to form an index.
Create a new ordinal variable based on a pre-existing ordinal scale (party identification) with fewer categories.
Construct a bar chart to visualize the differences between frequency tables of original and modified variables.