Bivariate Graphics
General Information
Instructors: Herong Wang, MPH; Troy Zirui Zhou, MPH
Date of Lecture: November 14, 2025
Announcements
Drop-in Hours: Attending student drop-in hours is recommended.
Discussion Posts: Reminder to participate in at least 6 different discussion posts (only 2 left).
Homework Updates:
Homework 9 regrade completed.
Homework 11 (univariate descriptives) was due this week.
Homework 12 (bivariate graphics) is due next week.
Regular deadline for homework and discussions is Friday, November 21.
Outline of Today's Lecture
Slides: Bivariate plot types
Coding Together: Bivariate plot types
Slides: Bivariate plot: Scatterplot spotlight
Coding Together: Scatterplot spotlight
Break
Slides: Multipaneled plots & exporting plots
Coding Together: Multipaneled plots & exporting plots
Slides: Bivariate descriptives (likely discussion for next class)
Learning Objectives for Bivariate Plot Types
Visualizing Relationships:
Prioritize plot types based on data types (bar chart, histogram, boxplot, scatterplot)
Description of coding features:
Labels
Limits
Colors
Legends
Size
Transparency
Framework for Deciding Plot Type
Data Type and Variable Count
Factors vs. Numeric Variables:
One Variable Setup:
Factor: Use a bar chart
Numeric: Use histogram / density plot
Two Variable Setup:
If the 2nd variable is numeric: use scatter plot or line plot
If the 2nd variable is a factor: use a boxplot, violin plot, or beeswarm plot
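The decision framework above can be sketched as a small base R helper. The function name suggest_plot() is hypothetical (it is not part of ggplot2 or the course materials), and the toy vectors only stand in for NHANES columns.

```r
# Hypothetical helper: suggest a plot type from the classes of one or two variables.
suggest_plot <- function(x, y = NULL) {
  # Treat factors and character vectors as categorical
  is_cat <- function(v) is.factor(v) || is.character(v)

  if (is.null(y)) {
    # One variable: bar chart for a factor, histogram/density for numeric
    return(if (is_cat(x)) "bar chart" else "histogram or density plot")
  }
  if (is.numeric(x) && is.numeric(y)) {
    "scatterplot or line plot"
  } else if (is_cat(x) && is_cat(y)) {
    "bar chart with fill"
  } else {
    # One numeric variable and one factor
    "boxplot, violin plot, or beeswarm plot"
  }
}

# Toy vectors (made-up values) named after the lecture's variables
iron <- c(55, 80, 120, 95)
rbc  <- c(4.2, 4.8, 5.1, 4.6)
sex  <- factor(c("F", "M", "F", "M"))

suggest_plot(sex)        # one factor
suggest_plot(iron)       # one numeric
suggest_plot(iron, rbc)  # two numeric
suggest_plot(iron, sex)  # numeric and factor
```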
Aesthetic Mapping in ggplot2
aes() Function Mappings:
x = x-axis variable
y = y-axis variable
fill = fill color variable
color = line color variable
shape = shape of points
size = size of points
alpha = transparency of points
Bar Plot for Two Factor Variables
Code Example:
ggplot(nhanes, aes(y = sex, fill = education)) + geom_bar() + labs(y = "Sex", x = "Number of Participants")
Considerations for Interpretation:
Analyze distributions and relationships between variables.
Density Plot for Numeric and Factor Variable
Code Example:
ggplot(nhanes, aes(x = LBXIRN, color = sex)) + geom_density(linewidth = 1.5)
Limitations: Useful only when factor variable has few categories.
Histogram for Numeric and Factor Variable
Code Example:
ggplot(nhanes, aes(x = LBXIRN, fill = sex)) + geom_histogram(binwidth = 10, position = "identity", alpha = 0.5)
Transparency and Positioning: With position = "identity" the groups' bars overlap instead of stacking, so an alpha below 1 keeps both distributions visible.
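A minimal sketch of how the position argument changes the histogram, assuming a small made-up data frame named toy in place of nhanes. layer_data() is used only to inspect where each bar starts.

```r
library(ggplot2)

# Toy data standing in for nhanes; the values are invented for illustration
toy <- data.frame(
  LBXIRN = c(46, 52, 58, 61, 67, 48, 56, 63, 66, 72),
  sex    = factor(rep(c("Female", "Male"), each = 5))
)

# Default position = "stack": one group's bars sit on top of the other's
p_stack <- ggplot(toy, aes(x = LBXIRN, fill = sex)) +
  geom_histogram(binwidth = 10)

# position = "identity": both groups start at zero and overlap,
# so alpha < 1 is needed to see the group drawn underneath
p_overlap <- ggplot(toy, aes(x = LBXIRN, fill = sex)) +
  geom_histogram(binwidth = 10, position = "identity", alpha = 0.5)

# In the overlapping version every bar starts at a count of 0
all(layer_data(p_overlap)$ymin == 0)

# In the stacked version some bars start above 0
any(layer_data(p_stack)$ymin > 0)
```

Bar heights in the stacked version are cumulative, so it answers "how many in total per bin"; the overlapping version is easier to read for comparing the two shapes.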
Changing Aesthetic Scales
Adjusting scales for fill/color:
Use scale_color_* functions or scale_fill_* functions to specify color scales. Example:
scale_color_manual(values = c("firebrick", "forestgreen", "dodgerblue"))
scale_color_brewer(palette = "")
scale_color_viridis_d()
Ensure the number of color values matches levels in your aesthetic variable.
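One way to check the "number of colors matches number of levels" rule before plotting is sketched below. The toy data frame and the education labels are assumptions made for illustration. colors() is the base R list of built-in color names.

```r
library(ggplot2)

# Toy data (made-up values) with a three-level factor
toy <- data.frame(
  LBXIRN    = c(46, 52, 58, 61, 67, 48, 56, 63, 66, 72, 80, 90),
  education = factor(rep(c("High school", "College", "Graduate"), times = 4))
)

# Naming the vector ties each color to a specific level
edu_colors <- c("High school" = "firebrick",
                "College"     = "forestgreen",
                "Graduate"    = "dodgerblue")

# Guard against misspelled color names and a level/color count mismatch
stopifnot(all(edu_colors %in% colors()))
stopifnot(length(edu_colors) == nlevels(toy$education))

density_plot <- ggplot(toy, aes(x = LBXIRN, color = education)) +
  geom_density(linewidth = 1.5) +
  scale_color_manual(values = edu_colors)

# print(density_plot) draws the figure
```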
Boxplot for Numeric and Factor Variables
Code Example:
ggplot(nhanes, aes(x = age_groups, y = LBXIRN)) + geom_boxplot(fill = "darkorchid")
Utility: Stays readable even when the factor variable has many categories.
Violin Plot for Numeric and Factor Variables
Code Example:
ggplot(nhanes, aes(x = age_groups, y = LBXIRN)) + geom_violin(draw_quantiles = c(0.25, 0.5, 0.75), fill = "darkorchid")
Application: Useful when the factor variable has several categories and the shape of each distribution matters.
Adding Beeswarm Points to Box or Violin Plot
Code Example:
ggplot(nhanes, aes(x = age_groups, y = LBXIRN)) + geom_violin(fill = "darkorchid") + geom_jitter(shape = 16, alpha = 0.25, position = position_jitter(0.2), size = 0.75)
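geom_jitter() places points with random offsets, so the figure changes slightly on each render, and when only a width is given to position_jitter() the default also adds some vertical jitter. A sketch of a reproducible version that jitters sideways only, using simulated toy data rather than nhanes:

```r
library(ggplot2)

set.seed(2025)  # makes the simulated toy data reproducible
toy <- data.frame(
  age_groups = factor(rep(c("20-39", "40-59", "60+"), each = 30)),
  LBXIRN     = round(rnorm(90, mean = 85, sd = 30))
)

jitter_plot <- ggplot(toy, aes(x = age_groups, y = LBXIRN)) +
  geom_violin(fill = "darkorchid") +
  # width spreads points sideways; height = 0 leaves the measured values untouched;
  # seed makes the random offsets identical on every render
  geom_jitter(shape = 16, alpha = 0.25, size = 0.75,
              position = position_jitter(width = 0.2, height = 0, seed = 1))

# print(jitter_plot) draws the figure
```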
Scatterplot Spotlight
Scatterplots are primarily used for visualizing relationships between two numeric variables.
Example Application
A basic scatterplot showing the relationship between iron and red blood cell count.
ggplot(nhanes, aes(x = LBXIRN, y = LBXRBCSI)) + geom_point(alpha = 0.2, size = 0.4, shape = 2)
Interpretation Questions: What variables are plotted, and how are they aesthetically represented?
Optional Features for Enhanced Scatterplots
Customizing geom_point():
Size: size =
Shape: shape =
Colors: color = for outline and fill = for point color
Transparency: alpha =
geom_point(size = 3, shape = 22, color = "seagreen", fill = "violet", alpha = 1/3)
Adding a Third Variable to Scatterplots
Utilizing color for a third variable:
ggplot(nhanes, aes(x = LBXIRN, y = LBXRBCSI, color = sex)) + geom_point()
Decide which variables are important enough to map to an aesthetic.
Modifying Axis Limits
To change the zoom:
xlim(0, 300) + ylim(4, 6)
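Note that xlim() and ylim() set scale limits, so observations outside the range are dropped (with a warning) before any statistics are computed. coord_cartesian() is an alternative that only changes the visible window. A sketch with made-up values:

```r
library(ggplot2)

# Toy data (invented values); the last two rows fall outside the limits used below
toy <- data.frame(
  LBXIRN   = c(40, 90, 150, 250, 320, 410),
  LBXRBCSI = c(4.1, 4.6, 5.0, 5.4, 5.9, 6.3)
)

base_plot <- ggplot(toy, aes(x = LBXIRN, y = LBXRBCSI)) + geom_point()

# Scale limits: rows outside the range are removed from the plot
dropped <- base_plot + xlim(0, 300) + ylim(4, 6)

# Coordinate limits: a pure zoom, all rows stay in the data
zoomed <- base_plot + coord_cartesian(xlim = c(0, 300), ylim = c(4, 6))

# Count how many rows the scale limits would remove
outside <- toy$LBXIRN > 300 | toy$LBXRBCSI < 4 | toy$LBXRBCSI > 6
sum(outside)
```

The difference matters most when a layer such as geom_smooth() is added, because the fitted line is computed only from the rows that remain.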
Line Plot for Two Numeric Variables
Code Example:
ggplot(nhanes, aes(x = LBXIRN, y = LBXRBCSI)) + geom_smooth(method = "lm", se = TRUE, fill = "red")
Interpret the relationship depicted by the plot.
Additional geom_ Functions
Reference: https://ggplot2.tidyverse.org/reference/
Highlights the flexibility based on specific analysis needs.
Recap on Bivariate Plot Types
Plot type selection based on:
Numeric vs. Factor Variables
Two Numeric Variables: Scatterplot or line plot using geom_smooth()
Adding Third Variables: Achievable via aesthetic features.
Multipaneled Plots and Their Use
Faceting for Comparisons: Show each category in its own panel while keeping a common axis style.
Faceting Example:
ggplot(nhanes, aes(x = age, fill = education)) + geom_histogram() + facet_wrap(vars(education), ncol = 1, nrow = 4)
Parameters define the variable to facet and the number of columns/rows utilized.
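Hard-coding nrow = 4 fails if the facet variable has more levels than the grid can hold. A sketch that sizes the grid from the data instead, using simulated toy data with invented education labels:

```r
library(ggplot2)

set.seed(11)  # reproducible simulated ages
toy <- data.frame(
  age       = round(runif(120, min = 20, max = 80)),
  education = factor(rep(c("Less than HS", "High school",
                           "Some college", "College+"), times = 30))
)

# One panel per level, stacked in a single column
n_panels <- nlevels(toy$education)

facet_plot <- ggplot(toy, aes(x = age, fill = education)) +
  geom_histogram(binwidth = 5) +
  facet_wrap(vars(education), ncol = 1, nrow = n_panels) +
  theme(legend.position = "none")  # the legend would only repeat the panel titles

# print(facet_plot) draws the figure
```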
Plot Objects and Layering
Assign plots as objects for further analysis:
scatter_plot <- ggplot(nhanes, aes(x = LBXIRN, y = LBXRBCSI)) + geom_point()
str(scatter_plot)
Layering Example:
plot_object + geom_smooth() to add a smoothing layer
Combine diverse plots into one multipaneled output.
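A sketch of both ideas with made-up data: adding a layer to a stored plot object, then combining two different plots. The combining step assumes the patchwork package is installed and is skipped otherwise.

```r
library(ggplot2)

# Toy data (invented values) named after the lecture's variables
toy <- data.frame(
  LBXIRN   = c(40, 65, 90, 120, 150, 185),
  LBXRBCSI = c(4.1, 4.4, 4.6, 4.9, 5.0, 5.3),
  sex      = factor(rep(c("Female", "Male"), times = 3))
)

scatter_plot <- ggplot(toy, aes(x = LBXIRN, y = LBXRBCSI)) +
  geom_point()

# Adding a layer returns a new object; scatter_plot itself is unchanged
scatter_with_fit <- scatter_plot +
  geom_smooth(method = "lm", se = TRUE)

length(scatter_plot$layers)      # layers in the original object
length(scatter_with_fit$layers)  # layers after adding the smoother

box_plot <- ggplot(toy, aes(x = sex, y = LBXIRN)) +
  geom_boxplot()

# Combine different plot types side by side (only if patchwork is available)
if (requireNamespace("patchwork", quietly = TRUE)) {
  combined <- patchwork::wrap_plots(scatter_with_fit, box_plot, ncol = 2)
}
```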
Exporting and Saving Plots
Quick Export Options:
Less control over output dimensions, resolution, and format; e.g., right-click and copy.
Code-Controlled Exporting:
Use ggsave() to specify dimensions and formats:
ggsave(filename = "histogram_by_sex.pdf", plot = histogram_by_sex, path = here("Class"))
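The call above leaves the dimensions at their defaults (the size of the current graphics device). A sketch that sets them explicitly, writing to a temporary folder so it runs anywhere. The toy data are invented. In a project, path = here("Class") would replace tempdir().

```r
library(ggplot2)

# Toy data (made-up values)
toy <- data.frame(
  age = c(23, 35, 41, 52, 60, 28, 37, 45, 58, 66),
  sex = factor(rep(c("Female", "Male"), each = 5))
)

histogram_by_sex <- ggplot(toy, aes(x = age, fill = sex)) +
  geom_histogram(binwidth = 10, position = "identity", alpha = 0.5)

out_dir <- tempdir()  # assumption: a throwaway folder for this sketch

ggsave(
  filename = "histogram_by_sex.pdf",  # the file extension picks the format
  plot     = histogram_by_sex,
  path     = out_dir,
  width    = 6,
  height   = 4,
  units    = "in"
)

file.exists(file.path(out_dir, "histogram_by_sex.pdf"))
```

For raster formats such as .png, the dpi argument additionally controls resolution.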
Recap on Multipaneled Plots
Plots as objects, faceting strategies for visual outputs, and combining plots efficiently with packages such as patchwork.
Learning Objectives for Bivariate Descriptive Statistics
Calculate appropriate descriptive statistics based on varying data types.
Bivariate Descriptive Statistics Overview
These statistics describe the distributions of two variables simultaneously.
Example Comparisons:
Comparing ages between cases and controls, analyzing grade distributions across courses, or time since mammograms based on insurance status.
One approach: filter the dataset to a subgroup, then compute univariate descriptive statistics within that subgroup.
Filtering Datasets with Conditions
Filter Example: Extracting participants matching criteria:
cases <- nhanes %>% filter(disease == "case")
Results can include subsets of cases or controls.
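A sketch of the filter-then-describe approach, assuming a small made-up data frame in place of nhanes:

```r
library(dplyr)

# Toy data (invented values)
toy <- data.frame(
  id      = 1:6,
  age     = c(34, 51, 47, 62, 29, 58),
  disease = c("case", "control", "case", "control", "control", "case")
)

# One subset per level of the factor variable
cases    <- toy %>% filter(disease == "case")
controls <- toy %>% filter(disease == "control")

# Univariate statistics within each subset
mean(cases$age)
mean(controls$age)
```

This works for two groups but needs one filter() call per level, which is the motivation for grouped summaries.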
Calculating Bivariate Descriptive Statistics
Code Example with summarise():
nhanes %>% summarise(Minimum = min(age), Mean = mean(age), Maximum = max(age), .by = disease)
The output gives the minimum, mean, and maximum age for each level of the factor variable, so the groups can be compared side by side.
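If age contains missing values, min(), mean(), and max() return NA unless na.rm = TRUE is set. A sketch with made-up data that also reports how many non-missing values each group contributes. The .by argument assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Toy data (invented values) with one missing age
toy <- data.frame(
  age     = c(34, 51, 47, 62, NA, 58),
  disease = c("case", "control", "case", "control", "control", "case")
)

age_by_disease <- toy %>%
  summarise(
    N       = sum(!is.na(age)),          # non-missing observations per group
    Minimum = min(age, na.rm = TRUE),
    Mean    = mean(age, na.rm = TRUE),
    Maximum = max(age, na.rm = TRUE),
    .by = disease
  )

age_by_disease
```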
Handling Two Factor Variables
Count and Percentage in each group:
Use either table() or count() to create cross-tables.
An example visualization might show educational attainment against disease status.
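A sketch of counts and percentages for two factor variables, first in base R and then in dplyr. The data are made up, and the .by argument in mutate() assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Toy data (invented values)
toy <- data.frame(
  education = c("High school", "College", "College",
                "High school", "College", "High school"),
  disease   = c("case", "control", "case", "control", "control", "case")
)

# Base R: counts, then row percentages (margin = 1 makes each row sum to 100)
counts  <- table(toy$education, toy$disease)
row_pct <- round(100 * prop.table(counts, margin = 1), 1)

# dplyr: one row per combination, with the percentage within each education level
cross_tab <- toy %>%
  count(education, disease) %>%
  mutate(percent = 100 * n / sum(n), .by = education)

counts
row_pct
cross_tab
```

Whether to report row or column percentages depends on the question: row percentages here answer "among people with this education level, what share are cases?"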
Summary on Bivariate Descriptive Statistics
Final Notes:
Descriptive statistics summarize how the distribution of one variable differs across the levels of another.
Essential functions include filter() for row selection and summarise() for descriptive statistics.
Output is presented differently for one numeric variable with a factor variable (summary statistics by group) than for two factor variables (counts and percentages in a cross-table).
Upcoming Classes
Future classes will cover more on creating professional, reproducible statistics tables for export.
Link to lab materials available on RStudio cloud: https://posit.cloud/spaces/680655.