Bivariate Graphics

General Information

  • Instructors: Herong Wang, MPH; Troy Zirui Zhou, MPH

  • Date of Lecture: November 14, 2025

Announcements

  • Drop-in Hours: Attendance at student drop-in hours is recommended.

  • Discussion Posts: Reminder to participate in at least 6 different discussion posts (only 2 left).

  • Homework Updates:

    • Homework 9 regrade completed.

    • Homework 11 (univariate descriptives) was due this week.

    • Homework 12 (bivariate graphics) is due next week.

    • Regular deadline for homework and discussions is Friday, November 21.

Outline of Today's Lecture

  • Slides: Bivariate plot types

  • Coding Together: Bivariate plot types

  • Slides: Bivariate plot: Scatterplot spotlight

  • Coding Together: Scatterplot spotlight

  • Break

  • Slides: Multipaneled plots & exporting plots

  • Coding Together: Multipaneled plots & exporting plots

  • Slides: Bivariate descriptives (likely to be discussed next class)

Learning Objectives for Bivariate Plot Types

  • Visualizing Relationships:

    • Prioritize plot types based on data types (bar chart, histogram, boxplot, scatterplot)

    • Description of coding features:

      • Labels

      • Limits

      • Colors

      • Legends

      • Size

      • Transparency

Framework for Deciding Plot Type
Data Type and Variable Count

  • Factors vs. Numeric Variables:

    • One Variable Setup:

      • Factor: Use a bar chart

      • Numeric: Use a histogram or density plot

    • Two Variable Setup:

      • Both variables numeric: Use a scatterplot or line plot

      • One numeric and one factor: Use a boxplot, violin plot, or beeswarm plot

      • Both variables factors: Use a bar chart with fill
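
The decision framework above can be sketched as a small function. This is only an illustration: suggest_plot() is a hypothetical helper, not something from the lecture or from ggplot2, and the two-factor case follows the bar plot section later in these notes. The built-in iris data stands in for nhanes.

```r
# Hypothetical helper (not part of ggplot2): suggest a plot type
# from the data types of one or two variables.
suggest_plot <- function(x, y = NULL) {
  # One-variable setup
  if (is.null(y)) {
    return(if (is.numeric(x)) "histogram or density plot" else "bar chart")
  }
  # Two-variable setup: count how many of the two are numeric
  n_numeric <- sum(is.numeric(x), is.numeric(y))
  if (n_numeric == 2) {
    "scatterplot or line plot"
  } else if (n_numeric == 1) {
    "boxplot, violin plot, or beeswarm plot"
  } else {
    "bar chart with fill"
  }
}

suggest_plot(iris$Species)                       # one factor
suggest_plot(iris$Sepal.Length)                  # one numeric
suggest_plot(iris$Sepal.Length, iris$Species)    # numeric + factor
suggest_plot(iris$Sepal.Length, iris$Sepal.Width) # two numerics
```
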

Aesthetic Mapping in ggplot2

  • aes() Function Mappings (R is case-sensitive, so the function is written in lowercase):

    • x = x-axis variable

    • y = y-axis variable

    • fill = fill color variable

    • color = line or point (outline) color variable

    • shape = shape of points

    • size = size of points

    • alpha = transparency of points
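
A minimal sketch of the difference between mapping an aesthetic to a variable (inside aes()) and setting it to a fixed value (outside aes()). It assumes the built-in mtcars data as a stand-in for nhanes; cars_df and p_aes are illustrative names only.

```r
library(ggplot2)

# mtcars ships with R; cyl is converted to a factor so it gets discrete colors.
cars_df <- transform(mtcars, cyl = factor(cyl))

# Inside aes(): color and size vary with the data (mapped to variables).
# Outside aes(): alpha and shape are fixed for every point (set to constants).
p_aes <- ggplot(cars_df, aes(x = wt, y = mpg, color = cyl, size = hp)) +
  geom_point(alpha = 0.6, shape = 16)

p_aes
```

Mapped aesthetics get a legend automatically; fixed values do not.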

Bar Plot for Two Factor Variables

  • Code Example:

ggplot(nhanes, aes(y = sex, fill = education)) + geom_bar() + labs(y = "Sex", x = "Number of Participants")
  • Considerations for Interpretation:

    • Analyze distributions and relationships between variables.

Density Plot for Numeric and Factor Variable

  • Code Example:

ggplot(nhanes, aes(x = LBXIRN, color = sex)) + geom_density(linewidth = 1.5)
  • Limitation: Useful only when the factor variable has few categories; with many overlapping curves the groups become hard to tell apart.

Histogram for Numeric and Factor Variable

  • Code Example:

ggplot(nhanes, aes(x = LBXIRN, fill = sex)) + geom_histogram(binwidth = 10, position = "identity", alpha = 0.5)
  • Transparency and Positioning: position = "identity" overlays the groups instead of stacking them (the default), and alpha below 1 keeps the overlapping bars visible.
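
To see what position and alpha do, the two versions can be built side by side. This is a sketch that assumes the built-in iris data in place of nhanes, with Species playing the role of the factor variable; the object names are illustrative.

```r
library(ggplot2)

# Default position = "stack": the bars of each group are piled on top of each other.
p_stack <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_histogram(binwidth = 0.25)

# position = "identity": every group starts at zero, so the bars overlap;
# alpha < 1 keeps the groups in the back visible.
p_overlay <- ggplot(iris, aes(x = Sepal.Length, fill = Species)) +
  geom_histogram(binwidth = 0.25, position = "identity", alpha = 0.5)

p_stack
p_overlay
```

In the stacked version, bar heights show the total count per bin; in the overlaid version, each group's height can be read directly from the y-axis.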

Changing Aesthetic Scales

  • Adjusting scales for fill/color:

    • Use scale_color_* functions or scale_fill_* functions to specify color scales.

    • Example:

scale_color_manual(values = c("firebrick", "forestgreen", "dodgerblue"))
scale_color_brewer(palette = "")
scale_color_viridis_d()
  • Ensure the number of color values matches the number of levels in your aesthetic variable.
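
One way to keep colors and levels in sync is to name the color vector and check its length before plotting. A sketch under the assumption that mtcars (with cyl as a three-level factor) stands in for the class data; cyl_colors is an illustrative name.

```r
library(ggplot2)

cars_df <- transform(mtcars, cyl = factor(cyl))

# One color per level; naming the vector ties each color to a specific level.
cyl_colors <- c("4" = "firebrick", "6" = "forestgreen", "8" = "dodgerblue")

# Check before plotting that every level has a color.
stopifnot(length(cyl_colors) == nlevels(cars_df$cyl))

p_colors <- ggplot(cars_df, aes(x = wt, y = mpg, color = cyl)) +
  geom_point(size = 2) +
  scale_color_manual(values = cyl_colors)

p_colors
```
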

Boxplot for Numeric and Factor Variables

  • Code Example:

ggplot(nhanes, aes(x = age_groups, y = LBXIRN)) + geom_boxplot(fill = "darkorchid")
  • Utility: Works well even when the factor variable has many categories.

Violin Plot for Numeric and Factor Variables

  • Code Example:

ggplot(nhanes, aes(x = age_groups, y = LBXIRN)) + geom_violin(draw_quantiles = c(0.25, 0.5, 0.75), fill = "darkorchid")
  • Application: Useful when the factor variable has many categories.

Adding Beeswarm Points to Box or Violin Plot

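  • Code Example (geom_jitter() scatters the points randomly to approximate a beeswarm; a true beeswarm layout requires an add-on package such as ggbeeswarm):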

ggplot(nhanes, aes(x = age_groups, y = LBXIRN)) + geom_violin(fill = "darkorchid") + geom_jitter(shape = 16, alpha = 0.25, position = position_jitter(0.2), size = 0.75)
Scatterplot Spotlight

  • Scatterplots are primarily used for visualizing relationships between two numeric variables.

Example Application

  • A basic scatterplot showing the relationship between iron and red blood cell count.

ggplot(nhanes, aes(x = LBXIRN, y = LBXRBCSI)) + geom_point(alpha = 0.2, size = 0.4, shape = 2)
  • Interpretation Questions: What variables are plotted, and how are they aesthetically represented?

Optional Features for Enhanced Scatterplots

  • Customizing geom_point():

    • Size: size =

    • Shape: shape =

    • Colors: color = for the outline and fill = for the interior (fill only affects shapes 21 to 25, such as shape = 22)

    • Transparency: alpha =

geom_point(size = 3, shape = 22, color = "seagreen", fill = "violet", alpha = 1/3)
Adding a Third Variable to Scatterplots

  • Utilizing color for a third variable:

ggplot(nhanes, aes(x = LBXIRN, y = LBXRBCSI, color = sex)) + geom_point()
  • Consider which variable is most important to show and which aesthetic (color, shape, size, or alpha) represents it most clearly.

Modifying Axis Limits

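  • To change the axis limits (xlim() and ylim() drop observations outside the range rather than just zooming; coord_cartesian() zooms without dropping data):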

xlim(0, 300) + ylim(4, 6)
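
The difference between setting scale limits and zooming matters most when a layer computes something from the data, such as a fitted line. A sketch assuming the built-in mtcars data and arbitrary limits of 2 to 4; the object names are illustrative.

```r
library(ggplot2)

base_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# xlim() sets the scale limits: points outside 2 to 4 are dropped (with a warning)
# before the regression line is fitted.
p_limits <- base_plot + xlim(2, 4)

# coord_cartesian() only zooms the view: the line is still fitted to all rows.
p_zoom <- base_plot + coord_cartesian(xlim = c(2, 4))

# How many observations fall outside the chosen x range?
n_outside <- sum(mtcars$wt < 2 | mtcars$wt > 4)
n_outside
```

If the two plots show different fitted lines, the dropped observations were influencing the fit.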
Line Plot for Two Numeric Variables

  • Code Example:

ggplot(nhanes, aes(x = LBXIRN, y = LBXRBCSI)) + geom_smooth(method = "lm", se = TRUE, fill = "red")
  • Interpret the relationship depicted by the plot.

Additional geom_ Functions

  • Reference: https://ggplot2.tidyverse.org/reference/

  • The reference lists many more geom_ functions; choose among them based on the needs of your analysis.

Recap on Bivariate Plot Types

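  • Plot type selection depends on the data types involved:

    • One Numeric and One Factor Variable: Density plot, histogram, boxplot, or violin plot

    • Two Numeric Variables: Scatterplot, or line plot using geom_smooth()

    • Two Factor Variables: Bar chart with fill

    • Adding a Third Variable: Map it to an aesthetic such as color.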

Multipaneled Plots and Their Use

  • Faceting for Comparisons: Show each category in its own panel while keeping the same axes and styling across panels

  • Faceting Example:

ggplot(nhanes, aes(x = age, fill = education)) + geom_histogram() + facet_wrap(vars(education), ncol = 1, nrow = 4)
  • Parameters define the variable to facet and the number of columns/rows utilized.
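
A self-contained version of the faceting pattern, assuming the built-in mtcars data with cyl as the faceting factor in place of education; p_facets is an illustrative name.

```r
library(ggplot2)

cars_df <- transform(mtcars, cyl = factor(cyl))

# One panel per level of cyl, stacked in a single column so the
# shared x-axis makes the distributions easy to compare.
p_facets <- ggplot(cars_df, aes(x = mpg, fill = cyl)) +
  geom_histogram(binwidth = 2) +
  facet_wrap(vars(cyl), ncol = 1)

p_facets
```

Specifying only ncol lets facet_wrap() work out the number of rows from the number of levels, which avoids an error if the levels do not fit the requested grid.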

Plot Objects and Layering

  • Assign plots to objects so they can be inspected, extended with more layers, or saved later:

scatter_plot <- ggplot(nhanes, aes(x = LBXIRN, y = LBXRBCSI)) + geom_point()
str(scatter_plot)
  • Layering Example:

    • plot_object + geom_smooth() to add a smoothing layer

  • Plot objects can also be combined into one multipaneled figure (for example, with the patchwork package).
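
A short sketch of the layering idea, assuming the built-in mtcars data in place of nhanes. Adding a layer to a stored plot returns a new plot and leaves the original object untouched.

```r
library(ggplot2)

# Store the base scatterplot as an object.
scatter_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Add a linear trend line and axis labels on top of the stored plot.
scatter_with_line <- scatter_plot +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

scatter_plot       # points only
scatter_with_line  # points plus fitted line and confidence band
```
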

Exporting and Saving Plots

  • Quick Export Options:

    • Less control over output dimensions, resolution, and format; e.g., right-click and copy.

  • Code-Controlled Exporting:

    • Use ggsave() to specify dimensions and formats:

ggsave(filename = "histogram_by_sex.pdf", plot = histogram_by_sex, path = here("Class"))
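
A sketch of a code-controlled export with explicit dimensions. It assumes the built-in mtcars data and writes to a temporary folder so that it runs anywhere; in class the path came from here(). The width and height are arbitrary example values.

```r
library(ggplot2)

histogram_demo <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2)

# tempdir() keeps the example self-contained.
out_file <- file.path(tempdir(), "histogram_demo.pdf")

# The file extension selects the format; width and height set the page size.
ggsave(filename = out_file, plot = histogram_demo,
       width = 6, height = 4, units = "in")

file.exists(out_file)
```

For raster formats such as PNG, the dpi argument of ggsave() additionally controls the resolution.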
Recap on Multipaneled Plots

  • Plots as objects, faceting strategies for visual outputs, and combining plots efficiently with packages such as patchwork.

Learning Objectives for Bivariate Descriptive Statistics

  • Calculate appropriate descriptive statistics based on varying data types.

Bivariate Descriptive Statistics Overview

  • These statistics describe the distributions of two variables at the same time.

  • Example Comparisons:

    • Comparing ages between cases and controls, comparing grade distributions across courses, or comparing time since last mammogram by insurance status.

  • One approach: filter the dataset to a subgroup, then compute univariate descriptive statistics within that subgroup.

Filtering Datasets with Conditions

  • Filter Example: Extracting participants matching criteria:

cases <- nhanes %>% filter(disease == "case")
  • The result is a subset containing only cases; the same approach produces a subset of controls.
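
The filter-then-describe approach can be tried on a built-in dataset. This sketch assumes ToothGrowth, where supp (OJ or VC) plays the role of disease status and len plays the role of age; the object names are illustrative.

```r
library(dplyr)

# Split the data into one subset per group.
oj_group <- ToothGrowth %>% filter(supp == "OJ")
vc_group <- ToothGrowth %>% filter(supp == "VC")

# Univariate descriptive statistics within each subgroup.
summary(oj_group$len)
summary(vc_group$len)
```

This works, but it needs one filtered object per group, which is why grouped summaries are usually more convenient.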

Calculating Bivariate Descriptive Statistics

  • Code Example with summarise():

nhanes %>% summarise(Minimum = min(age), Mean = mean(age), Maximum = max(age), .by = disease)
  • The output gives the minimum, mean, and maximum age for each level of the factor variable, so the groups can be compared directly.
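
A runnable version of the grouped summary, again assuming ToothGrowth as a stand-in (supp for disease, len for age). The .by argument requires dplyr 1.1.0 or later; na.rm = TRUE is added here as a precaution for data with missing values.

```r
library(dplyr)

group_stats <- ToothGrowth %>%
  summarise(
    N       = n(),                       # group size
    Minimum = min(len, na.rm = TRUE),
    Mean    = mean(len, na.rm = TRUE),
    Maximum = max(len, na.rm = TRUE),
    .by = supp                           # one output row per level of supp
  )

group_stats
```

On older dplyr versions, group_by(supp) before summarise() gives the same grouping.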

Handling Two Factor Variables

  • Count and Percentage in each group:

    • Use either table() or count() to create cross-tables.

    • Example: a cross-table of educational attainment by disease status.
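
Both approaches to a cross-table can be sketched with built-in data. This assumes mtcars, with cyl and am converted to factors as stand-ins for education and disease status; the .by argument of mutate() requires dplyr 1.1.0 or later.

```r
library(dplyr)

cars_df <- mtcars %>%
  mutate(cyl = factor(cyl),
         am  = factor(am, labels = c("automatic", "manual")))

# Base R: counts, then row percentages (margin = 1 means "within each row").
cross_tab <- table(cars_df$cyl, cars_df$am)
row_pct   <- round(100 * prop.table(cross_tab, margin = 1), 1)

# dplyr: one row per combination, with the percentage within each cyl group.
count_tab <- cars_df %>%
  count(cyl, am) %>%
  mutate(percent = round(100 * n / sum(n), 1), .by = cyl)

cross_tab
row_pct
count_tab
```

table() gives a wide layout that is easy to read; count() gives a long layout that is easier to pass on to further dplyr steps or to ggplot().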

Summary on Bivariate Descriptive Statistics

  • Final Notes:

    • Bivariate descriptive statistics summarize how two variables are distributed together.

    • Essential functions include filter() for selecting rows and summarise() for computing descriptive statistics.

    • The appropriate output depends on the data types: group-wise summaries for one numeric and one factor variable, and counts with percentages for two factor variables.

Upcoming Classes

  • Future classes to include more on creating professional, reproducible statistics tables for export.

  • Link to lab materials available on RStudio cloud: https://posit.cloud/spaces/680655.