Useful for cross-group comparisons and spotting outliers.
Mild to moderate skew can still be represented; extreme skew may favor median and IQR.
ggplot2 and the Grammar of Graphics: Essentials
ggplot2 implements the grammar of graphics: a plot is built from data, aesthetic mappings, and geometries, which are added as layers.
Aesthetic mappings (aes) connect data variables to plot properties (x, y, color, fill, etc.).
Geometries (geoms) determine the type of plot (points, lines, bars, histograms).
Plot design: start simple, aim for clarity, and tailor to the story you want to tell.
For comparisons across groups, consider faceting, color encoding, and consistent axes.
Plot Design Principles and Practical Tips
Simple and effective plots often beat fancy but opaque visuals.
Choose the plot type based on the question and the data: one numerical variable -> histogram; two numerical -> scatter; one numerical + one categorical -> box plots or faceted histograms; categorical -> bar plots.
When presenting to others, include clear labels, titles, and legends only as needed to tell the story.
Exploratory Data Analysis (EDA) emphasizes quickly exploring data with simple plots; explanatory plots communicate a specific message.
Practice with real datasets; use code from sources as a starting point and tailor to your data.
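The plot-type rule of thumb above can be written down as a small lookup. This is only a sketch in Python of the mapping listed in these notes; the function name `suggest_plot` and its two count arguments are hypothetical, and combinations the notes do not mention are deliberately left without a recommendation.

```python
def suggest_plot(n_numerical: int, n_categorical: int) -> str:
    """Map counts of variable types to the plot types listed in the notes."""
    if n_numerical == 1 and n_categorical == 0:
        return "histogram"
    if n_numerical == 2 and n_categorical == 0:
        return "scatter plot"
    if n_numerical == 1 and n_categorical == 1:
        return "box plots or faceted histograms"
    if n_numerical == 0 and n_categorical >= 1:
        return "bar plot"
    # Anything else is not covered by the rule of thumb in these notes.
    return "no rule of thumb in these notes"


print(suggest_plot(1, 0))  # one numerical variable
print(suggest_plot(1, 1))  # one numerical + one categorical
```

The lookup is a starting point, not a rule: the question you are asking still decides the final choice.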
Practical Notes for KC3 and Homework Prep
You may be provided with summary statistics; you can compute z-scores by hand or via code using those statistics.
For a quick histogram-like check of a distribution, focus on the shape, symmetry, and presence of outliers.
When comparing distributions across groups, compare medians and IQRs, or use density-scaled histograms, so that differences in group size do not distort the comparison.
Remember the big five plotting questions, which include: how many variables are there, are they numerical or categorical, and what story are you trying to tell?
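To see why density scaling helps when group sizes differ, here is a minimal Python sketch. The data are made up, and the helper `bin_densities` is a hypothetical name. Two groups with the same shape but different sizes give very different raw counts, yet identical densities (count divided by group size times bin width).

```python
from collections import Counter


def bin_densities(values, bin_width, start):
    """Return {left bin edge: density} for equal-width bins.

    density = count / (n * bin_width), so the bar areas sum to 1.
    """
    counts = Counter(
        start + bin_width * int((v - start) // bin_width) for v in values
    )
    n = len(values)
    return {edge: counts[edge] / (n * bin_width) for edge in sorted(counts)}


# Made-up data: the large group is the small group repeated four times,
# so both have the same shape but very different raw counts per bin.
small_group = [1, 3, 3, 5, 5, 5, 5, 7, 7, 9]
large_group = small_group * 4

small_density = bin_densities(small_group, bin_width=2, start=0)
large_density = bin_densities(large_group, bin_width=2, start=0)

print(small_density)
print(small_density == large_density)  # True: same shape, same densities
```

A count-scaled histogram of these two groups would make the larger group look four times taller in every bin, even though the distributions are the same.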
Quick Reference: Key Formulas
Z-score: $z = \frac{x - \mu}{\sigma}$
Interquartile range: $\mathrm{IQR} = Q_3 - Q_1$
Five-number summary: min, Q1, median (Q2), Q3, max
If needed, the sample SD formula: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
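As a worked check of the formulas above, the following Python sketch computes each quantity with the standard library's `statistics` module. The data values are made up for illustration, and the sample mean and SD stand in for $\mu$ and $\sigma$ in the z-score, which is an assumption of this example. Quartile conventions differ between tools; the "inclusive" method used here is one common choice, so other software may report slightly different Q1 and Q3.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 11]  # made-up illustrative values

mean = statistics.mean(data)   # 42 / 7 = 6
sd = statistics.stdev(data)    # sample SD: divides by n - 1, here sqrt(60 / 6)

# Quartiles using the "inclusive" convention (one of several in common use).
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
five_number = (min(data), q1, q2, q3, max(data))


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


print("mean:", mean, "sd:", round(sd, 4))
print("five-number summary:", five_number)  # (2, 4.0, 5.0, 8.0, 11)
print("IQR:", iqr)                          # 4.0
print("z-score of 11:", round(z_score(11, mean, sd), 4))
```

If you are given only summary statistics, `z_score` is all you need: pass the reported mean and SD directly instead of computing them from raw data.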