Box Plots, IQR, and Standard Deviation — Quick Reference

Box plots and distribution comparisons

  • Box plots visualize distributions and enable quick comparisons across groups (e.g., calories in hot dogs: poultry brands generally run lower than beef and meat brands, although some poultry brands are higher than some beef or meat brands).

  • Drawing the whiskers out to the minimum and maximum shows the full range; an extreme value that sits far from the box flags a potential outlier.

Skewness and the five-number summary

  • For skewed distributions, the halves below and above the median spread out differently, so no single number describes the spread well; summarize it with the five-number summary: min, Q1, median, Q3, max.

  • For symmetric distributions, a single spread measure can be informative.
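
As a minimal sketch of the five-number summary, the Python below uses the standard-library `statistics.quantiles` function. The helper name `five_number_summary` and the sample values are made up for illustration, and the `"inclusive"` quartile method is one assumption among several common conventions; other conventions can give slightly different Q1 and Q3.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a sequence of at least two numbers."""
    data = sorted(values)
    # With n=4, quantiles() returns the three cut points Q1, median, Q3.
    # method="inclusive" is an assumed convention, not the only valid one.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


# Made-up, right-skewed values for illustration only.
sample = [1, 2, 2, 3, 3, 4, 5, 9, 15]
summary = five_number_summary(sample)
print(summary)
```

In this made-up sample the median sits 2 above the minimum but 12 below the maximum, which is the kind of unequal halves a skewed distribution produces.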

Interquartile range (IQR)

  • For approximately symmetric distributions, the spread can be described by the interquartile range, $\text{IQR} = Q_3 - Q_1$.

  • On a box plot, the IQR is the length of the box (visual cue for spread).
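
A short sketch of the IQR calculation, again with Python's standard library. The function name `iqr` and both data sets are hypothetical, and the `"inclusive"` quartile method is an assumed convention.

```python
import statistics


def iqr(values):
    """Interquartile range Q3 - Q1, i.e. the length of the box in a box plot."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Two made-up data sets with the same median (12) but different spread.
tight = [10, 11, 12, 13, 14]
wide = [2, 7, 12, 17, 22]

print(iqr(tight), iqr(wide))
```

The second data set would draw a box five times as long as the first, even though both boxes are centered on the same median.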

Standard deviation and computation

  • Steps to compute the standard deviation:

    • Mean: $\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$

    • Deviation of each value: $x_i - \mu$

    • Variance: $\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2$

    • Standard deviation: $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2}$

  • A smaller $\sigma$ indicates tighter clustering around the mean.

  • Use the standard deviation to compare the spread of different distributions.
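
The four steps can be sketched directly in Python. The function name `population_std` and the data values are made up for illustration. The code divides by n, matching the formulas in these notes; the sample version, which divides by n - 1, is available as `statistics.stdev`.

```python
import math
import statistics


def population_std(values):
    """Standard deviation with a 1/n divisor, following the four steps."""
    n = len(values)
    mean = sum(values) / n                           # step 1: mean
    deviations = [x - mean for x in values]          # step 2: deviations
    variance = sum(d * d for d in deviations) / n    # step 3: variance
    return math.sqrt(variance)                       # step 4: square root


# Made-up values: mean 5, squared deviations sum to 32, variance 32 / 8 = 4.
data = [2, 4, 4, 4, 5, 5, 7, 9]
sigma = population_std(data)
print(sigma)

# Cross-check against the standard library's population standard deviation.
print(statistics.pstdev(data))
```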

Music-based metabolic profiling analogy

  • Biochemistry researchers convert metabolic values into musical notes to supplement visual data; some notes are very similar across individuals, others vary widely.

  • The normal range can be inferred using the standard deviation.
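
One common way to turn a standard deviation into a range is to take the mean plus or minus k standard deviations. The sketch below assumes k = 2, which is a convention chosen here for illustration and not a cutoff stated in these notes; the function name `reference_range` and the readings are also hypothetical.

```python
import statistics


def reference_range(values, k=2.0):
    """Return (mean - k*sd, mean + k*sd); k=2 is an assumed convention."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # 1/n divisor, as in the formulas above
    return mean - k * sd, mean + k * sd


# Made-up readings for one measured quantity across several individuals.
readings = [2, 4, 4, 4, 5, 5, 7, 9]
low, high = reference_range(readings)
print(low, high)
```

A quantity that varies little across individuals gives a narrow range, while one that varies widely gives a broad range.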

From pictures to numbers

  • This program shows the progression from distribution visuals to numeric descriptions: mean, median, quartiles, and standard deviation.

  • Calculators/computers ease computation; focus on pictures first to identify which numbers matter.