Chapter 3 Book Notes

Histograms

  • Display distribution of quantitative data using bins; counts per bin form the distribution.
  • Bar height shows the count in each bin; a relative frequency histogram shows percentages instead. Both are faithful to the area principle.
  • Calculator can generate histograms; adjust bin width in Window settings to see different presentations.
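
The binning idea can be sketched in Python. This is a minimal illustration, not the calculator's method: the data values are made up, and histogram_counts, bin_width, and start are hypothetical names chosen for this note. Each bin is assumed to include its left edge and exclude its right edge.

```python
from collections import Counter

def histogram_counts(values, bin_width, start):
    """Count values per bin [left, left + bin_width); assumes values is non-empty."""
    counts = Counter()
    for v in values:
        index = int((v - start) // bin_width)  # which bin this value lands in
        counts[index] += 1
    # Keep empty bins between the first and last occupied bin so gaps stay visible.
    return {start + i * bin_width: counts[i]
            for i in range(min(counts), max(counts) + 1)}

data = [2, 3, 5, 7, 8, 11, 12, 13, 14, 19]  # made-up values
bin_width = 5
counts = histogram_counts(data, bin_width, start=0)

# Relative frequencies: same shape, but each bar is a share of all cases.
relative = {left: c / len(data) for left, c in counts.items()}

for left, c in counts.items():
    print(f"[{left:>2}, {left + bin_width:>2}) {'#' * c}  {c} ({relative[left]:.0%})")
```

Calling histogram_counts again with a different bin_width (for example 10) regroups the same data and changes how the shape looks, which is the effect of changing the Window settings.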

Stem-and-Leaf Displays

  • Show distribution while preserving individual values; contains all information of a histogram.
  • Satisfy area principle when carefully drawn; useful for quick review of distribution shape.
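
A stem-and-leaf display can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are made up, the values are non-negative whole numbers, stems are tens digits, leaves are ones digits, and stem_and_leaf is a hypothetical helper name.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return display lines: stem = tens part, leaf = ones digit."""
    rows = defaultdict(list)
    for v in sorted(values):          # sorting puts the leaves in order
        stem, leaf = divmod(v, 10)
        rows[stem].append(leaf)
    lines = []
    # Include empty stems so gaps in the data remain visible.
    for stem in range(min(rows), max(rows) + 1):
        leaves = "".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

data = [23, 12, 47, 21, 15, 28, 23]  # made-up values
lines = stem_and_leaf(data)
print("\n".join(lines))
```

Every original value can be read back from the display (stem 2, leaf 8 is 28), which is what a histogram of the same data cannot offer.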

Dotplots

  • Simple display: one dot per case along an axis (horizontal or vertical).
  • Easy to see individual values and gaps/outliers.
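
A text dotplot can be sketched the same way. The data are made up, the values are assumed to be whole numbers, and dotplot is a hypothetical helper name; a real dotplot would be drawn along a graph axis.

```python
from collections import Counter

def dotplot(values):
    """Return one line per whole-number value, with one dot per case."""
    counts = Counter(values)
    lines = []
    # Walk every value from min to max so gaps show up as empty rows.
    for value in range(min(counts), max(counts) + 1):
        lines.append(f"{value:>3} {'.' * counts[value]}")
    return lines

data = [1, 2, 2, 3, 3, 3, 7]  # made-up values; 7 sits apart from the rest
lines = dotplot(data)
print("\n".join(lines))
```

The empty rows for 4 through 6 make the gap before the value 7 easy to spot.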

When to Use Each Display

  • Data must be quantitative (Quantitative Data Condition).
  • Choose display that best reveals shape, center, spread, and any unusual features.

Shape, Center, and Spread

  • When describing a distribution, report shape, center, and spread.
  • Shape: unimodal, bimodal, multimodal; symmetry vs skewness; look for gaps or outliers.
  • Center and spread: pair measures (mean with standard deviation; median with IQR).

Shape: Humps and Symmetry

  • Humps = modes
  • Unimodal: one main peak; bimodal: two peaks; multimodal: three or more peaks.
  • Uniform distribution: bars about equal height (e.g., fair die).
  • Symmetry: the two halves roughly match when folded along a vertical line through the center; the longer tail gives the direction of skew.
  • Skewed left: longer left tail; skewed right: longer right tail.

Unusual Features

  • Note outliers and gaps; may indicate multiple groups or data collection issues.
  • Outliers can be far from body of distribution; may be reported with special symbols in boxplots.

Center: The Median vs The Mean

  • Median: middle value (ordered data); resistant to outliers.
  • Mean: balance point of histogram; sensitive to outliers.
  • When to use:
    • Symmetric, no outliers: mean and standard deviation.
    • Skewed or outliers: median and IQR.
  • Formulas:
    • Mean (sample): \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
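
The difference in resistance can be checked with a small Python sketch. The values are made up for illustration, and the names values and with_outlier are hypothetical; mean and median come from Python's standard statistics module.

```python
from statistics import mean, median

values = [3, 4, 5, 6, 7]          # made-up, symmetric data
with_outlier = values + [60]      # same data plus one extreme value

# Symmetric data: mean and median agree.
print("mean:", mean(values), "median:", median(values))

# One outlier drags the mean toward it; the median barely moves.
print("mean:", mean(with_outlier), "median:", median(with_outlier))

mean_shift = mean(with_outlier) - mean(values)
median_shift = median(with_outlier) - median(values)
print("mean shift:", mean_shift, "median shift:", median_shift)
```

The mean shifts by several units while the median shifts by half a unit, which is why the median is the safer center for skewed data or data with outliers.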

Spread: Range, IQR, and Standard Deviation

  • Range: max − min; highly sensitive to outliers.
  • Interquartile Range (IQR): middle 50% of data; robust to outliers.
    • IQR = Q3 − Q1; Q1 and Q3 are 25th and 75th percentiles.
  • Standard Deviation (s): typical distance of values from the mean (the square root of the variance); sensitive to outliers.
    • Variance: s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
    • Standard deviation: s = \sqrt{s^2}
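
The variance and standard deviation formulas can be followed step by step in Python. This is a sketch with made-up data; sample_variance and sample_sd are hypothetical helper names, and at least two values are assumed so that n − 1 is not zero.

```python
from math import sqrt

def sample_variance(values):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

def sample_sd(values):
    """s: square root of the sample variance."""
    return sqrt(sample_variance(values))

data = [2, 4, 4, 5, 7, 9, 11]        # made-up values with mean 6
data_range = max(data) - min(data)   # range = max - min

print("range:", data_range)
print("variance:", sample_variance(data))
print("sd:", sample_sd(data))

# Adding one extreme value inflates both the range and the SD.
with_outlier = data + [100]
print("sd with outlier:", sample_sd(with_outlier))
```

For this data the squared deviations add to 60, so the variance is 60 / 6 = 10 and the standard deviation is the square root of 10.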

5-Number Summary

  • Summary consisting of: minimum, Q1, median, Q3, maximum.
  • Often shown as a boxplot: the box spans Q1 to Q3 with a line at the median.
  • Example format: report the five values in order as (min, Q1, median, Q3, max).
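
A five-number summary can be computed with a short Python sketch. One assumption needs flagging: quartile conventions differ between textbooks and software, and this sketch takes Q1 and Q3 as the medians of the lower and upper halves of the sorted data, leaving the middle value out of both halves when n is odd. The data are made up, five_number_summary is a hypothetical helper name, and at least two values are assumed.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])     # median of the lower half
    q3 = median(ordered[-half:])    # median of the upper half
    return ordered[0], q1, median(ordered), q3, ordered[-1]

data = [2, 4, 4, 5, 7, 9, 11]       # made-up values
summary = five_number_summary(data)
minimum, q1, med, q3, maximum = summary
iqr = q3 - q1

print("five-number summary:", summary)
print("IQR:", iqr)
```

Other tools may return slightly different quartiles for the same data because they interpolate between values instead of splitting the data into halves.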

Boxplots

  • Graphical display of the five-number summary.
  • Useful for comparing groups and spotting outliers.
  • Construction:
    • Draw axis; box from Q1 to Q3 with a line at the median.
    • Fences: upper = Q3 + 1.5 × IQR; lower = Q1 − 1.5 × IQR.
    • Whiskers extend to most extreme data within fences; outliers plotted separately.
    • Far outliers (more than 3 × IQR beyond the quartiles) are plotted with a different special symbol.
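
The construction steps above can be sketched in Python. The data are made up, and fences and boxplot_parts are hypothetical helper names. The quartiles are passed in rather than computed; the values 4 and 9 used here are assumed to be the medians of the lower and upper halves of this data set.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) at 1.5 x IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def boxplot_parts(values, q1, q3):
    """Find whisker ends, outliers, and far outliers for a boxplot."""
    iqr = q3 - q1
    lower, upper = fences(q1, q3)
    inside = [v for v in values if lower <= v <= upper]
    outliers = sorted(v for v in values if v < lower or v > upper)
    # Far outliers lie more than 3 x IQR beyond the quartiles.
    far = [v for v in outliers if v < q1 - 3 * iqr or v > q3 + 3 * iqr]
    return {
        "whiskers": (min(inside), max(inside)),  # most extreme values within fences
        "outliers": outliers,
        "far_outliers": far,
    }

data = [1, 3, 4, 5, 5, 6, 7, 8, 9, 18, 25]  # made-up values
parts = boxplot_parts(data, q1=4, q3=9)

print("fences:", fences(4, 9))
print(parts)
```

With IQR = 5 the fences sit at −3.5 and 16.5, so the upper whisker stops at 9, while 18 and 25 are plotted separately and 25 also counts as a far outlier because it lies beyond 9 + 15 = 24.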

Boxplots: Example

  • Compare histogram and boxplot for a data set (e.g., wind speeds) to see distribution representation differences.

What to Tell About a Quantitative Variable

  • Start with a display (histogram, stem-and-leaf, or dotplot) and describe the shape.
  • Report center and spread: pair median with IQR; mean with standard deviation.
  • If skewed, report median and IQR (and discuss mean vs median differences).

What Can Go Wrong?

  • Do not use histograms for categorical data; use bar charts or pie charts instead.
  • Do not rely on bars for all displays; reserve bars for histograms/bar charts.
  • Choose bin width carefully; changing bin width alters appearance.
  • Always do a reality check on results, and sort the data before computing the median or percentiles.
  • Do not report too many decimal places, and do not round in the middle of a calculation.
  • Watch for multiple modes and outliers; make a picture to verify.

What Have We Learned? (Key Takeaways)

  • Data must be quantitative (Quantitative Data Condition) with known units.
  • Median vs mean: median halves data; mean balances the histogram.
  • Spread measures: IQR and standard deviation; IQR resists outliers, SD does not.
  • In skewed distributions, report median and IQR; in symmetric distributions, report mean and SD.
  • Use pictures to tell the data story: histograms, stem-and-leaf, dotplots, or boxplots.
  • 5-number summary: min, Q1, median, Q3, max; pair with mean/SD or median/IQR as appropriate.

AP Tips for Summaries and Graphs

  • Include scales and labels on all graphs.
  • Describe center, shape, and spread; be specific about outliers or gaps.
  • Be aware of whether you have full data or only summary statistics.
  • Use calculator to obtain and present summary statistics efficiently.
  • Sometimes you’ll work with given summaries rather than full data; adjust approach accordingly.