Chapter 3 Book Notes
Histograms
- Display distribution of quantitative data using bins; counts per bin form the distribution.
- Histogram bar heights show the count in each bin; a relative frequency histogram shows the percentage of cases in each bin instead. Both are faithful to the area principle.
- Calculator can generate histograms; adjust bin width in Window settings to see different presentations.
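The binning idea above can be sketched in a few lines of Python. This is an illustrative sketch, not a calculator procedure: the helper name `bin_counts`, the made-up data, and the choice to put a boundary value in the higher bin are all assumptions, and the function assumes every value is at least `start`.

```python
from collections import Counter

def bin_counts(values, start, width):
    """Return (left edge, count) pairs for bins [left, left + width)."""
    counts = Counter(int((v - start) // width) for v in values)
    # List every bin up to the last occupied one so empty bins (gaps) still appear.
    return [(start + i * width, counts[i]) for i in range(max(counts) + 1)]

# Made-up values, for illustration only.
data = [2, 3, 5, 7, 8, 8, 9, 12, 13, 21]

for width in (5, 10):
    print(f"bin width {width}")
    for left, count in bin_counts(data, start=0, width=width):
        percent = 100 * count / len(data)
        print(f"  {left} to <{left + width}: {'#' * count} {count} ({percent:.0f}%)")
```

With a bin width of 5 the empty bin from 15 to 20 shows up as a gap; with a width of 10 the gap disappears, which is why trying more than one bin width is worthwhile.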
Stem-and-Leaf Displays
- Show distribution while preserving individual values; contains all information of a histogram.
- Satisfy area principle when carefully drawn; useful for quick review of distribution shape.
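A stem-and-leaf display can be built by splitting each value into a stem and a leaf. The sketch below assumes non-negative whole numbers, uses the tens as stems and the ones digit as leaves, and works on made-up data; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return display lines with tens as stems and ones digits as leaves."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    lines = []
    # Include stems with no leaves so gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem} | {row}".rstrip())
    return lines

# Made-up values, for illustration only.
data = [12, 15, 21, 21, 28, 47]
print("\n".join(stem_and_leaf(data)))
```

Every original value can be read back from the display (for example, stem 2 with leaf 8 is 28), which is what a histogram cannot offer.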
Dotplots
- Simple display: one dot per case along an axis (horizontal or vertical).
- Easy to see individual values and gaps/outliers.
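As a rough illustration of a dotplot with a vertical axis, the sketch below prints one row per integer value and one dot per case. It assumes whole-number data; the data and the name `dotplot_lines` are made up for the example.

```python
from collections import Counter

def dotplot_lines(values):
    """Return one text row per integer on the axis, with one dot per case."""
    counts = Counter(values)
    rows = []
    for x in range(min(values), max(values) + 1):
        # Rows with no cases are kept so that gaps remain visible.
        rows.append(f"{x:>3} {'.' * counts[x]}".rstrip())
    return rows

# Made-up values, for illustration only.
data = [1, 2, 2, 3, 3, 3, 7]
print("\n".join(dotplot_lines(data)))
```

The empty rows for 4, 5, and 6 make the gap before the value 7 easy to spot.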
When to Use Each Display
- Data must be quantitative (Quantitative Data Condition).
- Choose display that best reveals shape, center, spread, and any unusual features.
Shape, Center, and Spread
- When describing a distribution, report shape, center, and spread.
- Shape: unimodal, bimodal, multimodal; symmetry vs skewness; look for gaps or outliers.
- Center and spread: pair measures (mean with standard deviation; median with IQR).
Shape: Humps and Symmetry
- Humps = modes
- Unimodal: one main peak; bimodal: two peaks; multimodal: three or more peaks.
- Uniform distribution: bars about equal height (e.g., fair die).
- Symmetry: the two halves roughly match when folded along a vertical line through the middle; the longer tail gives the direction of skew.
- Skewed left: longer left tail; skewed right: longer right tail.
Unusual Features
- Note outliers and gaps; may indicate multiple groups or data collection issues.
- Outliers can be far from body of distribution; may be reported with special symbols in boxplots.
Center: Median and Mean
- Median: middle value (ordered data); resistant to outliers.
- Mean: balance point of histogram; sensitive to outliers.
- When to use:
  - Symmetric, no outliers: mean and standard deviation.
  - Skewed or outliers: median and IQR.
- Formulas:
  - Mean (sample): \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
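To see why the median is called resistant and the mean is not, it can help to add one extreme value to a small data set. The sketch below uses Python's standard `statistics` module on made-up numbers.

```python
from statistics import mean, median

# Made-up values, for illustration only.
data = [2, 3, 4, 5, 6]
with_outlier = data + [40]  # the same data plus one extreme value

print("without outlier:", mean(data), median(data))
print("with outlier:   ", mean(with_outlier), median(with_outlier))
```

Adding the single value 40 moves the mean from 4 to 10, while the median only moves from 4 to 4.5.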
Spread: Range, IQR, and Standard Deviation
- Range: max − min; highly sensitive to outliers.
- Interquartile Range (IQR): the range of the middle 50% of the data; resistant to outliers.
  - IQR = Q3 − Q1; Q1 and Q3 are the 25th and 75th percentiles.
- Standard Deviation (s): roughly the typical distance of values from the mean (the square root of the variance); sensitive to outliers.
  - Variance: s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
  - Standard deviation: s = \sqrt{s^2}
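The variance and standard deviation formulas above can be traced step by step on a small data set. This is a minimal sketch on made-up numbers; the function name `sample_variance` is hypothetical.

```python
from math import sqrt

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

# Made-up values, for illustration only.
data = [2, 4, 4, 6, 9]

data_range = max(data) - min(data)   # range: max - min
s_squared = sample_variance(data)    # variance
s = sqrt(s_squared)                  # standard deviation

print(data_range, s_squared, round(s, 2))
```

Here the mean is 5, the deviations are −3, −1, −1, 1, and 4, the squared deviations sum to 28, and dividing by n − 1 = 4 gives a variance of 7 and a standard deviation of about 2.65.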
5-Number Summary
- Summary consisting of: minimum, Q1, median, Q3, maximum.
- Displayed in a boxplot: the box spans Q1 to Q3 with a line at the median, and the whiskers reach the minimum and maximum when there are no outliers.
- Written in order as (min, Q1, median, Q3, max).
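A five-number summary can be computed by sorting the data and taking the median of each half. Quartile conventions differ between textbooks, calculators, and software; the sketch below assumes the convention that leaves the median out of both halves when n is odd, so the results may not match every source. The data and the name `five_number_summary` are made up for the example.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using medians of the two halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]           # values below the median position
    upper = data[half + n % 2:]   # values above it (median skipped when n is odd)
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Made-up values, for illustration only.
data = [1, 3, 4, 6, 7, 9, 12, 15]
low, q1, med, q3, high = five_number_summary(data)
print((low, q1, med, q3, high), "IQR =", q3 - q1)
```

For these eight values the summary is (1, 3.5, 6.5, 10.5, 15), so the IQR is 7.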
Boxplots
- Graphical display of the five-number summary.
- Useful for comparing groups and spotting outliers.
- Construction:
  - Draw an axis; draw a box from Q1 to Q3 with a line at the median.
  - Fences: upper = Q3 + 1.5 × IQR; lower = Q1 − 1.5 × IQR. Fences only identify outliers and are not drawn.
  - Whiskers extend to the most extreme data values within the fences; outliers are plotted separately.
  - Far outliers (more than 3 × IQR beyond the quartiles) are plotted with a different symbol.
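The construction steps above can be sketched as a small calculation. The quartiles here use the median-of-halves convention with the median left out of both halves when n is odd, which is an assumption and may differ from a given textbook or calculator; the data and the name `boxplot_parts` are made up for the example.

```python
from statistics import median

def boxplot_parts(values):
    """Compute the box, fences, whisker ends, and outliers for a boxplot."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = median(data[:half])            # median of the lower half
    q3 = median(data[half + n % 2:])    # median of the upper half
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "box": (q1, median(data), q3),
        "fences": (lower_fence, upper_fence),
        # Whiskers stop at the most extreme values that are still inside the fences.
        "whiskers": (inside[0], inside[-1]),
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }

# Made-up values, for illustration only.
data = [1, 3, 4, 6, 7, 9, 12, 15, 40]
parts = boxplot_parts(data)
print(parts)
```

For this data the box runs from 3.5 to 13.5 with the median at 7, the fences fall at −11.5 and 28.5, the whiskers end at 1 and 15, and 40 is plotted separately as an outlier.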
Boxplots: Example
- Compare a histogram and a boxplot of the same data set (e.g., wind speeds) to see how each display represents the distribution.
What to Tell About a Quantitative Variable
- Start with a display (histogram, stem-and-leaf, or dotplot) and describe the shape.
- Report center and spread: pair median with IQR; mean with standard deviation.
- If skewed, report median and IQR (and discuss mean vs median differences).
What Can Go Wrong?
- Do not use histograms for categorical data; use bar charts or pie charts instead.
- Do not use bars in every display; reserve them for histograms and bar charts.
- Choose bin width carefully; changing bin width alters appearance.
- Always do a reality check on your results, and sort the data before finding the median or percentiles.
- Do not report too many decimal places, and do not round in the middle of a calculation.
- Watch for multiple modes and outliers; make a picture to verify.
What Have We Learned? (Key Takeaways)
- Data must be quantitative (Quantitative Data Condition) with known units.
- Median vs mean: median halves data; mean balances the histogram.
- Spread measures: IQR and standard deviation; IQR resists outliers, SD does not.
- In skewed distributions, report median and IQR; in symmetric distributions, report mean and SD.
- Use pictures to tell the data story: histograms, stem-and-leaf, dotplots, or boxplots.
- 5-number summary: min, Q1, median, Q3, max; pair with mean/SD or median/IQR as appropriate.
AP Tips for Summaries and Graphs
- Include scales and labels on all graphs.
- Describe center, shape, and spread; be specific about outliers or gaps.
- Know whether you have the full data or only summary statistics; when only summaries are given, adjust your approach accordingly.
- Use calculator to obtain and present summary statistics efficiently.