Density Curves
Reviewing Quantitative Data Exploration
- Any time you receive a quantitative data set, follow the classic 3-step exploration before considering a density curve.
- Graph the data
• Acceptable graphs: stem plot, boxplot, histogram, relative frequency histogram, ogive, dot plot.
- Look for overall pattern & shape
• Ask where the data cluster and whether they are symmetric, skewed, multimodal, etc.
- Compute numerical summaries (center & spread)
• If shape ≈ symmetric & “mountain-shaped,” use \bar x and s_x (mean & standard deviation).
• If shape is skewed or contains strong outliers, use \text{median} & \text{IQR}.
- NEW optional Step 4 (today’s topic): if the pattern is regular enough, replace the jagged histogram with a smooth density curve.
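Step 3 can be sketched in Python as follows. This is a minimal illustration using only the standard library; the function name `summarize` and the seven scores are hypothetical, not taken from any data set in these notes. It reports both pairs of summaries so that the shape found in Step 2 can decide which pair to quote.

```python
import statistics

def summarize(data):
    """Return both pairs of summaries for a list of numbers."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(data),      # x-bar: pair with s_x for symmetric shapes
        "sd": statistics.stdev(data),       # s_x: sample standard deviation
        "median": statistics.median(data),  # pair with IQR for skew or outliers
        "iqr": q3 - q1,                     # interquartile range
    }

scores = [4, 6, 7, 7, 8, 9, 15]  # hypothetical data; 15 acts as a high outlier
summary = summarize(scores)
print(summary)
```

Here the mean (8) sits above the median (7) because the single high score pulls it upward, which is the situation where median & IQR are the safer choice. Quartile conventions differ slightly between tools, so the IQR may not match a calculator exactly.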
Introducing Density Curves
- A smooth curve drawn over a histogram that approximates its overall shape.
- Useful when repeated samples show the same underlying pattern (regularity + consistency).
- Idea illustrated with 947 Iowa Test vocabulary scores: purple histogram (actual) vs red smooth curve (model).
• Despite minor over/under-shoots, total areas in corresponding regions match closely (the excesses and deficits “cancel”).
- Practical use: estimate proportions such as “% scoring under 5” by reading the area under the curve left of x=5.
Formal Definition & Properties of Density Curves
- Two defining properties
- The curve never drops below the x-axis (on/above axis).
- The total area under the curve equals 1 (represents 100 % of the observations): \text{Area}=1.
- A density curve describes real data but never contains actual data points; it is an approximation that is “accurate enough for practical use.”
- Outliers (= departures from the main pattern) are not captured by the smooth curve.
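In symbols, writing the curve as a function f(x) (the same f(x) that these notes integrate later in the course), the two defining properties and the area-as-proportion idea can be stated as:

```latex
f(x) \ge 0 \quad \text{for all } x,
\qquad
\int_{-\infty}^{\infty} f(x)\,dx = 1,
\qquad
\text{proportion of observations between } a \text{ and } b \;\approx\; \int_{a}^{b} f(x)\,dx .
```

The last relation is approximate because the curve is a model of the data, not the data themselves.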
When and Why to Use Density Curves
- Requirements for declaring a distribution “regular” enough for a density curve:
• Large n – many observations (e.g., 947 scores).
• Consistency across groups – repeated samples give the same shape.
- Benefits:
• Allows quick probability / proportion estimates via areas.
• Becomes the backbone for theoretical models (Normal, Uniform, Exponential, etc.).
Real Data vs. Model Data — Notation
| Context | Mean | Standard Deviation |
|---|---|---|
| Actual sample (computed) | \bar x | s_x |
| Theoretical / density-curve model | \mu (Greek “mu”) | \sigma (Greek “sigma”) |
- Example: the average weight of all African bullfrogs is unknown; in the model it is written \mu (with spread \sigma), whereas \bar x & s_x are the values computed from a single sample.
Example 1 – Iowa Test Density Curve
- Data: vocabulary scores of 947 students.
- Observed histogram peaks around 7; tails near 2-3 and 12-13.
- Smooth red curve matches the ridge/troughs: area left of x=5 under histogram ≈ area under density curve (both estimate proportion < 5).
Shapes of Density Curves (Illustrations)
- Uniform rectangle (not all curves “curve”!)
• Support [2,6]; constant height chosen so \text{Area}=1.
• Width =4 → height =\tfrac{1}{4} → each 1-unit segment holds 0.25 (25 %) of the data.
- Right-skewed smooth curve on [9,13]
• Larger area between 9–10 than 12–13 ⇒ more observations early; tails to the right.
- Foxtrot cartoon: three hypothetical “grading curves” that exaggerate skewness / bimodality / degenerate mass.
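The arithmetic for the uniform rectangle on [2,6] can be checked in a few lines. The sketch below is illustrative only; the helper names `uniform_height` and `uniform_area` are made up for this note.

```python
def uniform_height(lo, hi):
    """Height that gives a rectangle on [lo, hi] a total area of 1."""
    return 1.0 / (hi - lo)

def uniform_area(lo, hi, a, b):
    """Area under the uniform density on [lo, hi] between a and b."""
    # Clip the requested interval to the support before measuring its width.
    left, right = max(a, lo), min(b, hi)
    width = max(0.0, right - left)
    return width * uniform_height(lo, hi)

height = uniform_height(2, 6)         # 0.25
one_unit = uniform_area(2, 6, 3, 4)   # 0.25, i.e. 25 % of the data
total = uniform_area(2, 6, 2, 6)      # 1.0, the whole rectangle
print(height, one_unit, total)
```

Because the height is constant, every area is just width times height, which is why each 1-unit segment holds the same share.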
Mean (μ) and Median on a Density Curve
- Median = equal-areas point (50 % left, 50 % right).
- Mean = balance point (center of mass if curve were solid).
- For a perfectly symmetric curve: \mu = \text{median} (both at center).
- For skewed right: mean pulled right of median.
- For skewed left: mean pulled left of median.
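The equal-areas and balance-point ideas can be tried numerically. The sketch below assumes a hypothetical right-skewed density, f(x) = 2(1 - x) on [0, 1], chosen only because its exact answers are easy to verify by hand (mean 1/3, median 1 - \sqrt{0.5} ≈ 0.293). The function names are illustrative.

```python
def density(x):
    """Hypothetical right-skewed density: f(x) = 2(1 - x) on [0, 1], else 0."""
    return 2.0 * (1.0 - x) if 0.0 <= x <= 1.0 else 0.0

def midpoint_grid(lo, hi, steps):
    """Midpoints of `steps` equal slices of [lo, hi], plus the slice width."""
    width = (hi - lo) / steps
    return [lo + (i + 0.5) * width for i in range(steps)], width

def mean_of_density(f, lo, hi, steps=10_000):
    """Balance point: sum of x * f(x) * width over thin slices."""
    xs, width = midpoint_grid(lo, hi, steps)
    return sum(x * f(x) * width for x in xs)

def median_of_density(f, lo, hi, steps=10_000):
    """Equal-areas point: first slice where the running area reaches 0.5."""
    xs, width = midpoint_grid(lo, hi, steps)
    running = 0.0
    for x in xs:
        running += f(x) * width
        if running >= 0.5:
            return x
    return hi

mu = mean_of_density(density, 0.0, 1.0)
median = median_of_density(density, 0.0, 1.0)
print(mu, median)
```

For this right-skewed curve the computed mean lands to the right of the median, matching the rule above; mirroring the density (for example f(x) = 2x) reverses the order.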
Estimating Areas & Proportions
- Shaded region from x=7 to x=8 in a left-skewed curve visually ≈ 0.12 → about 12 % of observations.
- Later in course: integrate f(x) or use tables/technology to compute exact probabilities.
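As a preview of what "use technology" can look like, the sketch below estimates an area with the trapezoid rule. The density is a hypothetical left-skewed curve, f(x) = 2x on [0, 1], not the curve from the figure; it is used because the exact area between 0.7 and 0.8 is 0.8^2 - 0.7^2 = 0.15, so the estimate can be checked by hand.

```python
def area_under(f, a, b, steps=1000):
    """Trapezoid-rule estimate of the area under f between a and b."""
    width = (b - a) / steps
    total = 0.5 * (f(a) + f(b))       # endpoints count half
    for i in range(1, steps):
        total += f(a + i * width)     # interior points count fully
    return total * width

def left_skewed(x):
    """Hypothetical left-skewed density: f(x) = 2x on [0, 1], else 0."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

share = area_under(left_skewed, 0.7, 0.8)  # proportion of observations in [0.7, 0.8]
whole = area_under(left_skewed, 0.0, 1.0)  # total area under the curve
print(share, whole)
```

The estimated share is about 0.15, read as "about 15 % of observations fall between 0.7 and 0.8", the same kind of statement as the 12 % estimate above.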
Legitimacy Checks & Common Misconceptions
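- To qualify as a density curve, both conditions must hold: total area 1, and the entire curve on or above the x-axis.
- A histogram ≠ density curve: a histogram displays the actual data in jagged bars, and in a frequency histogram the bar heights sum to n, whereas a density curve is an idealized model with total area 1.
- Rectangles are allowed (Uniform model). Smoothness is not a requirement; what matters is that the curve stays on or above the x-axis and has total area 1.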
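Both conditions can be tested numerically for a candidate curve. The following is a rough sketch, assuming the curve is zero outside [lo, hi]; the name `looks_like_density` and the tolerance are arbitrary choices for illustration, and a grid check like this can miss a very narrow dip below the axis.

```python
def looks_like_density(f, lo, hi, steps=10_000, tol=1e-3):
    """Grid check of the two conditions on [lo, hi] (f assumed 0 elsewhere)."""
    width = (hi - lo) / steps
    midpoints = [lo + (i + 0.5) * width for i in range(steps)]
    heights = [f(x) for x in midpoints]
    never_negative = all(h >= 0 for h in heights)   # on or above the x-axis
    total_area = sum(heights) * width               # midpoint-rule estimate
    return never_negative and abs(total_area - 1.0) < tol

uniform_ok = looks_like_density(lambda x: 0.25, 2, 6)        # rectangle, area 1
too_much_area = looks_like_density(lambda x: 0.5, 2, 6)      # area 2, fails
dips_below = looks_like_density(lambda x: x, -1, 3 ** 0.5)   # area 1 but negative for x < 0
print(uniform_ok, too_much_area, dips_below)
```

The third case shows why both conditions are needed: its signed area is 1, yet it is not a density curve because part of it lies below the axis.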
Quick Practice / Position Examples
- Given lettered points A, B, C on various curves:
• Symmetric mountain: midpoint (B) is both \mu and median.
• Right-skewed: median at B (50/50), mean at C (further right to balance).
• Left-skewed: median at B, mean at A (further left).
- Always verify 50 % criterion for median, “pivot” criterion for mean.
Key Takeaways
- Density curves model large, consistent data sets and let us treat areas as probabilities.
- Must be on or above the x-axis with total area =1.
- Outliers are ignored; the curve captures the overall pattern.
- Distinguish sample symbols (\bar x, s_x) from population / model symbols (\mu, \sigma).
- Balance point = \mu, equal-area point = median; skew determines their order.
- Uniform, Normal, skewed, multimodal: all are possible density curves so long as they stay on or above the x-axis and have total area 1.
- Next steps in class: compute exact areas via integration or probability tables for specific families (Normal distribution, etc.).