Density Curves

Any time you receive a quantitative data set, follow the classic 3-step exploration before considering a density curve.
1. Graph the data
  • Acceptable graphs: stemplot, boxplot, histogram, relative-frequency histogram, ogive, dot plot.
2. Look for overall pattern & shape
  • Ask where the data cluster and whether they are symmetric, skewed, multi-modal, etc.
3. Compute numerical summaries (center & spread)
  • If shape ≈ symmetric & “mountain-shaped,” use $\bar x$ and $s_x$ (mean & standard deviation).
  • If shape is skewed or contains strong outliers, use $\text{median}$ & $\text{IQR}$.
NEW optional Step 4 (today’s topic): if the overall pattern is sufficiently regular, replace the jagged histogram with a smooth density curve.
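
Step 3 can be sketched in a few lines of Python. This is only an illustration: the scores below are invented for the example (they are not the Iowa data), and the "inclusive" quartile rule is an assumption, since textbooks and calculators use slightly different quartile conventions.

```python
import statistics

# Hypothetical scores, invented purely for illustration.
scores = [2, 4, 5, 6, 6, 7, 7, 7, 8, 8, 9, 10, 12]

# Symmetric, mountain-shaped data: report mean and standard deviation.
mean = statistics.mean(scores)    # x-bar
sd = statistics.stdev(scores)     # s_x, sample standard deviation (n - 1 divisor)

# Skewed data or strong outliers: report median and IQR instead.
median = statistics.median(scores)
# Quartile convention is an assumption; other methods can differ slightly.
q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1

print(f"mean={mean}, sd={sd:.3f}, median={median}, IQR={iqr}")
```

For roughly symmetric data like this, the mean and median land close together; a large gap between them is a hint to prefer the median and IQR.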

Density curve: a smooth curve drawn over a histogram to approximate its overall shape.
Useful when repeated samples show the same underlying pattern (regularity + consistency).
Idea illustrated with 947 Iowa Test vocabulary scores: purple histogram (actual) vs red smooth curve (model).
• Despite minor over/under-shoots, total areas in corresponding regions match closely (the excesses and deficits “cancel”).
Practical use: estimate proportions such as “% scoring under 5” by reading the area under the curve left of $x=5$.

Two defining properties
1. The curve never drops below the $x$-axis (on/above axis).
2. The total area under the curve equals 1 (represents 100 % of the observations): $\text{Area}=1$.
A density curve describes—it never contains—real data points; it is an approximation that is “accurate enough for practical use.”
Outliers (= departures from the main pattern) are not captured by the smooth curve.
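
The two properties can be checked numerically. The sketch below is a rough check, not a proof: `is_density` is a hypothetical helper name, the triangular curve is a made-up example, and the area is approximated with the trapezoid rule on a finite grid.

```python
def is_density(f, lo, hi, steps=10_000, tol=1e-3):
    """Approximate check of both properties for f on [lo, hi].

    Returns (passes, area), where area is a trapezoid-rule estimate.
    """
    width = (hi - lo) / steps
    heights = [f(lo + i * width) for i in range(steps + 1)]
    # Property 1: the curve is on or above the x-axis at every grid point.
    non_negative = all(h >= 0 for h in heights)
    # Property 2: total area is (approximately) 1.
    area = sum((heights[i] + heights[i + 1]) / 2 * width for i in range(steps))
    return non_negative and abs(area - 1) < tol, area


def triangle(x):
    # Hypothetical curve: triangle on [2, 12] with peak height 0.2 at x = 7,
    # so area = 0.5 * base * height = 0.5 * 10 * 0.2 = 1.
    return max(0.0, 0.2 - 0.04 * abs(x - 7))


ok, area = is_density(triangle, 2, 12)
too_tall, tall_area = is_density(lambda x: 0.5, 2, 6)  # area 2, so it fails
```

Here `triangle` passes both checks, while the constant height 0.5 on $[2,6]$ fails because its area is 2 rather than 1.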

Requirements for declaring a distribution “regular” enough for a density curve:
• Large n – many observations (e.g., 947 scores).
• Consistency across groups – repeated samples give the same shape.
Benefits:
• Allows quick probability / proportion estimates via areas.
• Becomes the backbone for theoretical models (Normal, Uniform, Exponential, etc.).

Context	Mean	Standard Deviation
Actual sample (computed)	$\bar x$	$s_x$
Theoretical / density-curve model	$\mu$ (Greek “mu”)	$\sigma$ (Greek “sigma”)

Example: the “average weight of all African bullfrogs” is an unknown population value, so it is written $\mu$ (with spread $\sigma$); $\bar x$ and $s_x$ computed from a single sample only estimate these.

Data: vocabulary scores of 947 students.
Observed histogram peaks around 7; tails near 2-3 and 12-13.
Smooth red curve matches the ridge/troughs: area left of $x=5$ under histogram ≈ area under density curve (both estimate proportion < 5).
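
The "histogram area ≈ curve area" idea can be imitated with simulated data. Everything numeric here is an assumption for illustration: the notes do not give the actual scores or the equation of the red curve, so the sketch uses a Normal-shaped model centered at 7 with a made-up spread of 1.5 and draws 947 simulated values from it.

```python
import random
from statistics import NormalDist

# Hypothetical model: center 7 matches the described peak; sigma = 1.5 is made up.
model = NormalDist(mu=7, sigma=1.5)

# Simulated stand-in for the 947 scores (NOT the real Iowa Test data).
rng = random.Random(0)
sample = [rng.gauss(model.mean, model.stdev) for _ in range(947)]

# "Histogram" answer: fraction of observations below 5.
observed = sum(x < 5 for x in sample) / len(sample)

# "Density curve" answer: area under the model curve left of x = 5.
model_area = model.cdf(5)

print(f"observed proportion < 5: {observed:.3f}")
print(f"area under curve  < 5: {model_area:.3f}")
```

The two numbers should be close but not identical, which mirrors the small over- and under-shoots between the histogram and the smooth curve.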

Uniform rectangle (not all curves “curve”!)
• Support $[2,6]$; constant height chosen so $\text{Area}=1$.
• Width $=4$ → height $=\tfrac{1}{4}$ → each 1-unit segment holds $0.25$ (25 %) of the data.
Right-skewed smooth curve on $[9,13]$
• Larger area between 9–10 than 12–13 ⇒ more observations at the low end; the tail extends to the right.
Foxtrot cartoon: three hypothetical “grading curves” that exaggerate skewness / bimodality / degenerate mass.
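
The area arithmetic for the first two examples can be sketched as follows. The Uniform numbers come straight from the notes; the right-skewed shape is an assumption (a straight line falling from $x=9$ to $x=13$, i.e. $f(x) = (13-x)/8$), chosen only because its areas are easy to compute, and both function names are hypothetical.

```python
def uniform_area(a, b, lo=2.0, hi=6.0):
    """Area under the Uniform rectangle on [lo, hi] between a and b."""
    height = 1.0 / (hi - lo)                  # width 4 -> height 1/4
    left, right = max(a, lo), min(b, hi)      # clip to the support
    return max(0.0, right - left) * height


def skewed_area(a, b, lo=9.0, hi=13.0):
    """Area between a and b under a hypothetical right-skewed curve.

    Assumed shape: f(x) = 2 * (hi - x) / (hi - lo)**2 on [lo, hi],
    which is (13 - x) / 8 on [9, 13]; total area is 1.
    """
    left, right = max(a, lo), min(b, hi)
    if right <= left:
        return 0.0
    return ((hi - left) ** 2 - (hi - right) ** 2) / (hi - lo) ** 2


print(uniform_area(3, 4))    # one 1-unit segment of the rectangle
print(skewed_area(9, 10))    # early segment of the skewed curve
print(skewed_area(12, 13))   # late segment of the skewed curve
```

Under the assumed shape, the 9–10 segment holds far more area than the 12–13 segment, matching the "more observations early" reading.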

Shaded region from $x=7$ to $x=8$ in a left-skewed curve visually ≈ 0.12 → about 12 % of observations.
Later in course: integrate $f(x)$ or use tables/technology to compute exact probabilities.
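
As a preview of the integration step, the area reading can be written as a formula, where $f(x)$ is the height of the density curve and $a \le b$ are any two values:

$$\text{proportion of observations between } a \text{ and } b \;=\; \int_a^b f(x)\,dx, \qquad \int_{-\infty}^{\infty} f(x)\,dx = 1.$$

For a Uniform rectangle on $[2,6]$ with height $\tfrac{1}{4}$, for example, the segment from 3 to 4 gives

$$\int_3^4 \tfrac{1}{4}\,dx = \tfrac{1}{4}(4-3) = 0.25.$$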

To qualify as a density curve, both conditions must hold: total area 1 and the entire curve on or above the $x$-axis.
A count-based histogram ≠ density curve: it is a jagged summary of one sample, and its total bar area is n × bin width rather than 1.
Rectangles are allowed (Uniform model). Smoothness is not a requirement; what matters is that the curve sits on or above the $x$-axis over a continuous range of values and encloses total area 1.
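
A histogram can be put on the same scale as a density curve by rescaling its bar heights. The sketch below assumes equal-width bins and uses invented counts; `density_heights` is a hypothetical helper name.

```python
def density_heights(counts, bin_width):
    """Rescale bar heights so the total bar area equals 1."""
    n = sum(counts)
    return [c / (n * bin_width) for c in counts]


# Hypothetical bin counts, invented for illustration (n = 50, bins 2 units wide).
counts = [3, 9, 20, 12, 6]
bin_width = 2.0

# With counts as heights, total area is n * bin_width (equal to n only if width is 1).
raw_area = sum(c * bin_width for c in counts)

# After rescaling, the bars enclose area 1, like a density curve.
heights = density_heights(counts, bin_width)
density_area = sum(h * bin_width for h in heights)

print(raw_area, round(density_area, 10))
```

The rescaled histogram is still jagged, so it is not itself the smooth model, but its areas can now be compared directly with areas under a density curve.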

Given lettered points A, B, C on various curves:
• Symmetric mountain: midpoint (B) is both $\mu$ and median.
• Right-skewed: median at B (50/50), mean at C (further right to balance).
• Left-skewed: median at B, mean at A (further left).
Always verify the 50 % (equal-area) criterion for the median and the “pivot” (balance-point) criterion for the mean.
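
Both criteria can be checked numerically. The sketch below is an approximation using thin vertical strips; the two skewed curves are assumed shapes (straight lines on $[9,13]$), not the curves from the lettered diagrams, and `mean_and_median` is a hypothetical helper name.

```python
def mean_and_median(f, lo, hi, steps=20_000):
    """Approximate the balance point (mean) and equal-area point (median)."""
    width = (hi - lo) / steps
    mids = [lo + (i + 0.5) * width for i in range(steps)]
    # Mean: balance point, the area-weighted average of x.
    mean = sum(x * f(x) * width for x in mids)
    # Median: first strip at which the accumulated area reaches 50 %.
    running, median = 0.0, hi
    for x in mids:
        running += f(x) * width
        if running >= 0.5:
            median = x
            break
    return mean, median


def right_skewed(x):
    return (13 - x) / 8     # assumed shape: tall at 9, tail toward 13


def left_skewed(x):
    return (x - 9) / 8      # mirror image: tail toward 9, tall at 13


r_mean, r_median = mean_and_median(right_skewed, 9, 13)
l_mean, l_median = mean_and_median(left_skewed, 9, 13)
print(f"right-skewed: median={r_median:.3f} < mean={r_mean:.3f}")
print(f"left-skewed:  mean={l_mean:.3f} < median={l_median:.3f}")
```

In both cases the mean is pulled toward the tail relative to the median, which is the ordering used to label points A, B, and C.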

Density curves model large, consistent data sets and let us treat areas as probabilities.
The curve must lie on or above the $x$-axis, with total area $=1$.
Outliers are ignored; the curve captures the overall pattern.
Distinguish sample symbols ( $\bar x, s_x$ ) from population / model symbols ( $\mu, \sigma$ ).
Balance point = $\mu$, equal-area point = median; skew determines their order.
Uniform, Normal, skewed, multimodal: all are possible density curves so long as the curve stays on or above the $x$-axis and the total area is 1.
Next steps in class: compute exact areas via integration or probability tables for specific families (Normal distribution, etc.).