Density Curves

Reviewing Quantitative Data Exploration

  • Any time you receive a quantitative data set, follow the classic 3-step exploration before considering a density curve.
    1. Graph the data
      • Acceptable graphs: stem plot, boxplot, histogram, relative-frequency histogram, ogive, dot plot.
    2. Look for overall pattern & shape
      • Ask where the data cluster and whether they are symmetric, skewed, multi-modal, etc.
    3. Compute numerical summaries (center & spread)
      • If shape ≈ symmetric & “mountain-shaped,” use \bar x and s_x (mean & standard deviation).
      • If shape is skewed or contains strong outliers, use \text{median} & \text{IQR}.
  • NEW optional Step 4 (today’s topic): if the pattern is regular enough, replace the jagged histogram with a smooth density curve.
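
The choice in step 3 can be sketched in Python. This is only an illustration: the function name `summarize`, the flag `roughly_symmetric` (a judgment you make from the graph, not something the code detects), and the sample values are all made up for this sketch. Quartile conventions differ between textbooks and software, so the IQR from `statistics.quantiles` may not match a hand calculation exactly.

```python
import statistics

def summarize(data, roughly_symmetric):
    """Return (center, spread) following the rule in step 3.

    roughly_symmetric is a judgment made by looking at the graph;
    this sketch does not try to detect shape automatically.
    """
    if roughly_symmetric:
        # Symmetric, mound-shaped data: mean and sample standard deviation.
        return statistics.mean(data), statistics.stdev(data)
    # Skewed data or strong outliers: median and interquartile range.
    q1, _, q3 = statistics.quantiles(data, n=4)
    return statistics.median(data), q3 - q1

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up values for illustration
print(summarize(scores, roughly_symmetric=True))   # (mean, s_x)
print(summarize(scores, roughly_symmetric=False))  # (median, IQR)
```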

Introducing Density Curves

  • A smooth curve drawn over a histogram that approximates its overall shape.
  • Useful when repeated samples show the same underlying pattern (regularity + consistency).
  • Idea illustrated with 947 Iowa Test vocabulary scores: purple histogram (actual) vs red smooth curve (model).
    • Despite minor over/under-shoots, total areas in corresponding regions match closely (the excesses and deficits “cancel”).
  • Practical use: estimate proportions such as “% scoring under 5” by reading the area under the curve left of x=5.

Formal Definition & Properties of Density Curves

  • Two defining properties
    1. The curve never drops below the x-axis (on/above axis).
    2. The total area under the curve equals 1 (represents 100 % of the observations): \text{Area}=1.
  • A density curve describes—it never contains—real data points; it is an approximation that is “accurate enough for practical use.”
  • Outliers (= departures from the main pattern) are not captured by the smooth curve.
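
Written symbolically, with f(x) standing for the height of the curve at x (the same f(x) these notes later say will be integrated), the two defining properties and the "area = proportion" idea read roughly as follows. The interval endpoints a and b are placeholder symbols introduced here.

```latex
f(x) \ge 0 \ \text{for all } x, \qquad
\int_{-\infty}^{\infty} f(x)\,dx = 1, \qquad
\text{proportion of observations in } [a,b] \;\approx\; \int_{a}^{b} f(x)\,dx .
```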

When and Why to Use Density Curves

  • Requirements for declaring a distribution “regular” enough for a density curve:
    • Large n – many observations (e.g., 947 scores).
    • Consistency across groups – repeated samples give the same shape.
  • Benefits:
    • Allows quick probability / proportion estimates via areas.
    • Becomes the backbone for theoretical models (Normal, Uniform, Exponential, etc.).

Real Data vs. Model Data — Notation

Context                             Mean                Standard Deviation
Actual sample (computed)            \bar x              s_x
Theoretical / density-curve model   \mu (Greek “mu”)    \sigma (Greek “sigma”)
  • Example: the average weight of all African bullfrogs is unknown; we denote that population mean by \mu (and the population standard deviation by \sigma), whereas \bar x and s_x are computed from a single sample.

Example 1 – Iowa Test Density Curve

  • Data: vocabulary scores of 947 students.
  • Observed histogram peaks around 7; tails near 2-3 and 12-13.
  • The smooth red curve follows the histogram's overall shape: the area left of x=5 under the histogram ≈ the area left of x=5 under the density curve (both estimate the proportion scoring below 5).

Shapes of Density Curves (Illustrations)

  • Uniform rectangle (not all curves “curve”!)
    • Support [2,6]; constant height chosen so \text{Area}=1.
    • Width =4 → height =\tfrac{1}{4} → each 1-unit segment holds 0.25 (25 %) of the data.
  • Right-skewed smooth curve on [9,13]
    • Larger area between 9–10 than between 12–13 ⇒ more observations at the low end; the tail extends to the right.
  • Foxtrot cartoon: three hypothetical “grading curves” that exaggerate skewness / bimodality / degenerate mass.
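
The uniform example on [2,6] reduces to rectangle arithmetic, which the following sketch spells out. The function name `uniform_proportion` and its parameters are hypothetical, chosen only for this illustration.

```python
def uniform_proportion(lo, hi, a, b):
    """Proportion of a Uniform[lo, hi] density curve lying between a and b."""
    height = 1 / (hi - lo)           # constant height that makes total area 1
    left = max(a, lo)                # clip the interval to the support
    right = min(b, hi)
    width = max(0.0, right - left)   # no overlap means zero width
    return height * width            # rectangle area = height x width

print(uniform_proportion(2, 6, 3, 4))  # one 1-unit segment of [2, 6] -> 0.25
print(uniform_proportion(2, 6, 2, 6))  # the whole support -> 1.0
```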

Mean (μ) and Median on a Density Curve

  • Median = equal-areas point (50 % left, 50 % right).
  • Mean = balance point (center of mass if curve were solid).
  • For a perfectly symmetric curve: \mu = \text{median} (both at center).
  • For skewed right: mean pulled right of median.
  • For skewed left: mean pulled left of median.
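
To see the "mean pulled toward the tail" rule numerically, here is a minimal sketch using a made-up right-skewed density, f(x) = 2(1 - x) on [0,1]. This curve does not come from the lecture; it is chosen because its exact mean (1/3) and median (1 - 1/√2 ≈ 0.293) are easy to verify by hand. The helper names are likewise hypothetical.

```python
def density(x):
    """Hypothetical right-skewed density: f(x) = 2(1 - x) on [0, 1]."""
    return 2 * (1 - x) if 0 <= x <= 1 else 0.0

def integrate(g, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of g from a to b."""
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h

def mean_of(f, lo, hi):
    # Balance point: integral of x * f(x) over the support.
    return integrate(lambda x: x * f(x), lo, hi)

def median_of(f, lo, hi, tol=1e-6):
    # Equal-areas point: bisect until the area to the left is 0.5.
    a, b = lo, hi
    while b - a > tol:
        mid = (a + b) / 2
        if integrate(f, lo, mid) < 0.5:
            a = mid
        else:
            b = mid
    return (a + b) / 2

mu = mean_of(density, 0, 1)
med = median_of(density, 0, 1)
print(round(mu, 3), round(med, 3), mu > med)  # mean sits right of the median
```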

Estimating Areas & Proportions

  • Shaded region from x=7 to x=8 in a left-skewed curve visually ≈ 0.12 → about 12 % of observations.
  • Later in course: integrate f(x) or use tables/technology to compute exact probabilities.
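
Until exact methods are covered, an area can also be approximated numerically. The sketch below uses a made-up left-skewed density, f(x) = 2(x - 4)/25 on [4,9]. It is not the curve from the lecture figure, so it will not reproduce the 0.12 estimate; for this hypothetical curve the exact area between 7 and 8 is 7/25 = 0.28.

```python
def left_skewed_density(x):
    """Hypothetical left-skewed density: f(x) = 2(x - 4)/25 on [4, 9]."""
    return 2 * (x - 4) / 25 if 4 <= x <= 9 else 0.0

def area_under(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the area under f between a and b."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

total = area_under(left_skewed_density, 4, 9)    # should be close to 1
between = area_under(left_skewed_density, 7, 8)  # proportion with 7 <= x <= 8
print(round(total, 4), round(between, 4))
```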

Legitimacy Checks & Common Misconceptions

  • To qualify as a density curve, both conditions must hold: total area 1, and the entire curve on or above the x-axis.
  • A histogram ≠ density curve: its bars show the actual counts from one data set, and in a frequency histogram the bar heights sum to n, so the total area is n × (bin width), not 1.
  • Rectangles are allowed (Uniform model). Smoothness is not a requirement; what matters is that the curve stays on or above the axis and encloses area 1.
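
The difference between a frequency histogram's area and a density curve's area of 1 can be made concrete by rescaling. This sketch assumes equal-width bins and uses made-up counts; `to_density_heights` is a hypothetical helper name.

```python
def to_density_heights(counts, bin_width):
    """Rescale frequency-histogram counts so the bars enclose total area 1."""
    n = sum(counts)
    return [c / (n * bin_width) for c in counts]

counts = [2, 5, 8, 4, 1]   # made-up bin counts (n = 20)
bin_width = 2              # assumed equal-width bins

freq_area = sum(c * bin_width for c in counts)       # n x bin width = 40
heights = to_density_heights(counts, bin_width)
density_area = sum(h * bin_width for h in heights)   # rescaled area

print(freq_area, round(density_area, 6))
```

Even after rescaling, the bars remain a picture of one particular data set; the density curve is the idealized model drawn over them.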

Quick Practice / Position Examples

  • Given lettered points A, B, C on various curves:
    • Symmetric mountain: midpoint (B) is both \mu and median.
    • Right-skewed: median at B (50/50), mean at C (further right to balance).
    • Left-skewed: median at B, mean at A (further left).
  • Always verify 50 % criterion for median, “pivot” criterion for mean.

Key Takeaways

  • Density curves model large, consistent data sets and let us treat areas as probabilities.
  • Must lie on or above the x-axis, with total area =1.
  • Outliers are ignored; the curve captures the overall pattern.
  • Distinguish sample symbols (\bar x, s_x) from population / model symbols (\mu, \sigma).
  • Balance point = \mu, equal-area point = median; skew determines their order.
  • Uniform, Normal, skewed, multimodal: all are possible density curves, so long as the curve stays on or above the x-axis and the total area is 1.
  • Next steps in class: compute exact areas via integration or probability tables for specific families (Normal distribution, etc.).