Central Tendency and Variability in Height Data: Mean, Median, Mode, Range, Variance, and Standard Deviation

Data Overview and Setup

  • Variable: height
  • Frequency table observed: heights 60, 61, 62, 63, 64, 65 with frequencies 2, 3, 5, 5, 3, 2 respectively
  • Total observations: N = 20
    • Verification: sum of frequencies = 2 + 3 + 5 + 5 + 3 + 2 = 20
  • Histogram interpretation: each bar's height shows how many people have that height, so the histogram presents the frequency table as a distribution of heights
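A minimal Python sketch of this setup. It assumes the frequency table is stored as a dict mapping height to count; the names `freq` and `heights` are illustrative and do not come from the notes. The sketch expands the table back into one value per person, checks that N = 20, and prints a rough text histogram in which each `#` stands for one person.

```python
from collections import Counter

# Frequency table from the notes: height -> number of people (illustrative dict layout)
freq = {60: 2, 61: 3, 62: 5, 63: 5, 64: 3, 65: 2}

# Expand the table into one entry per person: the raw data a histogram summarizes
heights = [h for h, f in freq.items() for _ in range(f)]
N = len(heights)

# Verify the total and that counting the expanded list recovers the table
assert N == sum(freq.values()) == 20
assert dict(Counter(heights)) == freq

# Text histogram: bar length = number of people at that height
for h in sorted(freq):
    print(f"{h} | {'#' * freq[h]}")
```

Expanding the table is convenient for a small dataset like this one. For large tables, working directly from the counts avoids building the full list.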

Central Tendency: Mean, Median, and Mode

  • Mean (mathematical average)
    • Formula: $\bar{x} = \frac{\sum f_i x_i}{n}$, where $f_i$ is the frequency of height $x_i$ and $n = \sum f_i$
    • Computation from the data:
      • Numerator: $60 \times 2 + 61 \times 3 + 62 \times 5 + 63 \times 5 + 64 \times 3 + 65 \times 2 = 1250$
      • Denominator: $n = 20$
      • Result: $\bar{x} = \frac{1250}{20} = 62.5$
  • Median
    • For an even number of observations, the median is the average of the two middle values
    • With N = 20, the two middle positions are the 10th and 11th values
    • In the ordered data, positions 6-10 hold 62 and positions 11-15 hold 63, so the middle values are 62 and 63
    • Hence: $\text{Median} = \frac{62 + 63}{2} = 62.5$
  • Mode
    • Definition: most frequent value(s)
    • In this dataset, 62 and 63 each occur 5 times → bimodal with modes at 62 and 63
  • Relationship to symmetry
    • A perfectly symmetric distribution has mean = median
    • If the distribution is unimodal and symmetric, the mode coincides with the center; if it is bimodal and symmetric, the two modes sit equally far on either side of the center
    • In the example, mean = median = 62.5, and the two modes (62 and 63) lie 0.5 below and 0.5 above that center
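The same three measures can also be computed directly from the frequency table, without listing all 20 values. The sketch below assumes the table is a dict of height → count. The helper names (`weighted_mean`, `value_at`, `median_from_freq`, `modes_from_freq`) are hypothetical. To find the median, the code walks the cumulative frequencies to locate the 10th and 11th values, which mirrors the even-n rule above.

```python
freq = {60: 2, 61: 3, 62: 5, 63: 5, 64: 3, 65: 2}

def weighted_mean(freq):
    """Mean from a frequency table: sum(f_i * x_i) / n, with n = sum(f_i)."""
    n = sum(freq.values())
    return sum(x * f for x, f in freq.items()) / n

def value_at(freq, k):
    """Return the k-th smallest observation (1-indexed) via cumulative frequencies."""
    cumulative = 0
    for x in sorted(freq):
        cumulative += freq[x]
        if k <= cumulative:
            return x
    raise IndexError("k exceeds the number of observations")

def median_from_freq(freq):
    n = sum(freq.values())
    if n % 2 == 1:
        return value_at(freq, (n + 1) // 2)   # odd n: the single middle value
    lo = value_at(freq, n // 2)                # even n: average the two middle values
    hi = value_at(freq, n // 2 + 1)
    return (lo + hi) / 2

def modes_from_freq(freq):
    """All values tied for the highest frequency (more than one means multimodal)."""
    top = max(freq.values())
    return sorted(x for x, f in freq.items() if f == top)

print(weighted_mean(freq))     # 62.5
print(median_from_freq(freq))  # 62.5
print(modes_from_freq(freq))   # [62, 63]
```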

Influence of Symmetry and Outliers

  • The shape of a distribution determines how the mean and median relate:
    • If the distribution is symmetric, the mean and median are equal
    • Outliers affect the mean more than the median; skewness pulls the mean toward the longer tail, away from the median
    • If outliers fall roughly evenly on both sides (a roughly symmetric distribution with heavy tails), their pulls on the mean largely cancel, so the mean and median can still be close
  • Symmetry and the mode
    • In a symmetric unimodal distribution, the mode is at the center (single mode)
    • In a symmetric but bimodal distribution, there can be two modes
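One quick way to see the outlier effect is to add made-up values to the height data and compare the mean with the median. The outliers 90 and 35 below are hypothetical and were chosen only for illustration. A single high outlier pulls the mean up by more than it moves the median. A pair of outliers placed equally far (27.5) below and above 62.5 leaves both the mean and the median at 62.5.

```python
from statistics import mean, median

# The 20 heights from the frequency table
freq = {60: 2, 61: 3, 62: 5, 63: 5, 64: 3, 65: 2}
heights = [h for h, f in freq.items() for _ in range(f)]

# Hypothetical outliers (not part of the original data), for illustration only
one_sided = heights + [90]       # one high outlier: skews the data to the right
balanced = heights + [35, 90]    # equally far below and above the mean 62.5

cases = [("original", heights), ("one high outlier", one_sided), ("balanced outliers", balanced)]
for label, data in cases:
    print(f"{label:>18}: mean = {mean(data):.2f}, median = {median(data):.2f}")
```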

Variability: Range, Variance, and Standard Deviation

  • Importance of variability
    • Central tendency alone (mean/median/mode) may hide how spread out the data are
    • Two distributions can have the same center but different spreads
  • Range
    • Definition: $\text{Range} = \max x_i - \min x_i$
    • Example: heights run from 60 to 65, so $\text{Range} = 65 - 60 = 5$
    • Alternative description: “from 60 to 65”
    • Note: Range ignores how data are distributed between min and max
  • Deviations and the sum of squares
    • Deviation from the mean for each observation: $x_i - \bar{x}$
    • Sign: observations below the mean have negative deviations; those above the mean have positive deviations
    • Raw deviations always sum to zero, so square them to avoid cancellation: $(x_i - \bar{x})^2$
    • Sum of squared deviations (Sum of Squares): $\sum_{i=1}^{n} (x_i - \bar{x})^2$
  • Sample variance and standard deviation
    • Sample variance (with Bessel's correction): $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
    • In a separate illustrative example with n = 5 values (not the height data), the sum of squared deviations is 64, so $s^2 = \frac{64}{5 - 1} = \frac{64}{4} = 16$
    • Sample standard deviation: $s = \sqrt{s^2} = \sqrt{16} = 4$
  • Quick intuition
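    • The standard deviation measures roughly how far a typical score lies from the mean; because it is the square root of the variance, large deviations count more heavily than in a plain average distance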
    • Smaller SD = data are more tightly clustered around the mean; larger SD = more spread
  • Population parameters (for reference)
    • Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$, where $\mu$ is the population mean and $N$ is the population size
    • Population standard deviation: $\sigma = \sqrt{\sigma^2}$
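The sketch below applies the variability formulas from scratch to the 20 heights. It assumes the heights are treated as a sample when computing $s$ and, for comparison, as a complete population when computing $\sigma$. It also checks that the raw deviations sum to zero, which is the cancellation that squaring avoids. The five-value illustration is reproduced from its stated sum of squares only, because its five data values are not given.

```python
import math

freq = {60: 2, 61: 3, 62: 5, 63: 5, 64: 3, 65: 2}
heights = [h for h, f in freq.items() for _ in range(f)]
n = len(heights)
x_bar = sum(heights) / n                    # 62.5

data_range = max(heights) - min(heights)    # 65 - 60 = 5

deviations = [x - x_bar for x in heights]
# Raw deviations cancel out exactly, which is why they are squared
assert abs(sum(deviations)) < 1e-9

ss = sum(d ** 2 for d in deviations)        # sum of squares
sample_var = ss / (n - 1)                   # s^2 with Bessel's correction
sample_sd = math.sqrt(sample_var)           # s
pop_var = ss / n                            # sigma^2: the 20 heights as the whole population
pop_sd = math.sqrt(pop_var)                 # sigma

print(f"range = {data_range}, SS = {ss}")
print(f"s^2 = {sample_var:.4f}, s = {sample_sd:.4f}")
print(f"sigma^2 = {pop_var:.4f}, sigma = {pop_sd:.4f}")

# Five-value illustration, using only its stated sum of squares (64) and n = 5
illustration_sd = math.sqrt(64 / (5 - 1))   # 4.0
```

The population version divides by N = 20 rather than n - 1 = 19, so it comes out slightly smaller than the sample version.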

Worked Example: Step-by-Step from Frequency Table to Summary Stats

  • Data setup recap: heights with frequencies 60(2), 61(3), 62(5), 63(5), 64(3), 65(2) → N = 20
  • Mean recap: $\bar{x} = \frac{60 \cdot 2 + 61 \cdot 3 + 62 \cdot 5 + 63 \cdot 5 + 64 \cdot 3 + 65 \cdot 2}{20} = \frac{1250}{20} = 62.5$
  • Median recap: for N = 20, the middle values are 62 and 63 → $\text{Median} = \frac{62 + 63}{2} = 62.5$
  • Mode recap: 62 and 63 both occur 5 times → two modes (62, 63)
  • Range recap: max = 65, min = 60 → $\text{Range} = 65 - 60 = 5$
  • Variability example (sum of squared deviations)
    • Suppose $\sum_{i=1}^{n} (x_i - \bar{x})^2 = 64$ for a separate illustrative set of n = 5 values (not a subset of the height data)
    • Then: $s^2 = \frac{64}{5 - 1} = 16, \quad s = \sqrt{16} = 4$
  • Interpretations
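    • Mean = 62.5, Median = 62.5; two modes, 62 and 63
    • Range = 5 for the height data; SD = 4 belongs only to the five-value illustration. For the 20 heights themselves, the sum of squares is 41, so $s^2 = \frac{41}{19} \approx 2.16$ and $s \approx 1.47$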
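To cross-check the hand calculations, Python's standard-library `statistics` module offers `mean`, `median`, `multimode`, `variance`/`stdev` (which divide by n - 1) and `pvariance`/`pstdev` (which divide by n). The minimal sketch below assumes the frequency table has been expanded into a plain list.

```python
import statistics as st

freq = {60: 2, 61: 3, 62: 5, 63: 5, 64: 3, 65: 2}
heights = [h for h, f in freq.items() for _ in range(f)]

summary = {
    "mean": st.mean(heights),
    "median": st.median(heights),                  # averages the two middle values for even n
    "modes": st.multimode(heights),                # every value tied for the top frequency
    "range": max(heights) - min(heights),
    "sample variance": st.variance(heights),       # divides by n - 1
    "sample SD": st.stdev(heights),
    "population variance": st.pvariance(heights),  # divides by n
    "population SD": st.pstdev(heights),
}
for name, value in summary.items():
    print(f"{name:>20}: {value}")
```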

Quick Formulas Recap

  • Central tendency
    • $\bar{x} = \frac{\sum f_i x_i}{n}$
    • Median: the middle value when n is odd; the average of the two middle values when n is even
    • Mode: most frequent value(s)
  • Variability
    • Range: $\text{Range} = \max x_i - \min x_i$
    • Variance (sample): $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
    • Standard deviation (sample): $s = \sqrt{s^2}$
    • Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}, \quad \sigma = \sqrt{\sigma^2}$ (N = population size)
  • Key takeaway
    • Symmetric distributions have mean = median (nearly symmetric ones have mean ≈ median); unimodal symmetric distributions have the mode at the center; symmetry does not guarantee a single mode
    • The two core pieces to describe a distribution well are the central tendency (mean/median/mode) and the variability (range, variance, standard deviation)