Module 2.2: Statistics – Measures of Central Tendency, Dispersion & Position

Measures of Central Tendency

  • Definition: a single number that represents a “typical” or central value of a data set.
  • Key measures examined:
    • Mean
    • Median
    • Mode

Mean

  • Population mean
    • Formula: \mu = \frac{\Sigma x}{N} where N is the population size.
  • Sample mean
    • Formula: \bar{x} = \frac{\Sigma x}{n} where n is the sample size.
  • Properties
    • Uses every value in the data set ⇒ regarded as a “reliable” measure.
    • Highly sensitive to extreme values (outliers).
  • Worked example: round-trip Chicago→Cancún flights
    • Data: 872, 432, 397, 427, 388, 782, 397 (7 fares)
    • Sum =3695
    • Mean = \frac{3695}{7}\approx 528 dollars.
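
The arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the variable names are illustrative and not part of the original notes.

```python
# Round-trip fares (dollars) from the worked example above.
fares = [872, 432, 397, 427, 388, 782, 397]

total = sum(fares)       # Σx
n = len(fares)           # sample size n
mean_fare = total / n    # x̄ = Σx / n

print(total, n, round(mean_fare, 2))  # 3695 7 527.86
```

Rounded to the nearest dollar, this gives the 528 quoted in the example.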

Median

  • Middle value once data are ordered.
  • Splits an ordered data set into two equal halves.
    • Odd n → median is the true middle entry.
    • Even n → median is the mean of the two middle entries.
  • Example 1 (7 fares)
    • Ordered: 388, 397, 397, 427, 432, 782, 872
    • Median =427 dollars.
  • Example 2 (remove the 432-dollar fare; n=6)
    • Ordered: 388, 397, 397, 427, 782, 872
    • Median =\frac{397+427}{2}=412 dollars.
  • Strength: resistant to outliers.
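
The odd/even rule can be sketched as a small function. The helper name `median` is an illustrative choice here (Python's standard library offers `statistics.median`, which should give the same results).

```python
def median(values):
    """Median via the odd/even rule: sort, then take the middle."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: single middle entry
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: mean of two middle entries

fares = [872, 432, 397, 427, 388, 782, 397]
six_fares = [f for f in fares if f != 432]  # Example 2: drop the 432-dollar fare

print(median(fares))      # 427
print(median(six_fares))  # 412.0
```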

Mode

  • Most frequent data entry (raw data) or class midpoint with highest frequency (grouped data).
    • No repeated entry ⇒ “no mode.”
  • Flight-price example
    • Ordered data revealed 397 appears twice; all others once.
    • Mode =397 dollars.
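
A frequency count makes the mode easy to find, including the "no mode" and multiple-mode cases. The function `modes` below is a hypothetical helper written for these notes, and it assumes a non-empty list.

```python
from collections import Counter

def modes(values):
    """Return every most-frequent entry, or [] when no entry repeats."""
    counts = Counter(values)
    top = max(counts.values())   # highest frequency observed
    if top == 1:
        return []                # nothing repeats: "no mode"
    return sorted(v for v, c in counts.items() if c == top)

print(modes([872, 432, 397, 427, 388, 782, 397]))  # [397]
print(modes([1, 2, 3]))                            # []
print(modes([1, 1, 2, 2, 3]))                      # [1, 2]
```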

Comparing Mean, Median & Mode

  • All address “typical” value but differ in sensitivity:
    • Mean: uses all values; affected by outliers.
    • Median: ignores magnitude of extreme ends; unaffected by outliers.
    • Mode: only frequency matters; may be non-unique or absent; often least representative.
  • Age sample (20 students)
    • Data: 20,20,20,20,20,20,21,21,21,21,22,22,22,23,23,23,23,24,24,65
    • Mean \approx 23.8 years (pulled upward by outlier 65).
    • Median =21.5 years (middle two: 21,22).
    • Mode =20 years (highest frequency).
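    • Interpretation: the mean uses every value but is pulled upward by the outlier; the median resists the outlier; the mode (20) is the smallest age in the sample, so it is the least representative here.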
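
The three measures for the age sample can be reproduced with Python's standard `statistics` module. Dropping the outlier, as done at the end, is only an illustration of sensitivity and is not a step from the notes.

```python
import statistics

# Ages of the 20 students from the example above.
ages = [20] * 6 + [21] * 4 + [22] * 3 + [23] * 4 + [24] * 2 + [65]

mean_age = statistics.mean(ages)      # 23.75 (about 23.8)
median_age = statistics.median(ages)  # 21.5
mode_age = statistics.mode(ages)      # 20

# Illustration only: remove the outlier and see which measures move.
trimmed = [a for a in ages if a != 65]
print(mean_age, median_age, mode_age)
print(round(statistics.mean(trimmed), 2), statistics.median(trimmed))  # 21.58 21
```

Removing the single value 65 shifts the mean by more than two years, while the median moves by half a year and the mode does not move at all.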

Weighted Mean

  • Appropriate when each observation carries a specific weight w.
  • Formula: \bar{x}_w = \frac{\Sigma(w\,x)}{\Sigma w}.
  • Course-grade example
    • Weights: tests 50\%, midterm 15\%, final 20\%, lab 10\%, homework 5\%.
    • Scores: 86,96,82,98,100.
    • Weighted sum \Sigma(wx)=88.6, \Sigma w=1 ⇒ weighted mean =88.6.
    • Required \ge 90 for an A → just missed.
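
The course-grade calculation can be sketched as follows. The function name `weighted_mean` is an illustrative choice, and the weights are written as decimals (50\% becomes 0.50).

```python
def weighted_mean(values, weights):
    """Weighted mean: Σ(w·x) / Σw."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Order: tests, midterm, final, lab, homework (from the example above).
scores = [86, 96, 82, 98, 100]
weights = [0.50, 0.15, 0.20, 0.10, 0.05]

grade = weighted_mean(scores, weights)
print(round(grade, 1))  # 88.6
print(grade >= 90)      # False: the A cutoff is missed
```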

Mean of a Frequency Distribution

  • Use class midpoints x and frequencies f.
  • Formula: \bar{x}=\frac{\Sigma(xf)}{n} where n=\Sigma f.
  • Illustration (7 classes, n=50): \Sigma(xf)=2089 ⇒ \bar{x}=41.78 (units per original context).
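
A minimal sketch of the same formula in Python. The notes do not list the seven classes, so the midpoints and frequencies below are hypothetical and serve only to show the mechanics; the last line checks the quoted totals.

```python
def grouped_mean(midpoints, freqs):
    """Mean of a frequency distribution: Σ(x·f) / Σf."""
    n = sum(freqs)
    return sum(x * f for x, f in zip(midpoints, freqs)) / n

# Hypothetical three-class distribution (NOT the data from the notes).
midpoints = [10, 20, 30]
freqs = [2, 5, 3]
print(grouped_mean(midpoints, freqs))  # 21.0

# Totals quoted in the illustration above: Σ(xf) = 2089, n = 50.
print(2089 / 50)  # 41.78
```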

Distribution Shapes & Central Location

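  • Uniform (rectangular): all classes have roughly equal frequencies (flat top); symmetric, so mean ≈ median, with no single distinct mode.
  • Symmetric (no skew), bell-shaped: mean = median = mode at the center.
  • Left-skewed (negative skew): typically mean < median < mode (tail extends to the left).
  • Right-skewed (positive skew): typically mode < median < mean (tail extends to the right).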

Measures of Dispersion

  • Range: \text{Range}=(\text{max})-(\text{min}).
  • Variance & Standard Deviation
    • Deviation: x-\mu (population) or x-\bar{x} (sample).
    • Population variance: \sigma^2=\frac{\Sigma(x-\mu)^2}{N}.
    • Population standard deviation: \sigma=\sqrt{\sigma^2}.
    • Sample variance: s^2=\frac{\Sigma(x-\bar{x})^2}{n-1}.
    • Sample standard deviation: s=\sqrt{s^2}.
  • Interpretation: larger \sigma or s ⇒ greater spread.
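
The formulas above differ only in the divisor (N versus n-1). The sketch below applies both to the seven fares from the Mean section purely for comparison; whether those fares are best treated as a population or a sample is not stated in the notes.

```python
import math
import statistics

fares = [872, 432, 397, 427, 388, 782, 397]
n = len(fares)
mean = sum(fares) / n

data_range = max(fares) - min(fares)       # Range = max - min
ss = sum((x - mean) ** 2 for x in fares)   # sum of squared deviations

pop_var = ss / n           # σ²: divide by N
samp_var = ss / (n - 1)    # s²: divide by n - 1
pop_sd = math.sqrt(pop_var)
samp_sd = math.sqrt(samp_var)

print(data_range)                            # 484
print(round(pop_sd, 1), round(samp_sd, 1))   # 191.3 206.6

# The standard library gives the same variances.
assert math.isclose(pop_var, statistics.pvariance(fares))
assert math.isclose(samp_var, statistics.variance(fares))
```

Dividing by n-1 always yields a slightly larger value than dividing by N for the same data.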

Deviation Example (10 salaries)

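  • Mean salary \mu=41.5 (thousand dollars).
  • Compute each deviation x-\mu; the positive and negative deviations cancel, so \Sigma(x-\mu)=0. This is why the variance squares the deviations before averaging them.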
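
That the deviations total zero is not special to the salary data; it follows from the definition of the mean. A short derivation in the notation used above:

```latex
\Sigma(x-\mu) \;=\; \Sigma x - N\mu \;=\; \Sigma x - N\cdot\frac{\Sigma x}{N} \;=\; 0
```

The same argument holds for a sample with \bar{x} and n in place of \mu and N.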

Empirical Rule (68–95–99.7)

  • Applies to bell-shaped (normal) distributions.
    • ≈68\% within \mu\pm\sigma.
    • ≈95\% within \mu\pm2\sigma.
    • ≈99.7\% within \mu\pm3\sigma.
  • Application
    • Women’s heights: \bar{x}=64 in, s=2.71 in.
    • Upper bound: \bar{x}+2s=64+5.42=69.42 in.
    • Percent between 64 and 69.42 inches (the mean up to two standard deviations above it) ≈ 34\%+13.5\%=47.5\%.
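
The height calculation can be sketched in Python. The helper `empirical_intervals` is a hypothetical name, and the percentages are the approximate Empirical Rule values, which hold only for roughly bell-shaped data.

```python
def empirical_intervals(mean, sd):
    """Return {k: (low, high, approx_percent)} for k = 1, 2, 3."""
    coverage = {1: 68.0, 2: 95.0, 3: 99.7}   # approximate percentages
    return {k: (mean - k * sd, mean + k * sd, pct) for k, pct in coverage.items()}

intervals = empirical_intervals(64, 2.71)   # heights in inches
low2, high2, pct2 = intervals[2]

print(round(low2, 2), round(high2, 2))  # 58.58 69.42
# By symmetry, half of the central 95% lies between the mean and
# two standard deviations above it (34% + 13.5%).
print(pct2 / 2)  # 47.5
```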

Standard Deviation for Grouped Data

  • When raw data are binned into a frequency distribution:
    1. Compute \bar{x} using midpoints.
    2. Determine sum of squares \Sigma f(x-\bar{x})^2.
    3. Use sample s=\sqrt{\frac{\Sigma f(x-\bar{x})^2}{n-1}}.
  • Children-per-household example
    • Mean \approx1.8, sample s\approx1.7 children.
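
The three steps above can be sketched as one function. The household data are not listed in the notes, so the small distribution below is hypothetical; `grouped_sample_sd` is an illustrative name.

```python
import math

def grouped_sample_sd(values, freqs):
    """Return (mean, sample standard deviation) of a frequency distribution."""
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / n             # step 1
    ss = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))     # step 2
    return mean, math.sqrt(ss / (n - 1))                             # step 3

# Hypothetical data: value 0 occurs once, 1 occurs twice, 2 occurs once.
mean, sd = grouped_sample_sd([0, 1, 2], [1, 2, 1])
print(mean)          # 1.0
print(round(sd, 4))  # 0.8165
```

For binned data, `values` would hold the class midpoints, so the result is an approximation of the standard deviation of the raw data.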

Quartiles & Interquartile Range

  • Quartiles divide ordered data into four parts.
    • Q_1: ≈25th percentile.
    • Q_2: median.
    • Q_3: ≈75th percentile.
  • Finding quartiles (15 CPR scores)
    • Ordered list ⇒ Q_2=15.
    • Lower half median Q_1=10.
    • Upper half median Q_3=18.
  • Interquartile Range: \text{IQR}=Q_3-Q_1=18-10=8.
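
The median-of-halves method can be sketched as below. The 15 CPR scores are not listed in the notes, so the example reuses the seven fares from the Mean section; the function names are illustrative. Statistical software may use other quartile conventions and can return slightly different values.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Q1, Q2, Q3 via medians of the lower and upper halves.

    For odd n the overall median is left out of both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)

fares = [872, 432, 397, 427, 388, 782, 397]
q1, q2, q3 = quartiles(fares)
print(q1, q2, q3)  # 397 427 782
print(q3 - q1)     # IQR = 385
```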

Percentiles & Other Fractiles

  • Fractile summary
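    • Quartiles Q_1, Q_2, Q_3: divide the ordered data into 4 equal parts.
    • Deciles D_1, \dots, D_9: divide the ordered data into 10 equal parts.
    • Percentiles P_1, \dots, P_{99}: divide the ordered data into 100 equal parts.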
  • Ogive interpretation example
    • 72nd percentile → SAT score =1700.
    • Meaning: about 72\% of test takers scored at or below 1700.
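
The "percent at or below" reading of a percentile can be sketched directly. The SAT scores behind the ogive are not given, so the example reuses the age sample from earlier in these notes; `percent_at_or_below` is a hypothetical helper.

```python
def percent_at_or_below(values, x):
    """Percentage of entries that are less than or equal to x."""
    return 100 * sum(v <= x for v in values) / len(values)

ages = [20] * 6 + [21] * 4 + [22] * 3 + [23] * 4 + [24] * 2 + [65]

print(percent_at_or_below(ages, 23))  # 85.0
print(percent_at_or_below(ages, 21))  # 50.0
```

In this sample an age of 23 sits at about the 85th percentile: 17 of the 20 students are 23 or younger.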

Standard Score (z-Score)

  • Formula: z=\frac{x-\mu}{\sigma}.
  • Interpretation
    • z>0: value above mean; z<0 below mean.
    • |z| >2 often considered unusual.
  • Oscar winners example (2007)
    • Forest Whitaker (Best Actor): z=\frac{45-43.7}{8.8}\approx0.15 (slightly above average age).
    • Helen Mirren (Best Actress): z=\frac{61-36}{11.5}\approx2.17 (unusually old relative to past winners, since |z|>2).
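
Both z-scores can be reproduced with a one-line function. The name `z_score` is illustrative; the means and standard deviations are the ones quoted in the example.

```python
def z_score(x, mean, sd):
    """Standard score: number of standard deviations x lies from the mean."""
    return (x - mean) / sd

whitaker = z_score(45, 43.7, 8.8)   # Best Actor
mirren = z_score(61, 36, 11.5)      # Best Actress

print(round(whitaker, 2), round(mirren, 2))  # 0.15 2.17
print(abs(whitaker) > 2, abs(mirren) > 2)    # False True
```

Only the second value crosses the |z| > 2 threshold used above as a rough marker of an unusual observation.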