Module 2.2: Statistics – Measures of Central Tendency, Dispersion & Position
Measures of Central Tendency
- Definition: a single number that represents a “typical” or central value of a data set.
- Key measures examined:
Mean
- Population mean
- Formula: \mu = \frac{\Sigma x}{N} where N is the population size.
- Sample mean
- Formula: \bar{x} = \frac{\Sigma x}{n} where n is the sample size.
- Properties
- Uses every value in the data set ⇒ regarded as a “reliable” measure.
- Highly sensitive to extreme values (outliers).
- Worked example: round-trip Chicago→Cancún flights
- Data: 872, 432, 397, 427, 388, 782, 397 (7 fares)
- Sum = 3695
- Mean = \frac{3695}{7}\approx 528 dollars.
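The fare calculation above can be reproduced with a minimal Python sketch. The variable names are illustrative and not part of the original notes.

```python
from statistics import mean

# Round-trip Chicago-to-Cancun fares (dollars) from the worked example.
fares = [872, 432, 397, 427, 388, 782, 397]

total = sum(fares)   # sum of all entries
n = len(fares)       # sample size
x_bar = total / n    # sample mean: (sum of x) / n

print(total, n, round(x_bar))  # 3695 7 528

# The standard-library helper gives the same value.
same = abs(mean(fares) - x_bar) < 1e-9
```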
Median
- Middle value once the data are ordered.
- Splits an ordered data set into two equal halves.
- Odd n → median is the true middle entry.
- Even n → median is the mean of the two middle entries.
- Example 1 (7 fares)
- Ordered: 388, 397, 397, 427, 432, 782, 872
- Median = 427 dollars.
- Example 2 (remove the 432-dollar fare; n=6)
- Ordered: 388, 397, 397, 427, 782, 872
- Median = \frac{397+427}{2} = 412 dollars.
- Strength: resistant to outliers.
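The odd and even cases can be sketched in a few lines of Python. The function name `median_of` is a hypothetical helper chosen for illustration; it assumes a non-empty list of numbers.

```python
def median_of(values):
    """Return the middle entry (odd n) or the mean of the two middle entries (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd n: true middle entry
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: mean of the middle pair

fares = [872, 432, 397, 427, 388, 782, 397]
print(median_of(fares))  # 427

# Example 2: drop the 432-dollar fare, leaving n = 6.
fares_without_432 = [f for f in fares if f != 432]
print(median_of(fares_without_432))  # 412.0
```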
Mode
- Most frequent data entry (raw data) or class midpoint with highest frequency (grouped data).
- No repeated entry ⇒ “no mode.”
- Flight-price example
- Ordered data revealed 397 appears twice; all others once.
- Mode = 397 dollars.
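A small sketch of the raw-data rule follows. It returns every entry tied for the highest frequency, and an empty list stands in for "no mode". The helper name `modes` is an assumption for illustration.

```python
from collections import Counter

def modes(values):
    """Return all entries tied for the highest frequency, or [] if nothing repeats."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # no repeated entry: "no mode"
    return sorted(v for v, c in counts.items() if c == top)

fares = [872, 432, 397, 427, 388, 782, 397]
print(modes(fares))         # [397]
print(modes([1, 2, 3]))     # []
print(modes([1, 1, 2, 2]))  # [1, 2]
```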
Comparing Mean, Median & Mode
- All three describe a “typical” value but differ in sensitivity:
- Mean: uses all values; affected by outliers.
- Median: depends only on the middle of the ordered data, not on how extreme the ends are; largely unaffected by outliers.
- Mode: only frequency matters; may be non-unique or absent; often least representative.
- Age sample (20 students)
- Data: 20,20,20,20,20,20,21,21,21,21,22,22,22,23,23,23,23,24,24,65
- Mean \approx 23.8 years (pulled upward by outlier 65).
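- Median = 21.5 years (middle two: 21, 22).
- Mode = 20 years (highest frequency).
- Interpretation: the mean uses every value but is pulled upward by the single outlier; the median resists that pull; the mode is least descriptive here because it sits at the minimum age.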
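To see the outlier's effect on the age sample, the sketch below computes all three measures with and without the 65-year-old. The name `trimmed` is illustrative only.

```python
from statistics import mean, median, mode

# Age sample of 20 students from the example.
ages = [20] * 6 + [21] * 4 + [22] * 3 + [23] * 4 + [24] * 2 + [65]

print(mean(ages), median(ages), mode(ages))  # 23.75 21.5 20

# Remove the outlier: the mean drops noticeably, the median barely moves.
trimmed = [a for a in ages if a != 65]
print(round(mean(trimmed), 2), median(trimmed), mode(trimmed))  # 21.58 21 20
```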
Weighted Mean
- Appropriate when each observation carries a specific weight w.
- Formula: \bar{x}_w = \frac{\Sigma(w\,x)}{\Sigma w}.
- Course-grade example
- Weights: tests 50\%, midterm 15\%, final 20\%, lab 10\%, homework 5\%.
- Scores (same order): 86, 96, 82, 98, 100.
- Weighted sum: \Sigma(wx) = 0.50(86)+0.15(96)+0.20(82)+0.10(98)+0.05(100) = 88.6; \Sigma w = 1 ⇒ weighted mean = 88.6.
- An A requires \ge 90, so this grade just misses.
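The course-grade computation can be sketched as below. The helper `weighted_mean` is hypothetical, and the pairing of scores with weights assumes both lists are in the order given above.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Order: tests, midterm, final, lab, homework.
scores = [86, 96, 82, 98, 100]
weights = [0.50, 0.15, 0.20, 0.10, 0.05]

grade = weighted_mean(scores, weights)
print(round(grade, 1))  # 88.6
print(grade >= 90)      # False: below the cutoff for an A
```

Dividing by the sum of the weights means the same function also works when the weights are raw counts or points rather than fractions that sum to 1.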
Mean of a Frequency Distribution
- Use class midpoints x and frequencies f.
- Formula: \bar{x}=\frac{\Sigma(xf)}{n} where n=\Sigma f.
- Illustration (7 classes, n=50): \Sigma(xf)=2089 ⇒ \bar{x}=41.78 (units per original context).
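The midpoint formula can be sketched as follows. The notes do not list the seven classes, so the midpoints and frequencies here are hypothetical values chosen only to show the mechanics. They do not reproduce the 41.78 result.

```python
def grouped_mean(midpoints, frequencies):
    """Approximate the mean of binned data: sum(x * f) / n, with n = sum(f)."""
    if len(midpoints) != len(frequencies):
        raise ValueError("midpoints and frequencies must have the same length")
    n = sum(frequencies)
    return sum(x * f for x, f in zip(midpoints, frequencies)) / n

# Hypothetical frequency table (not the data from the notes).
midpoints = [5, 15, 25, 35]
frequencies = [2, 5, 8, 5]

print(grouped_mean(midpoints, frequencies))  # 23.0
```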
Distribution Shapes & Central Location
- Uniform: flat-topped and symmetric; mean ≈ median, and no single class stands out as the mode.
- Symmetric (no skew, single peak): mean = median = mode at the center.
- Left-skewed (−ve): mean < median < mode (tail to left).
- Right-skewed (+ve): mode < median < mean (tail to right).
Measures of Dispersion
- Range: \text{Range}=(\text{max})-(\text{min}).
- Variance & Standard Deviation
- Deviation: x-\mu (population) or x-\bar{x} (sample).
- Population variance: \sigma^2=\frac{\Sigma(x-\mu)^2}{N}.
- Population standard deviation: \sigma=\sqrt{\sigma^2}.
- Sample variance: s^2=\frac{\Sigma(x-\bar{x})^2}{n-1}.
- Sample standard deviation: s=\sqrt{s^2}.
- Interpretation: larger \sigma or s ⇒ greater spread.
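The population and sample formulas differ only in the divisor. The sketch below applies both to the seven flight fares from earlier in these notes and checks the results against Python's `statistics` module. Treating the fares as a population in one case and a sample in the other is purely for illustration.

```python
import math
import statistics

fares = [872, 432, 397, 427, 388, 782, 397]
n = len(fares)
center = sum(fares) / n  # plays the role of mu or x-bar

# Sum of squared deviations, shared by both formulas.
sum_sq = sum((x - center) ** 2 for x in fares)

pop_var = sum_sq / n           # sigma^2: divide by N
sample_var = sum_sq / (n - 1)  # s^2: divide by n - 1
pop_sd = math.sqrt(pop_var)
sample_sd = math.sqrt(sample_var)

# Cross-check against the standard library.
matches = (
    math.isclose(pop_var, statistics.pvariance(fares))
    and math.isclose(sample_var, statistics.variance(fares))
    and math.isclose(pop_sd, statistics.pstdev(fares))
    and math.isclose(sample_sd, statistics.stdev(fares))
)
print(matches)             # True
print(sample_sd > pop_sd)  # True: the n - 1 divisor gives the larger value
```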
Deviation Example (10 salaries)
- Mean salary \mu=41.5 (thousand dollars).
- Computed each deviation x-\mu; the positive and negative deviations cancel, so \Sigma(x-\mu)=0. This is why deviations are squared before being averaged.
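The cancellation is not special to the salary data. A short derivation, using only the definition of the population mean given earlier, shows that it holds for any data set:

```latex
\Sigma(x-\mu) \;=\; \Sigma x - N\mu \;=\; N\mu - N\mu \;=\; 0,
\qquad \text{since } \mu = \frac{\Sigma x}{N} \;\Rightarrow\; \Sigma x = N\mu .
```

The same argument applies to a sample, with \bar{x} and n in place of \mu and N.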
Empirical Rule (68–95–99.7)
- Applies to bell-shaped (normal) distributions.
- ≈68\% within \mu\pm\sigma.
- ≈95\% within \mu\pm2\sigma.
- ≈99.7\% within \mu\pm3\sigma.
- Application
- Women’s heights: \bar{x}=64 in, s=2.71 in.
- Upper bound: \bar{x}+2s = 64+2(2.71) = 64+5.42 = 69.42 in.
- Percent between 64 and 69.42 inches ≈ 34\%+13.5\% = 47.5\% (the upper half of the 95\% band).
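As a rough check on the height example, the sketch below compares the Empirical Rule figure with the area under an exactly normal curve. Assuming the heights are exactly normal is a modelling assumption made here for illustration; the notes only say the distribution is bell-shaped.

```python
from statistics import NormalDist

mean_height = 64.0  # inches
sd_height = 2.71    # inches

upper = mean_height + 2 * sd_height  # two standard deviations above the mean
empirical_share = 0.95 / 2           # half of the 95% band, i.e. 34% + 13.5%

# Assumed model: heights follow a normal distribution with these parameters.
heights = NormalDist(mu=mean_height, sigma=sd_height)
exact_share = heights.cdf(upper) - heights.cdf(mean_height)

print(round(upper, 2))        # 69.42
print(empirical_share)        # 0.475
print(round(exact_share, 4))  # 0.4772
```

The two figures agree to within about a quarter of a percentage point, which is the level of accuracy the rule is meant to give.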
Standard Deviation for Grouped Data
- When raw data are binned into a frequency distribution:
- Compute \bar{x} using midpoints.
- Determine sum of squares \Sigma f(x-\bar{x})^2.
- Use sample s=\sqrt{\frac{\Sigma f(x-\bar{x})^2}{n-1}}.
- Children-per-household example
- Mean \approx1.8, sample s\approx1.7 children.
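The three steps can be sketched as below. The household counts are hypothetical because the notes do not reproduce the original frequency table, so the output will not match the 1.8 and 1.7 figures above.

```python
from math import sqrt

def grouped_sample_stdev(values, frequencies):
    """Sample standard deviation from a frequency table (values or class midpoints)."""
    n = sum(frequencies)
    x_bar = sum(x * f for x, f in zip(values, frequencies)) / n          # step 1: mean
    sum_sq = sum(f * (x - x_bar) ** 2 for x, f in zip(values, frequencies))  # step 2
    return sqrt(sum_sq / (n - 1))                                        # step 3: divide by n - 1

# Hypothetical table: number of children -> number of households.
children = [0, 1, 2, 3]
households = [3, 4, 2, 1]

s = grouped_sample_stdev(children, households)
print(round(s, 2))  # 0.99
```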
Quartiles & Interquartile Range
- Quartiles divide ordered data into four parts.
- Q_1: ≈25th percentile.
- Q_2: median.
- Q_3: ≈75th percentile.
- Finding quartiles (15 CPR scores)
- Ordered list ⇒ Q_2=15.
- Lower half median Q_1=10.
- Upper half median Q_3=18.
- Interquartile Range: \text{IQR}=Q_3-Q_1=18-10=8.
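The median-of-halves procedure can be sketched as follows. The 15 CPR scores are not listed in these notes, so the example reuses the seven flight fares from earlier. When n is odd, this sketch assumes the median itself is left out of both halves. The function name is hypothetical, and at least two data values are assumed.

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (Q1, Q2, Q3) using the median of each half of the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n, skip the middle entry so it belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

fares = [872, 432, 397, 427, 388, 782, 397]
q1, q2, q3 = quartiles_by_halves(fares)
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 397 427 782 385
```

Statistical software often uses interpolation-based conventions instead, so quartiles from a library call may differ slightly from these values.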
Percentiles & Other Fractiles
- Fractile summary
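- Quartiles Q_1, Q_2, Q_3: divide the ordered data into 4 equal parts.
- Deciles D_1 \dots D_9: divide the ordered data into 10 equal parts.
- Percentiles P_1 \dots P_{99}: divide the ordered data into 100 equal parts.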
- Ogive interpretation example
- 72nd percentile → SAT score = 1700.
- Meaning: about 72\% of test takers scored at or below 1700.
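The "at or below" reading of a percentile can be illustrated with a small sketch. The SAT scores behind the ogive are not available here, so the example reuses the age sample from earlier in these notes; `percent_at_or_below` is a hypothetical helper name.

```python
def percent_at_or_below(values, x):
    """Percentage of data entries that are less than or equal to x."""
    count = sum(1 for v in values if v <= x)
    return 100 * count / len(values)

ages = [20] * 6 + [21] * 4 + [22] * 3 + [23] * 4 + [24] * 2 + [65]

# 13 of the 20 students are 22 or younger.
print(percent_at_or_below(ages, 22))  # 65.0
```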
Standard Score (z-Score)
- Formula: z=\frac{x-\mu}{\sigma}.
- Interpretation
- z>0: value above mean; z<0 below mean.
- |z| >2 often considered unusual.
- Oscar winners example (2007)
- Forest Whitaker (Best Actor): z=\frac{45-43.7}{8.8}\approx0.15 (slightly above average age).
- Helen Mirren (Best Actress): z=\frac{61-36}{11.5}\approx2.17 (unusually older relative to past winners).
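The two Oscar calculations can be checked with a minimal sketch. The function names and the cutoff parameter are illustrative; the |z| > 2 threshold is the rule of thumb stated above.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

def is_unusual(z, cutoff=2):
    """Rule of thumb from the notes: |z| > 2 is often considered unusual."""
    return abs(z) > cutoff

whitaker = z_score(45, 43.7, 8.8)  # Best Actor, age 45
mirren = z_score(61, 36, 11.5)     # Best Actress, age 61

print(round(whitaker, 2), is_unusual(whitaker))  # 0.15 False
print(round(mirren, 2), is_unusual(mirren))      # 2.17 True
```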