Week 3 Notes: Measures of Center, Variability, Boxplots, and Location in Distributions (Sections 3.1-3.4)
3.1 Measures of Center
Central tendency concepts: mode, mean, or median describe the center of a distribution.
Mode: most frequent observation(s). Not necessarily central; can have multiple or no modes.
Example: data set 2, 4, 6, 7, 3, 2, 2, 1, 2, 1 has a mode of 2.
Excel:
=MODE.MULT(highlight data).
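As a cross-check outside Excel, the mode of the example data can be found with Python's standard library. This is only a sketch: the variable names are illustrative, and the second data set is a hypothetical one added to show a tie.

```python
from collections import Counter
from statistics import multimode

# Data set from the mode example in these notes.
data = [2, 4, 6, 7, 3, 2, 2, 1, 2, 1]

counts = Counter(data)    # maps each value to how often it occurs
modes = multimode(data)   # list of every value tied for the highest count

print(counts.most_common(2))  # the two most frequent values with their counts
print(modes)

# Hypothetical second data set (not from the notes) with a two-way tie.
tied = multimode([1, 1, 2, 2, 3])
print(tied)
```

Note that when every value occurs exactly once, multimode returns all of the values, so a "no mode" case has to be detected separately (for example, by checking that the highest count is 1).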
Mean (average): sum of observations divided by number of observations.
Formula: \bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}
Sensitive to outliers: outliers affect the mean.
Median: middle value when data are ordered.
For odd n: median = x_{((n+1)/2)}.
For even n: median = (x_{(n/2)} + x_{(n/2 + 1)}) / 2.
Data MUST be ordered.
Excel:
=MEDIAN(highlight data).
Resistant to outliers (unlike the mean).
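The mean and median rules above can be sketched in Python to see how differently they react to an outlier. The function names are illustrative, and the value 100 is a hypothetical outlier that does not come from the notes.

```python
from statistics import median

def mean_by_formula(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)

def median_by_position(values):
    """Median from the ordered data, using the odd/even position rules."""
    ordered = sorted(values)          # the data MUST be ordered first
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]                 # 1-based position (n+1)/2
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2   # positions n/2 and n/2 + 1

data = [2, 4, 6, 7, 3, 2, 2, 1, 2, 1]   # data set from the mode example
with_outlier = data + [100]             # 100 is a hypothetical outlier

for label, values in [("original", data), ("with outlier", with_outlier)]:
    print(label, mean_by_formula(values), median_by_position(values))

# The hand-written rule should agree with the standard library.
assert median_by_position(data) == median(data)
```

With the extra value the mean rises from 3 to roughly 11.8, while the median stays at 2.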
When to use:
Skewed or with outliers: Prefer the median.
Roughly symmetric with no outliers: Mean is a good summary.
Comparing mean and median (shape):
Skewed to the left: Mean < Median.
3.2 Measuring Variability
Goals: Find range, calculate/interpret standard deviation (s or \sigma), find/interpret interquartile range (IQR).
Intuition: Distributions with the same center can have different spreads.
Range:
Range = largest value - smallest value = \max(x_i) - \min(x_i).
Tells spread between extremes; NOT resistant to outliers.
Excel:
=MAX(range) - MIN(range).
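A short Python sketch of the same calculation, showing how a single extreme value changes the range. The helper name is illustrative and the value 100 is a hypothetical outlier.

```python
def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

data = [2, 4, 6, 7, 3, 2, 2, 1, 2, 1]   # data set from the mode example
print(value_range(data))                # 7 - 1
print(value_range(data + [100]))        # 100 is a hypothetical outlier: 100 - 1
```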
Interquartile Range (IQR): Measure of spread robust to outliers; aligns with median.
Quartiles:
Q1: median of bottom 50% (25th percentile).
Q3: median of top 50% (75th percentile).
IQR definition:
IQR = Q3 - Q1. Never negative.
Excel:
Q1 = QUARTILE.INC(range, 1), Q3 = QUARTILE.INC(range, 3); IQR = Q3 - Q1.
5-number summary: (min, Q1, median, Q3, max) often used with IQR.
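The "median of each half" definition of Q1 and Q3 can be sketched in Python as below. Two assumptions are made here that the notes do not state: for an odd number of values the overall median is left out of both halves, and the standard library's "inclusive" quantile method is used as a stand-in for an interpolating rule of the kind QUARTILE.INC applies. The two rules can disagree slightly on small data sets.

```python
from statistics import median, quantiles

def quartiles_by_halves(values):
    """Return (Q1, median, Q3) using medians of the lower and upper halves.

    Assumption: for an odd number of values, the overall median is left out
    of both halves. Some textbooks include it instead.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(ordered), median(upper)

data = [2, 4, 6, 7, 3, 2, 2, 1, 2, 1]   # data set from the mode example

q1, med, q3 = quartiles_by_halves(data)
iqr = q3 - q1
five_number = (min(data), q1, med, q3, max(data))
print(five_number, iqr)

# Interpolating quartiles, for comparison with the halves rule.
inc_q1, inc_med, inc_q3 = quantiles(data, n=4, method="inclusive")
print(inc_q1, inc_med, inc_q3)
```

For this data the halves rule gives Q3 = 4 while the interpolating rule gives Q3 = 3.75, so it is worth stating which method was used when reporting an IQR.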
Standard deviation (SD): How much observations differ from their mean, on average.
Sample SD: s = \sqrt{\frac{\sum_{i=1}^n (x_i - \bar{x})^2}{n-1}}.
Population SD: \sigma = \sqrt{\frac{\sum_{i=1}^n (x_i - \mu)^2}{n}}.
NOT resistant to outliers (outliers inflate SD).
Excel:
=STDEV.S(range) (sample), =STDEV.P(range) (population).
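The two standard deviation formulas differ only in the divisor (n - 1 versus n). A minimal Python sketch, with illustrative function names, computes both by hand and checks them against the standard library:

```python
import math
from statistics import pstdev, stdev

def sample_sd(values):
    """s: squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))

def population_sd(values):
    """sigma: squared deviations from the mean, divided by n."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)

data = [2, 4, 6, 7, 3, 2, 2, 1, 2, 1]   # data set from the mode example
s = sample_sd(data)
sigma = population_sd(data)
print(round(s, 4), round(sigma, 4))

# Both hand-written versions should agree with the standard library.
assert math.isclose(s, stdev(data))
assert math.isclose(sigma, pstdev(data))
```

Because the sample formula divides by the smaller number n - 1, s is always at least as large as the population value computed from the same data.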
Choosing variability measure:
Symmetric/no outliers: Use standard deviation.
Skewed/outliers: Use IQR.
3.3 Boxplots and Outliers
1.5×IQR rule for outliers:
Lower cutoff: Q_1 - 1.5\times\text{IQR}.
Upper cutoff: Q_3 + 1.5\times\text{IQR}.
Observations outside these cutoffs are suspected outliers.
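The cutoff rule can be sketched in Python as follows. Assumptions: quartiles come from the standard library's "inclusive" method (a different quartile rule may shift the cutoffs slightly), the helper name is illustrative, and the value 20 is a hypothetical extreme observation appended to the example data.

```python
from statistics import quantiles

def iqr_cutoffs(values):
    """Return (lower cutoff, upper cutoff) from the 1.5 x IQR rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Data set from the mode example plus one hypothetical extreme value (20).
data = [2, 4, 6, 7, 3, 2, 2, 1, 2, 1, 20]

lower, upper = iqr_cutoffs(data)
suspected = [x for x in data if x < lower or x > upper]
print(lower, upper, suspected)
```

Here Q1 = 2 and Q3 = 5, so the cutoffs are -2.5 and 9.5, and only the value 20 is flagged.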
Boxplot features: Uses the 5-number summary (min, Q1, median, Q3, max).
Box spans from Q1 to Q3; median line inside.
Whiskers extend to smallest/largest non-outlier values.
Outliers marked with an asterisk or dot.
Conveys center (median) and spread (IQR, whiskers); does not show sample size.
Construction steps:
Find 5-number summary.
Draw scaled horizontal axis.
Draw box Q1-Q3.
Draw median line in box.
Extend whiskers to non-outlier data; mark outliers.
Label axis/caption.
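Before drawing, it can help to collect every number the construction steps call for. The sketch below does this in plain Python; it assumes the "inclusive" quartile method, uses an illustrative function name, and again appends a hypothetical extreme value (20) to the example data.

```python
from statistics import median, quantiles

def boxplot_parts(values):
    """Collect the numbers needed to draw a boxplot by hand."""
    ordered = sorted(values)
    q1, _, q3 = quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    lower_cut, upper_cut = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers stop at the most extreme values that are NOT outliers.
    inside = [x for x in ordered if lower_cut <= x <= upper_cut]
    return {
        "five_number": (ordered[0], q1, median(ordered), q3, ordered[-1]),
        "box": (q1, q3),
        "median": median(ordered),
        "whiskers": (inside[0], inside[-1]),
        "outliers": [x for x in ordered if x < lower_cut or x > upper_cut],
    }

# Data set from the mode example plus one hypothetical extreme value (20).
data = [2, 4, 6, 7, 3, 2, 2, 1, 2, 1, 20]
parts = boxplot_parts(data)
for name, value in parts.items():
    print(name, value)
```

In this example the upper whisker ends at 7 rather than at the maximum of 20, because 20 is plotted separately as a suspected outlier.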
3.4 Measuring Location in a Distribution
Percentiles (and percent rank):
Definition: Percentage of data values less than a given value x.
Example: 5 of 50 data values fall below x, so 5/50 = 10\% and x is at the 10th percentile.
Usually reported as whole numbers; round down.
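The percentile definition above translates directly into a count. A minimal Python sketch, with an illustrative function name and hypothetical data chosen so that exactly 5 of 50 values fall below x:

```python
def percentile_rank(values, x):
    """Percent of data values strictly below x, rounded down to a whole number."""
    below = sum(1 for v in values if v < x)
    return (100 * below) // len(values)   # integer division rounds down

# Hypothetical data: the whole numbers 1 through 50, so 5 values lie below 6.
scores = list(range(1, 51))
print(percentile_rank(scores, 6))
```

Integer division is used so the "round down" convention is applied without floating-point surprises.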
Z-scores (standardized scores):
Definition: How many standard deviations an observation is from the mean.
Population: z = \frac{x - \mu}{\sigma}.
Sample: z = \frac{x - \bar{x}}{s}.
Positive z-score = above mean; negative = below mean. Unitless.
Excel:
=STANDARDIZE(value, mean, standard_deviation).
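The sample z-score formula can be sketched in Python in the same argument order as STANDARDIZE (value, mean, standard deviation). The function name is illustrative, and the data set is reused from the mode example earlier in these notes.

```python
from statistics import mean, stdev

def z_score(x, center, spread):
    """Number of standard deviations that x lies from the mean."""
    return (x - center) / spread

data = [2, 4, 6, 7, 3, 2, 2, 1, 2, 1]   # data set from the mode example
x_bar = mean(data)   # sample mean
s = stdev(data)      # sample standard deviation (n - 1 divisor)

z_high = z_score(7, x_bar, s)   # largest value: above the mean, so positive
z_low = z_score(1, x_bar, s)    # smallest value: below the mean, so negative
print(round(z_high, 2), round(z_low, 2))
```

A value equal to the mean has a z-score of exactly 0, and the result carries no units because the units of x cancel against those of s.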
Comparing locations across distributions:
Percentiles: Rank-based, robust to distribution shape.
Z-scores: Relative to mean/spread, requires μ and σ.
Example: Jordan's height z-score of 1.0 vs Zayne's 0.50 means Jordan is relatively taller for her age/sex group.