Week 3 Notes: Measures of Center, Variability, Boxplots, and Location in Distributions (Sections 3.1-3.4)

3.1 Measures of Center
  • Central tendency: the mode, mean, and median each describe the center of a distribution in a different way.

  • Mode: most frequent observation(s). Not necessarily central; can have multiple or no modes.

    • Example: data set 2, 4, 6, 7, 3, 2, 2, 1, 2, 1 has a mode of 2.

    • Excel: =MODE.MULT(highlight data).
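
For readers working outside Excel, the mode example above can be sketched in Python. This assumes the standard-library statistics module; the second data set is hypothetical and only shows what a tie looks like.

```python
from statistics import multimode

data = [2, 4, 6, 7, 3, 2, 2, 1, 2, 1]   # data set from the example above
modes = multimode(data)                  # list of every most-frequent value
print(modes)                             # [2]

# Hypothetical data with two values tied for most frequent: both are returned.
tied = multimode([1, 1, 3, 3, 5])
print(tied)                              # [1, 3]

# Note: when every value occurs once, multimode returns all of them,
# which a statistics course would usually describe as "no mode".
```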

  • Mean (average): sum of observations divided by number of observations.

    • Formula: \bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}

    • Sensitive to outliers: outliers affect the mean.

  • Median: middle value when data are ordered.

    • For odd n: median = x_{((n+1)/2)}.

    • For even n: median = (x_{(n/2)} + x_{(n/2 + 1)}) / 2.

    • Data MUST be ordered.

    • Excel: =MEDIAN(highlight data).

    • Resistant to outliers (unlike the mean).
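
The position formulas for the median, and the contrast with the mean when an outlier is present, can be checked with a short Python sketch. The helper name median_by_formula and the data values are hypothetical, chosen only for illustration.

```python
from statistics import mean, median

def median_by_formula(data):
    """Median via the odd/even position formulas (positions are 1-based)."""
    ordered = sorted(data)                 # the data must be ordered first
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]   # x_((n+1)/2), shifted to 0-based indexing
    # average of x_(n/2) and x_(n/2 + 1)
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

values = [8, 3, 12, 5, 7]                  # hypothetical data, unordered on purpose
with_outlier = values + [100]              # same data plus one extreme value

print(mean(values), median_by_formula(values))              # 7 7
print(mean(with_outlier), median_by_formula(with_outlier))  # 22.5 7.5
```

One extreme value moves the mean from 7 to 22.5, while the median only shifts from 7 to 7.5.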

  • When to use:

    • Skewed or with outliers: Prefer the median.

    • Roughly symmetric with no outliers: Mean is a good summary.

  • Comparing mean and median (shape):

    • Skewed to the left: Mean < Median, because the long left tail pulls the mean below the median.
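
A quick numeric check of the left-skew comparison, using a small hypothetical data set with one low value forming the left tail:

```python
from statistics import mean, median

left_skewed = [1, 7, 8, 8, 9, 9, 10]   # hypothetical data with a long left tail

print(round(mean(left_skewed), 2))      # 7.43
print(median(left_skewed))              # 8
```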

3.2 Measuring Variability
  • Goals: Find range, calculate/interpret standard deviation (s or \sigma), find/interpret interquartile range (IQR).

  • Intuition: Distributions with the same center can have different spreads.

  • Range: largest value minus smallest value, \text{Range} = \max(x_i) - \min(x_i).

    • Tells spread between extremes; NOT resistant to outliers.

    • Excel: =MAX(range) - MIN(range).

  • Interquartile Range (IQR): Measure of spread robust to outliers; aligns with median.

    • Quartiles:

      • Q1: median of the bottom 50% of the ordered data (25th percentile).

      • Q3: median of the top 50% of the ordered data (75th percentile).

    • IQR definition: IQR = Q3 - Q1. Never negative.

    • Excel: Q1 = QUARTILE.INC(range, 1), Q3 = QUARTILE.INC(range, 3); IQR = Q3 - Q1.

    • 5-number summary: (min, Q1, median, Q3, max) often used with IQR.
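
The range, quartiles, IQR, and 5-number summary can be sketched together in Python. Two assumptions to flag: the data are hypothetical, and when n is odd this sketch leaves the median out of both halves, which is one common textbook convention. Excel's QUARTILE.INC interpolates instead, so it can return slightly different quartiles; the last lines reproduce that behavior with statistics.quantiles and method="inclusive".

```python
from statistics import median, quantiles

def five_number_summary(data):
    """(min, Q1, median, Q3, max) using the median-of-halves definition."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # bottom half (median excluded when n is odd)
    upper = ordered[half + n % 2:]    # top half
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

data = [1, 3, 4, 6, 7, 8, 10, 12, 15]           # hypothetical data
minimum, q1, med, q3, maximum = five_number_summary(data)

data_range = maximum - minimum                   # 14
iqr = q3 - q1                                    # 7.5
print((minimum, q1, med, q3, maximum))           # (1, 3.5, 7, 11.0, 15)

# Interpolating quartiles (the convention QUARTILE.INC follows) for comparison.
excel_style = quantiles(data, n=4, method="inclusive")
print(excel_style)                               # [4.0, 7.0, 10.0]
```

Both sets of quartiles are legitimate; what matters is stating which convention was used.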

  • Standard deviation (SD): How much observations differ from their mean, on average.

    • Sample SD: s = \sqrt{\frac{\sum_{i=1}^n (x_i - \bar{x})^2}{n-1}}.

    • Population SD: \sigma = \sqrt{\frac{\sum_{i=1}^n (x_i - \mu)^2}{n}}.

    • NOT resistant to outliers (outliers inflate SD).

    • Excel: =STDEV.S(range) (sample), =STDEV.P(range) (population).
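
The two standard deviation formulas differ only in the divisor (n - 1 versus n). The sketch below computes both by hand on hypothetical data; the function names sample_sd and population_sd are illustrative. The standard-library functions stdev and pstdev are the Python counterparts of STDEV.S and STDEV.P.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

def sample_sd(data):
    """s: squared deviations from the sample mean, divided by n - 1."""
    xbar = mean(data)
    return sqrt(sum((x - xbar) ** 2 for x in data) / (len(data) - 1))

def population_sd(data):
    """sigma: squared deviations from the population mean, divided by n."""
    mu = mean(data)
    return sqrt(sum((x - mu) ** 2 for x in data) / len(data))

data = [2, 4, 4, 4, 5, 5, 7, 9]           # hypothetical data; mean is 5

print(population_sd(data))                 # 2.0
print(round(sample_sd(data), 3))           # 2.138
```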

  • Choosing variability measure:

    • Symmetric/no outliers: Use standard deviation.

    • Skewed/outliers: Use IQR.

3.3 Boxplots and Outliers
  • 1.5 × IQR rule for outliers:

    • Lower cutoff: Q_1 - 1.5\times\text{IQR}.

    • Upper cutoff: Q_3 + 1.5\times\text{IQR}.

    • Observations outside these cutoffs are suspected outliers.
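
As a sketch of the rule above, the cutoffs and suspected outliers can be computed in Python. The data and the helper name iqr_fences are hypothetical. The quartiles here use the interpolating "inclusive" method (the QUARTILE.INC convention), so a course that uses median-of-halves quartiles may get slightly different cutoffs.

```python
from statistics import quantiles

def iqr_fences(q1, q3):
    """Lower and upper cutoffs from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [1, 3, 4, 6, 7, 8, 10, 12, 30]             # hypothetical data
q1, _, q3 = quantiles(data, n=4, method="inclusive")

lower, upper = iqr_fences(q1, q3)
outliers = [x for x in data if x < lower or x > upper]

print(q1, q3)          # 4.0 10.0
print(lower, upper)    # -5.0 19.0
print(outliers)        # [30]
```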

  • Boxplot features: Uses the 5-number summary (min, Q1, median, Q3, max).

    • Box spans from Q1 to Q3; median line inside.

    • Whiskers extend to smallest/largest non-outlier values.

    • Outliers marked with an asterisk or dot.

    • Conveys center (median) and spread (IQR, whiskers); does not show sample size.

  • Construction steps:

    1. Find 5-number summary.

    2. Draw scaled horizontal axis.

    3. Draw box Q1-Q3.

    4. Draw median line in box.

    5. Extend whiskers to non-outlier data; mark outliers.

    6. Label axis/caption.
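
Before drawing, it can help to compute every number the construction steps call for. The sketch below is one possible way to do that, not a plotting routine: it returns the box edges, the median, the whisker endpoints, and the outliers. The function name boxplot_parts and the data are hypothetical, and the quartiles again use the interpolating "inclusive" convention.

```python
from statistics import median, quantiles

def boxplot_parts(data):
    """Numbers needed to draw a boxplot by hand."""
    ordered = sorted(data)
    q1, _, q3 = quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    low_cut, high_cut = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in ordered if low_cut <= x <= high_cut]
    return {
        "box": (q1, q3),                       # box spans Q1 to Q3
        "median": median(ordered),             # line inside the box
        "whiskers": (inside[0], inside[-1]),   # smallest/largest non-outliers
        "outliers": [x for x in ordered if x < low_cut or x > high_cut],
    }

parts = boxplot_parts([1, 3, 4, 6, 7, 8, 10, 12, 30])   # hypothetical data
print(parts["box"])        # (4.0, 10.0)
print(parts["median"])     # 7
print(parts["whiskers"])   # (1, 12)
print(parts["outliers"])   # [30]
```

The upper whisker stops at 12, not 30, because 30 lies beyond the upper cutoff and is plotted as a separate point.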

3.4 Measuring Location in a Distribution
  • Percentiles (and percent rank):

    • Definition: Percentage of data values less than a given value x.

    • Example: if 5 of 50 data values are below x, then 5/50 = 10\% of the data lie below x, so x is at the 10th percentile.

    • Usually reported as whole numbers; round down.
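
A minimal Python sketch of this percentile definition, assuming "less than" is strict and that the result is rounded down to a whole number as stated above. The function name percentile_of and the data (the integers 1 to 50) are hypothetical.

```python
def percentile_of(x, data):
    """Percent of data values strictly below x, rounded down to a whole number."""
    below = sum(1 for value in data if value < x)
    return (100 * below) // len(data)     # integer division rounds down

data = list(range(1, 51))                 # hypothetical data: 50 values, 1 through 50

print(percentile_of(6, data))             # 10  (5 of 50 values are below 6)
print(percentile_of(20, data))            # 38  (19 of 50 values are below 20)
```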

  • Z-scores (standardized scores):

    • Definition: How many standard deviations an observation is from the mean.

    • Population: z = \frac{x - \mu}{\sigma}.

    • Sample: z = \frac{x - \bar{x}}{s}.

    • Positive z-score = above mean; negative = below mean. Unitless.

    • Excel: =STANDARDIZE(value, mean, standard_deviation).
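
The z-score formulas translate directly into Python. In this sketch the function name z_score and the data are hypothetical; it plays the same role as Excel's STANDARDIZE, taking the value, the mean, and the standard deviation.

```python
from statistics import mean, pstdev, stdev

def z_score(x, center, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - center) / sd

data = [2, 4, 4, 4, 5, 5, 7, 9]                      # hypothetical data; mean is 5

z_population = z_score(9, mean(data), pstdev(data))  # treats data as a population
z_sample = z_score(9, mean(data), stdev(data))       # treats data as a sample

print(z_population)          # 2.0
print(round(z_sample, 2))    # 1.87
```

The sample version is slightly smaller because s is slightly larger than \sigma for the same data.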

  • Comparing locations across distributions:

    • Percentiles: Rank-based, robust to distribution shape.

    • Z-scores: Relative to mean/spread; require the mean and standard deviation (μ and σ, or \bar{x} and s for a sample).

    • Example: Jordan's height z-score of 1.0 vs Zayne's 0.50 means Jordan is taller relative to her own age/sex group than Zayne is relative to Zayne's group, regardless of who is taller in absolute terms.
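
To make the comparison concrete, here is a sketch with made-up numbers. The heights, group means, and standard deviations below are not from the notes; they are hypothetical values chosen only so that the z-scores come out to 1.0 and 0.50.

```python
def z_score(x, center, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - center) / sd

# Hypothetical heights and group statistics (inches), invented for illustration.
jordan_z = z_score(66, 64, 2)   # Jordan: 66 in, group mean 64, SD 2
zayne_z = z_score(70, 69, 2)    # Zayne: 70 in, group mean 69, SD 2

print(jordan_z, zayne_z)        # 1.0 0.5
print(jordan_z > zayne_z)       # True
```

In this made-up case Zayne is taller in inches, but Jordan stands further above the mean of her own group.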