Notes on Central Tendency, Distribution Shape, and Variation

Measures of Central Tendency

  • Central tendency: measures that describe the middle of a dataset.

  • Major measures: mean (average), median, and mode. All three names start with 'm', but they have different meanings and uses.

  • Population vs. sample distinction matters for notation and interpretation, not for the basic idea of the mean.

Mean
  • Also called the average in everyday language.

  • Notation:

    • Population mean: $\mu$

    • Sample mean: $\bar{x}$

  • Formulas:

    • Population mean: $\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$

    • Sample mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

  • Summary of notation:

    • When dealing with populations, we commonly use Greek symbols (e.g., $\mu$, $\sigma$).

    • When dealing with samples, we use Latin letters (e.g., $\bar{x}$, $s$).

  • Why two symbols?

    • The formulas look the same, but the interpretation differs: a population mean is a fixed (but unknown) parameter; a sample mean varies from sample to sample.

    • If you take different samples from the same population, you would expect different $\bar{x}$ values; the population mean $\mu$ stays the same (with the whole population in hand, there is no sampling variability).

  • Example: Ages in an entire family (population).

    • Suppose the dataset yields a population mean of $\mu \approx 41.86$ (rounded to 42 in context).

    • Interpretation: the average age of the family is around 42 years.

    • Note on interpretation: the mean describes the center, but it may not capture the dataset well if there are outliers or a skewed distribution.

  • Example: Sample mean context (distance to campus for a sample of faculty).

    • A sample mean might be something like $\bar{x} = 12.2$ units (e.g., minutes or miles, depending on the data).

    • The suitability of the mean as a descriptor depends on the dataset, not on whether it is a sample or a population: the mean describes roughly symmetric data well and skewed or outlier-heavy data poorly.

  • Quick takeaways:

    • The mean uses every value in the dataset, which is an advantage (no data are ignored).

    • The mean is sensitive to outliers: extreme values can pull the mean toward them.
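A minimal Python sketch of the mean formula above and of how one extreme value pulls the mean. The ages are hypothetical values invented for illustration (they are not the family data from the lecture), and `arithmetic_mean` is an illustrative helper name.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    if len(values) == 0:
        raise ValueError("mean of empty data is undefined")
    return sum(values) / len(values)


# Hypothetical ages, invented for illustration only.
ages = [38, 40, 41, 43, 45]
ages_with_outlier = ages + [95]  # one extreme value added

mean_plain = arithmetic_mean(ages)
mean_outlier = arithmetic_mean(ages_with_outlier)

print(mean_plain)               # 41.4
print(round(mean_outlier, 2))   # 50.33
```

With the outlier included, the mean is larger than every one of the original five ages, so it no longer describes a typical value in the dataset.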

Median
  • Definition: the middle value of a dataset when ordered from smallest to largest.

  • How to find it:

    • If the number of data points n is odd, the median is the middle value.

    • If n is even, the median is the average of the two middle values.

  • Formula for the two-middle case (when in order):

    $\text{median} = \frac{x_{(\frac{n}{2})} + x_{(\frac{n}{2}+1)}}{2}$ ($n$ even)

  • For odd $n$: the median is the value at position $\frac{n+1}{2}$, i.e., $x_{(\frac{n+1}{2})}$.

  • Terminology:

    • The median is sometimes called the midpoint of the data. To avoid confusion with the mean, the term midpoint is often used when two middle values are averaged.

  • Example concepts:

    • For a small dataset with 5 numbers, the median is the third value after sorting.

    • For an even-sized dataset (e.g., 6 numbers), the median is the average of the 3rd and 4th values, which may yield a value like 11.5 (for integer data the result ends in .5 only when the two middle values have an odd sum, e.g., consecutive integers).

  • Why use the median?

    • The median can better describe the center when data are skewed or contain outliers, since it is not pulled toward extreme values.
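The odd and even cases above can be sketched in Python as follows. The datasets are hypothetical, and `median_of` is an illustrative name; Python's standard library also offers `statistics.median`.

```python
def median_of(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n + 1) / 2 is 0-based index n // 2.
        return ordered[mid]
    # Even n: 1-based positions n/2 and n/2 + 1 are 0-based mid - 1 and mid.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical datasets, invented for illustration.
odd_data = [7, 3, 9, 12, 5]           # sorted: 3, 5, 7, 9, 12
even_data = [8, 14, 11, 12, 20, 9]    # sorted: 8, 9, 11, 12, 14, 20
with_outlier = [3, 5, 7, 9, 500]      # one extreme value

print(median_of(odd_data))       # 7
print(median_of(even_data))      # 11.5
print(median_of(with_outlier))   # 7
```

The last call shows the robustness described above: replacing 12 with 500 leaves the median at 7.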

Mode
  • Definition: the value that occurs most often in the dataset.

  • Key points:

    • If no value repeats, there is no mode.

    • A dataset can have more than one mode:

      • Two modes: bimodal

      • Three modes: trimodal

      • More than two modes: multimodal (the term is also used loosely for any dataset with two or more modes)

  • Practical note:

    • The mode indicates the most frequent value but is not always informative about the dataset’s overall shape or center.

  • Example:

    • A dataset where 14 occurs most often has a mode of 14.
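A small Python sketch of the mode rules above, assuming the convention in these notes that a dataset with no repeated value has no mode. The function name `modes` and the sample data are illustrative.

```python
from collections import Counter


def modes(values):
    """Return every most-frequent value (sorted), or [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # no value repeats, so there is no mode (notes' convention)
    return sorted(v for v, c in counts.items() if c == top)


print(modes([12, 14, 14, 15, 14, 16]))   # [14]
print(modes([1, 2, 3]))                  # []
print(modes([1, 1, 2, 2, 3]))            # [1, 2]  (bimodal)
```

Note that the standard-library `statistics.multimode` follows a different convention: when no value repeats it returns all values rather than an empty list.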

Relationship of central tendency measures to data shape
  • In symmetric distributions (bell-shaped), the mean, median, and mode are all equal (or very close): $\mu \approx \text{median} \approx \text{mode}$

  • In uniform distributions, the mean and median are equal, but the mode may be undefined or may occur at multiple values.

  • In skewed distributions:

    • Skewed right (outliers to the right) typically has $\mu > \text{median}$

    • Skewed left (outliers to the left) typically has $\mu < \text{median}$
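The mean-versus-median pattern above can be checked on small datasets. This is only a sketch: the three lists are hypothetical and were chosen to have the stated shapes, and `center_summary` is an illustrative helper.

```python
from statistics import mean, median


def center_summary(values):
    """Return (mean, median) so the two can be compared side by side."""
    return mean(values), median(values)


# Hypothetical datasets, invented to illustrate each shape.
symmetric = [1, 2, 3, 4, 5, 6, 7]
right_skewed = [1, 2, 2, 3, 3, 4, 20]    # long tail to the right
left_skewed = [-20, 4, 5, 5, 6, 6, 7]    # long tail to the left

sym_mean, sym_median = center_summary(symmetric)
right_mean, right_median = center_summary(right_skewed)
left_mean, left_median = center_summary(left_skewed)

print(sym_mean, sym_median)                 # 4 4
print(right_mean, right_median)             # 5 3
print(round(left_mean, 2), left_median)     # 1.86 5
```

The word "typically" in the notes matters: the ordering holds for these examples but is a tendency, not a guarantee for every skewed dataset.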

Measures of Variation and Distribution Shape

Range
  • Definition: the difference between the maximum and minimum values in the dataset.

  • Formula:
    $\text{Range} = \max_i(x_i) - \min_i(x_i)$

  • Example interpretation:

    • If Company A salaries have a range of 10 (thousand dollars) and Company B salaries have a range of 35, Company B has the wider spread in salaries, suggesting greater variability.
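A short Python sketch of the range formula. The salary lists are hypothetical, chosen only so that their ranges match the 10 and 35 quoted above; they are not the lecture's data.

```python
def value_range(values):
    """Maximum minus minimum."""
    if len(values) == 0:
        raise ValueError("range of empty data is undefined")
    return max(values) - min(values)


# Hypothetical salaries in thousands of dollars (not the lecture's data).
company_a = [36, 38, 41, 43, 46]
company_b = [25, 32, 41, 50, 60]

range_a = value_range(company_a)
range_b = value_range(company_b)

print(range_a)   # 10
print(range_b)   # 35
```

Because only the two extreme values enter the calculation, changing any of the middle salaries would leave both ranges unchanged.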

Deviation, Variance, and Standard Deviation
  • Deviation:

    • For a given data value, the deviation from the mean is:
      $d_i = x_i - \mu$ (population)
      or
      $d_i = x_i - \bar{x}$ (sample)

    • Sign indicates whether the value is below (negative) or above (positive) the mean.

  • Population variance and standard deviation:

    • Variance (population):
      $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$

    • Standard deviation (population):
      $\sigma = \sqrt{\sigma^2}$

  • Why square deviations?

    • To avoid cancellation of positive and negative deviations; variance measures average squared distance from the mean.

  • Sample variance and standard deviation (with Bessel's correction):

    • Sample variance:
      $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$

    • Sample standard deviation:
      $s = \sqrt{s^2}$

  • Bessel's correction: divide by $n-1$ (not by $n$). Deviations measured from the sample mean $\bar{x}$ are, on average, smaller than deviations from the true mean $\mu$, so dividing by $n$ tends to underestimate the population variance; dividing by $n-1$ gives an unbiased estimate of it.

  • Note: The correction is specific to samples; for populations you divide by $N$.

  • Intuition: with a smaller denominator ($n-1$), the estimate of spread is slightly larger, which compensates for using the sample mean in place of the true population mean.

  • Practical takeaway:

    • Standard deviation tells you roughly how far a typical data value lies from the mean.

    • Smaller standard deviation means data are tightly clustered around the mean; larger standard deviation means more spread.

  • Worked example (population):

    • Data (population) of salaries A: mean $\mu = 41.5$ (thousand) (example value).

    • Deviations: subtract the mean, square, and sum; suppose the sum of squared deviations equals 7,? (example step shown in the lecture).

    • Variance: $\sigma^2 = \frac{1}{N}\sum (x_i - \mu)^2$ (in the example, this yielded $88.85$)

    • Standard deviation: $\sigma = \sqrt{\sigma^2}$ (in the example, approximately $9.43$)

    • Interpretation: salaries typically deviate from the mean by about 9.43 (thousand); the standard deviation is in the same units as the data.

  • Worked example (sample, with steps):

    • Data (sample) from eight players recovering from a concussion.

    • Mean: $\bar{x} = 39.5$

    • Deviations: subtract 39.5, square, and sum.

    • Sum of squared deviations: 56.0 (illustrative).

    • With Bessel's correction ($n = 8$, $n-1 = 7$):

      • Sample variance: $s^2 = \frac{\text{sum of squared deviations}}{n-1} = \frac{56.0}{7} = 8.0$

      • Sample standard deviation: $s = \sqrt{s^2} = \sqrt{8.0} \approx 2.83$ days

    • Interpretation (for these illustrative numbers): recovery times typically differ from the mean of 39.5 days by about 2.83 days.

  • Important distinction (contextual):

    • Population measures use the entire group; sample measures use a subset and adjust with Bessel's correction to better estimate the population value.

  • Practical example from the lecture:

    • Concussion data: mean 39.5 days, standard deviation $\approx 13.3$ days. (This is the value reported in the lecture; the illustrative sum of squares of 56.0 in the worked example above does not reproduce it.)

    • Interpretation: recovery times typically differ from the mean by about 13.3 days; a larger spread indicates more individual variation in recovery times.
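The variance and standard deviation formulas in this section can be sketched in Python as below. The dataset is hypothetical (it is not the salary or concussion data from the lecture), and the function names are illustrative. The final lines only redo the arithmetic of the illustrative sample example ($56.0 / 7$).

```python
import math


def population_variance(values):
    """Average squared deviation from the mean, dividing by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Squared deviations from the sample mean, dividing by n - 1 (Bessel's correction)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


# Hypothetical data with mean 5, invented for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean_value = sum(data) / len(data)
deviations = [x - mean_value for x in data]

pop_var = population_variance(data)      # 32 / 8
samp_var = sample_variance(data)         # 32 / 7

print(sum(deviations))                   # 0.0  (raw deviations cancel)
print(pop_var, math.sqrt(pop_var))       # 4.0 2.0
print(round(samp_var, 3), round(math.sqrt(samp_var), 3))   # 4.571 2.138

# Arithmetic of the illustrative sample example: 56.0 / (8 - 1), then the square root.
example_s2 = 56.0 / (8 - 1)
example_s = math.sqrt(example_s2)
print(example_s2, round(example_s, 2))   # 8.0 2.83
```

The raw deviations summing to zero is the cancellation that squaring avoids, and the sample variance comes out larger than the population variance for the same numbers because of the smaller denominator.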

The Empirical Rule (for approximately symmetric data)
  • If the data are approximately symmetric (bell-shaped):

    • About 68% of data lie within one standard deviation of the mean: $P(|X - \mu| \le \sigma) \approx 0.68$

    • About 95% lie within two standard deviations: $P(|X - \mu| \le 2\sigma) \approx 0.95$

    • About 99.7% lie within three standard deviations: $P(|X - \mu| \le 3\sigma) \approx 0.997$

  • What this means in practice:

    • If a distribution is symmetric, you can estimate how much data falls in these bands without computing every value.

    • If data lie outside two standard deviations from the mean in a symmetric shape, they are in the outer 5% (extreme values).

  • Important caveat:

    • The empirical rule applies to approximately bell-shaped (normal) data, not to skewed data; symmetry alone is not enough (a uniform distribution is symmetric but does not follow these percentages).

  • Conceptual analogy used in class:

    • Skittle example: with 100 items and 5 poisoned, a random draw has a 5% chance of being poisoned, the same proportion that falls outside two standard deviations in a bell-shaped distribution; this illustrates how we think about tails as rare events.
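The 68/95/99.7 figures can be reproduced under the assumption that the data follow an exactly normal distribution (the notes only require "approximately bell-shaped", so treat this as an idealized check). For a normal variable, $P(|X - \mu| \le k\sigma) = \operatorname{erf}(k/\sqrt{2})$; `normal_within` is an illustrative name.

```python
import math


def normal_within(k):
    """P(|X - mu| <= k * sigma) for an exactly normal X (an idealizing assumption)."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(normal_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973

# Share of values more than two standard deviations from the mean.
outside_two = 1 - normal_within(2)
print(round(outside_two, 4))   # 0.0455
```

The exact two-standard-deviation figure is closer to 95.45%, which is why the "outer 5%" in the notes is a rounded rule of thumb.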

Practical Notes and Strategies
  • Always consider all three measures (mean, median, mode) to understand data; they can tell different stories depending on shape and outliers.

  • Outliers affect the mean more than the median; this is a key reason to report multiple measures.

  • Range can be informative but is sensitive to extreme values; it does not capture how values are distributed between min and max.

  • Use weighted means when different observations contribute unequally (e.g., GPA weighted by credit hours):

    • Weighted mean formula:
      $\bar{x}_w = \frac{\sum_i w_i x_i}{\sum_i w_i}$

    • Example (GPA): if course grades are weighted by credit hours, the total weighted sum divided by total credits gives the GPA; a worked example in class yielded approximately $3.29$ (about 3.3).

  • The choice of descriptor depends on the data: for skewed data or data with outliers, the median or mode may describe the center better than the mean; for relatively clean, symmetric data, the mean is often informative.

  • A note on practice:

    • Formulas are important concepts to understand; you’ll perform by-hand calculations a few times to build intuition, then use software (e.g., Excel) for larger datasets.

  • Summary of notations:

    • Population: mean $\mu$, standard deviation $\sigma$, variance $\sigma^2$ (computed by dividing by $N$).

    • Sample: mean $\bar{x}$, standard deviation $s$, variance $s^2$, with the key correction (divide by $n-1$) known as Bessel's correction.

  • Quick interpretive guideline:

    • If mean is much smaller than median, there may be a left (low-end) outlier pulling the mean down.

    • If mean is much larger than median, there may be a right (high-end) outlier pulling the mean up.
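The weighted mean formula from these notes can be sketched in Python as follows. The grade points and credit hours are hypothetical (they are not the class example, so the result is not the 3.29 quoted above), and `weighted_mean` is an illustrative name.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical transcript: grade points and credit hours per course.
grade_points = [4.0, 3.0, 2.0]
credit_hours = [3, 4, 1]

gpa = weighted_mean(grade_points, credit_hours)          # (12 + 12 + 2) / 8
unweighted = sum(grade_points) / len(grade_points)

print(gpa)          # 3.25
print(unweighted)   # 3.0
```

The two results differ because the 1-credit course with the lowest grade counts for less in the weighted version than in the plain average.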

Appendix: Notation Quick Reference

  • Notation used for populations: $\mu$, $\sigma$, $\sigma^2$, etc.

  • Notation used for samples: $\bar{x}$, $s$, $s^2$, etc.

  • X-bar ($\bar{x}$) vs Mu ($\mu$): sample vs population means, respectively.

  • The summation symbol $\sum$ is used to denote adding a series of numbers.

  • Order statistics: $x_{(k)}$ denotes the k-th smallest value in the ordered data.

  • Deviation notations: $d_i = x_i - \mu$ (population) or $$d_i = x_i -