Notes on Mean vs Median (Transcript)

Mean vs Median in Data

  • Definition of the mean (average): the sum of observations divided by the number of observations.
    • Formula: $\text{mean} = \frac{\sum_{i=1}^n x_i}{n}$
  • Everyday use vs statistical term: what people call the "average" is the mean in statistics.
  • Definition of the median: the middle value when data are ordered from smallest to largest.
    • If the data are sorted as $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$, then:
    • For odd $n$, the median is $x_{(\frac{n+1}{2})}$
    • For even $n$, the median is the average of the two middle values: $\text{median} = \frac{x_{(n/2)} + x_{(n/2+1)}}{2}$
  • Key difference: the mean is pulled toward outliers and toward the long tail of a skewed distribution; the median reflects the middle of the distribution and is less affected by extreme values.
  • Conceptual takeaway: mean and median can tell different stories about the data depending on distribution shape.
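
The two definitions above can be sketched directly in Python. This is a minimal illustration, not code from the transcript: the function names `mean_of` and `median_of` and the sample values are assumptions chosen for the example.

```python
def mean_of(values):
    """Sum of the observations divided by the number of observations."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)


def median_of(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: x_((n+1)/2) in 1-based notation is index n // 2 in 0-based Python.
        return ordered[mid]
    # Even n: average x_(n/2) and x_(n/2 + 1), i.e. 0-based indices mid - 1 and mid.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical data: four similar values and one extreme value.
sample = [3, 1, 2, 100, 4]
print(mean_of(sample), median_of(sample))  # prints: 22.0 3
```

The single value 100 drags the mean up to 22, while the median stays at 3, in the middle of the four similar values.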

The Feet Data Example

  • Data context: 9 people observed; total number of feet counted = 17.
  • Compute the mean:
    • $\text{mean} = \frac{17}{9} \approx 1.888\ldots \approx 1.89$
    • The transcript states the mean as 1.89 feet per person.
  • Narrative devices: mentions of two-left-feet jokes and a high-tech prosthetic to illustrate data collection and variation.
  • Median concept in this dataset:
    • If you arrange the feet counts from smallest to largest for the 9 people, the median is the middle value (the 5th value since $n=9$), i.e., the central observation.
    • The transcript notes the idea of the "middle two feet" as a way to illustrate the middle of the distribution; formally, for odd $n$ you take the single middle value, for even $n$ you average the two middle values.
  • Important implication:
    • It is possible for the mean to be a non-integer that no one actually has (e.g., 1.89 feet), while the median of an odd-sized sample such as this one is always an actual observed value.
    • This underscores that the mean and the median can diverge in real data.
  • Takeaway about the two measures:
    • Mean can imply an overall average that doesn’t reflect typical individuals if the data are skewed.
    • Median represents a typical observation in the ordered list.
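
The feet example can be checked with a short sketch. The notes give only the totals (9 people, 17 feet), so the per-person breakdown below is an assumption: eight people with two feet and one person with one foot, which is one breakdown consistent with those totals.

```python
from statistics import mean, median

# Assumed breakdown (not stated in the notes): eight people with two feet
# and one person with one foot, matching 9 people and 17 feet in total.
feet_counts = [2, 2, 1, 2, 2, 2, 2, 2, 2]

total_feet = sum(feet_counts)
feet_mean = mean(feet_counts)      # 17 / 9
feet_median = median(feet_counts)  # 5th value of the 9 sorted counts

print(total_feet, round(feet_mean, 2), feet_median)  # prints: 17 1.89 2
```

Under this assumed breakdown nobody has 1.89 feet, yet that is the mean; the median, 2, is a value that people in the sample actually have.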

Mean vs Median in Medicine and Risk

  • In medical research, the mean risk is often higher than the median risk due to a small number of individuals with very high risk (outliers).
    • Example concept: heart disease risk distribution where some individuals have high cholesterol, diabetes, smoking history, or family history that elevates risk.
  • Consequence of mean being pulled up by high-risk outliers:
    • More than half of the population may have a risk below the mean, even though the mean is elevated by a few high-risk cases.
  • Practical and ethical implications:
    • Relying on the mean risk can overstate the typical risk and lead to unnecessary pills or tests for many individuals.
    • In decision-making, the median risk may provide a better sense of what a typical person faces when the distribution is skewed.
    • This highlights the importance of choosing the appropriate measure of central tendency for medical guidelines, patient communication, and policy.
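
To make the "more than half below the mean" point concrete, here is a small sketch with made-up numbers. The risk values are hypothetical and chosen only to produce a right-skewed distribution; they do not come from the transcript or from any study.

```python
from statistics import mean, median

# Hypothetical risk values for ten people (illustrative only):
# most are low, one person is at very high risk.
risks = [0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.03, 0.03, 0.05, 0.40]

mean_risk = mean(risks)
median_risk = median(risks)

# Fraction of people whose risk is strictly below the mean.
share_below_mean = sum(r < mean_risk for r in risks) / len(risks)

print(f"mean={mean_risk:.3f} median={median_risk:.3f} below_mean={share_below_mean:.0%}")
```

In this toy data the one high-risk person lifts the mean to about 0.063, roughly three times the median of 0.02, and nine of the ten people sit below the mean.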

Two Key Points About Scientific Statements (from the transcript)

  • Statements must be precise: they should be stated clearly with the exact quantities involved (e.g., the mean risk, the median, etc.).
  • They require careful thinking about truth and representativeness:
    • The concept of sampling is introduced as a foundational principle for making inferences from data.
    • The transcript playfully mentions sampling Doritos as a joke, indicating that sampling is a topic to be addressed later, but the underlying idea is to consider how samples reflect a population.

Formulas and Definitions (Summary)

  • Mean:
    $\text{mean} = \frac{\sum_{i=1}^n x_i}{n}$
  • Median (sorted data $x_{(1)} \le \dots \le x_{(n)}$):
    $\text{median} = \begin{cases} x_{(\frac{n+1}{2})}, & n \text{ odd} \\ \frac{x_{(n/2)} + x_{(n/2+1)}}{2}, & n \text{ even} \end{cases}$
  • Key interpretation:
    • Mean is sensitive to outliers and skewness.
    • Median is robust to outliers and provides a central tendency that better reflects the typical observation when data are skewed.
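
A quick way to see the sensitivity claim is to change a single value and watch how far each measure moves. The two lists below are hypothetical and differ only in their largest element.

```python
from statistics import mean, median

baseline = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 500]  # hypothetical: largest value replaced by an extreme one

# How far each measure moves when one observation becomes extreme.
mean_shift = mean(with_outlier) - mean(baseline)
median_shift = median(with_outlier) - median(baseline)

print(mean_shift, median_shift)  # prints: 99 0
```

The mean moves from 3 to 102 while the median stays at 3, because the median depends only on which value sits in the middle of the ordering, not on how large the extremes are.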

Real-World Implications and Takeaways

  • In skewed data (like many medical risk distributions), prefer the median to describe typical values.
  • Use the mean when the distribution is roughly symmetric and not heavily influenced by outliers.
  • Always consider the distribution shape before choosing which center measure to report for informing decisions and policy.
  • Be cautious of relying solely on the mean in contexts with potential outliers or highly skewed data, especially in healthcare and risk assessment.