Notes on Mean vs Median (Transcript)
- Definition of the mean (average): the sum of observations divided by the number of observations.
- Formula: $\text{mean} = \frac{\sum_{i=1}^{n} x_i}{n}$
- Everyday use vs statistical term: what people call the "average" is the mean in statistics.
- Definition of the median: the middle value when data are ordered from smallest to largest.
- If the data are sorted as $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$, then:
- For odd $n$, the median is $x_{((n+1)/2)}$
- For even $n$, the median is the average of the two middle values: $\text{median} = \frac{x_{(n/2)} + x_{(n/2+1)}}{2}$
- Key difference: mean can be pulled by outliers or skew; median reflects the middle of the distribution and is less affected by extreme values.
- Conceptual takeaway: mean and median can tell different stories about the data depending on distribution shape.
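The two definitions above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function names `mean` and `median` and the sample values are assumptions made for this example. The formulas use 1-based order statistics $x_{(k)}$, while Python lists are 0-based, so the indices shift by one.

```python
def mean(values):
    """Sum of the observations divided by the number of observations."""
    if not values:
        raise ValueError("mean requires at least one observation")
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data; average of the two middle values for even n."""
    if not values:
        raise ValueError("median requires at least one observation")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: x_((n+1)/2) in 1-based terms is index n // 2 in 0-based terms.
        return ordered[mid]
    # Even n: average x_(n/2) and x_(n/2+1), i.e. 0-based indices mid - 1 and mid.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical sample values, chosen only to exercise both branches.
sample_odd = [7, 1, 3]
sample_even = [4, 1, 3, 2]

print(mean(sample_odd), median(sample_odd))
print(mean(sample_even), median(sample_even))
```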
The Feet Data Example
- Data context: 9 people observed; total number of feet counted = 17.
- Compute the mean:
- $\text{mean} = \frac{17}{9} \approx 1.888\ldots \approx 1.89$
- The transcript states the mean as 1.89 feet per person.
- Narrative devices: mentions of two-left-feet jokes and a high-tech prosthetic to illustrate data collection and variation.
- Median concept in this dataset:
- If you arrange the feet counts from smallest to largest for the 9 people, the median is the middle value (the 5th value since $n=9$), i.e., the central observation.
- The transcript notes the idea of the "middle two feet" as a way to illustrate the middle of the distribution; formally, for odd $n$ you take the single middle value, for even $n$ you average the two middle values.
- Important implication:
- It is possible for the mean to be a non-integer that no one actually has (e.g., 1.89 feet), while the median here, with odd $n = 9$, is an actual observed value.
- This underscores that the mean and the median can diverge in real data.
- Takeaway about the two measures:
- Mean can imply an overall average that doesn’t reflect typical individuals if the data are skewed.
- Median represents a typical observation in the ordered list.
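As a quick check of the feet numbers, the sketch below uses a hypothetical breakdown of the data. The transcript gives only the totals (9 people, 17 feet), so the individual counts here (eight people with two feet and one person with one foot) are an assumption chosen to match those totals. It uses Python's standard `statistics` module.

```python
import statistics

# Hypothetical per-person foot counts; only the totals (9 people, 17 feet)
# come from the notes, the individual values are assumed.
feet = [2, 2, 1, 2, 2, 2, 2, 2, 2]

# Sanity check that the assumed data match the stated totals.
assert len(feet) == 9 and sum(feet) == 17

feet_mean = statistics.mean(feet)      # 17 / 9
feet_median = statistics.median(feet)  # 5th value of the sorted list

print(round(feet_mean, 2))  # 1.89
print(feet_median)          # 2
```

Under this assumed breakdown, the mean (about 1.89) is a value no person has, while the median is 2, which is an observed count.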
- In medical research, the mean risk is often higher than the median risk due to a small number of individuals with very high risk (outliers).
- Example concept: heart disease risk distribution where some individuals have high cholesterol, diabetes, smoking history, or family history that elevates risk.
- Consequence of mean being pulled up by high-risk outliers:
- More than half of the population may have a risk below the mean, even though the mean is elevated by a few high-risk cases.
- Practical and ethical implications:
- Relying on the mean risk can overstate the typical risk and lead to unnecessary pills or tests for many individuals.
- In decision-making, the median risk may provide a better sense of what a typical person faces when the distribution is skewed.
- Highlights the importance of choosing the appropriate measure of central tendency for medical guidelines, patient communication, and policy.
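To make the "more than half below the mean" point concrete, here is a small sketch with made-up numbers. The ten risk values are hypothetical percentages invented for illustration, not data from the transcript or from any study; the two large values stand in for a few high-risk individuals.

```python
import statistics

# Hypothetical risk estimates (percent) for ten people; invented for illustration.
risks = [2, 3, 3, 4, 4, 5, 5, 6, 30, 38]

mean_risk = statistics.mean(risks)
median_risk = statistics.median(risks)

# Fraction of people whose risk is strictly below the mean.
share_below_mean = sum(r < mean_risk for r in risks) / len(risks)

print(mean_risk)         # 10
print(median_risk)       # 4.5
print(share_below_mean)  # 0.8
```

In this made-up sample the two high values pull the mean to 10 while the median is 4.5, and eight of the ten people sit below the mean.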
Two Key Points About Scientific Statements (from the transcript)
- Statements must be precise: they should name the exact quantity involved (e.g., the mean risk or the median risk).
- They require careful thinking about truth and representativeness:
- The concept of sampling is introduced as a foundational principle for making inferences from data.
- The transcript jokes about sampling Doritos and signals that sampling will be covered later; the underlying idea is to consider how well a sample reflects its population.
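- Mean: $\text{mean} = \frac{\sum_{i=1}^{n} x_i}{n}$
- Median (sorted data $x_{(1)} \le \cdots \le x_{(n)}$): $\text{median} = \begin{cases} x_{((n+1)/2)}, & n \text{ odd} \\ \frac{x_{(n/2)} + x_{(n/2+1)}}{2}, & n \text{ even} \end{cases}$
- Key interpretation: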
- Mean is sensitive to outliers and skewness.
- Median is robust to outliers and provides a central tendency that better reflects the typical observation when data are skewed.
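The contrast in sensitivity can be seen by changing a single value. The sketch below uses hypothetical numbers: the largest observation in a small list is replaced by an extreme one, and both measures are recomputed with Python's standard `statistics` module.

```python
import statistics

# Hypothetical observations, invented for illustration.
baseline = [10, 12, 13, 15, 20]
# Same data with the largest value replaced by an extreme one.
with_outlier = [10, 12, 13, 15, 200]

for label, data in (("baseline", baseline), ("with outlier", with_outlier)):
    print(label, statistics.mean(data), statistics.median(data))
```

With these values the mean moves from 14 to 50, while the median stays at 13 in both cases.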
Real-World Implications and Takeaways
- In skewed data (like many medical risk distributions), prefer the median to describe typical values.
- Use the mean when the distribution is roughly symmetric and not heavily influenced by outliers.
- Always consider the distribution shape before choosing which center measure to report for informing decisions and policy.
- Be cautious of relying solely on the mean in contexts with potential outliers or highly skewed data, especially in healthcare and risk assessment.