Notes on Measures of Central Tendency (Mean, Median, Mode)

Central ideas in Measures of Central Tendency

Central tendency questions ask where data tend to “sit” or cluster in a dataset. The three main measures are mean, median, and mode.
The mean (average) and the median both describe the middle of the data; the mode is the most frequently occurring value.
The speaker emphasizes keeping vocabulary intuitive: mean = average, median = middle value, mode = most frequent value; the terms align with everyday language.
Practical purpose of these measures: provide concise descriptions of a dataset’s location and help in comparing different datasets.

The Mean (Average)

Definition: the arithmetic average of n observations.
- Formula: \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
How to compute: add all values and divide by the number of values.
Example concept (intuitive): if you have a small dataset, the mean gives the overall central value around which observations are distributed.
Important caveats from the transcript:
- Outliers can heavily influence the mean, pulling it toward extreme values.
- Changing a single data point (e.g., adding a large outlier) can dramatically alter the mean.
Quick numerical illustration (corrected for clarity):
- Dataset: {3, 5} => \bar{x} = \frac{3+5}{2} = 4
- If we add a large outlier, say 100, the dataset becomes {3, 5, 100} and the mean becomes \bar{x} = \frac{3+5+100}{3} = 36. The mean moves toward the outlier.
Common misconception highlighted in the transcript: a three-value example with a dramatic outlier can misstate the mean; the key point is the sensitivity to outliers.
When the speaker discusses the mean’s interpretation, he notes that the mean indicates where data are clustered, not necessarily the exact middle of the distribution in all cases (see the discussion on skew).
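The outlier illustration above can be checked with a few lines of Python. This is a minimal sketch; the helper name `mean` is chosen here for illustration and does not come from the transcript.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    if not values:
        raise ValueError("mean of an empty dataset is undefined")
    return sum(values) / len(values)


small = [3, 5]
with_outlier = small + [100]  # same data plus one large outlier

print(mean(small))         # 4.0
print(mean(with_outlier))  # 36.0
```

A single added value moves the mean from 4 to 36, far from either of the original observations.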

The Median (Middle Value)

Definition: the middle value of a dataset when it is ordered from smallest to largest.
Ordering requirement: values must be arranged in increasing order before determining the median.
Determining the median depends on the number of observations (n):
- If n is odd: the median is the middle value, \text{median} = x_{(\frac{n+1}{2})} where the subscripts denote ordered positions.
- If n is even: the median is the average of the two middle values, \text{median} = \frac{x_{(\frac{n}{2})} + x_{(\frac{n}{2}+1)}}{2}
Conceptual takeaway from the transcript:
- The median is the middle value, but it can be a value not actually present in the dataset (e.g., when n is even and the two middle numbers average to a non-data value).
Examples from the discussion:
- If you have eight values (even n), the two middle values determine the median via their average.
- If the two middle values are both 7 and one of them is changed to 8, the median becomes \frac{7+8}{2} = 7.5.
- If you have nine values (odd n), the single middle value is the median.
Key properties:
- The median is robust to outliers; adding an extremely large or small value often shifts the mean significantly but may shift the median only slightly.
Practical interpretation:
- The median splits the ordered data in half: at least half of the values lie at or below it and at least half lie at or above it.
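The odd/even rule can be sketched in Python as follows. The function name `median` and the sample datasets are illustrative assumptions, not values from the transcript; note that the formulas use 1-based positions while Python lists are 0-based.

```python
def median(values):
    """Median of a dataset: sort first, then apply the odd/even rule."""
    ordered = sorted(values)  # sorting returns a new list; the input is untouched
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n + 1) / 2 is 0-based index n // 2.
        return ordered[mid]
    # Even n: average the values at 1-based positions n/2 and n/2 + 1.
    return (ordered[mid - 1] + ordered[mid]) / 2


eight_values = [1, 3, 5, 7, 8, 10, 12, 15]   # hypothetical data, even n
nine_values = eight_values + [20]            # hypothetical data, odd n
nine_with_outlier = eight_values + [1000]    # same size, extreme last value

print(median(eight_values))       # 7.5 (average of 7 and 8)
print(median(nine_values))        # 8
print(median(nine_with_outlier))  # 8 (unchanged by the outlier)
```

In this sketch, replacing 20 with 1000 leaves the median at 8, which is the robustness property described above. The even-n result 7.5 is also a value that does not appear in the dataset.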

The Mode (Most Frequent Value)

Definition: the value that occurs most often in the dataset.
Terminology: a dataset with one mode is unimodal; a dataset with two modes is bimodal.
Special notes from the transcript:
- If more than two values share the maximum frequency, the dataset is described as having no mode (per the speaker’s convention, though in standard statistics this would be a multimodal dataset with more than two modes).
- The mode is particularly useful for categorical data (e.g., most popular color in a survey).
Examples:
- In a dataset where 1 appears more often than other numbers, the mode is 1.
- A dataset with two values tied for the highest frequency is bimodal (two modes).
- A dataset in which more than two values tie for the highest frequency would be described as having no mode under the transcript’s rule.
Practical takeaway:
- For numeric data, the mode is often less informative about the data’s center than the mean or median; it is most informative for understanding the most common category/value, especially in categorical data.
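One way to find the mode in Python is to count frequencies with `collections.Counter`. This is a sketch under stated assumptions: the helper names `modes` and `describe_modality` and the sample data are hypothetical, and the "no mode" label follows the speaker's convention described above rather than standard usage.

```python
from collections import Counter


def modes(values):
    """Return every value that shares the highest frequency."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


def describe_modality(values):
    """Label a dataset using the speaker's convention for ties."""
    k = len(modes(values))
    if k == 1:
        return "unimodal"
    if k == 2:
        return "bimodal"
    # Speaker's convention; standard statistics would say "multimodal".
    return "no mode"


print(modes([1, 1, 2, 3, 1, 4]))                # [1]
print(modes([2, 2, 5, 5, 7]))                   # [2, 5]
print(modes(["red", "blue", "red", "green"]))   # ['red']
print(describe_modality([1, 1, 2, 2, 3, 3]))    # no mode
```

Returning all tied values leaves the choice of convention to the caller. The third call shows that the same counting approach works for categorical data, where a mean or median would not be defined.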

Mean vs Median vs Mode: When They Coincide and When They Do Not

Symmetric (bell-shaped) distributions:
- Mean = Median = Mode (all align at the center).
Skewed distributions:
- The mean tends to be pulled toward the tail (where the outliers are) and lies on the tail side of the distribution.
- The mode sits at the peak of the distribution, on the side away from the tail.
- The median typically lies between the mode and the mean, usually closer to the mean; its exact position depends on the degree of skew.
Logical explanation provided in the transcript:
- The mean is heavily influenced by outliers, which pull it toward extreme values (the tail).
- In right-skewness (tail to the right), outliers pull the mean to the right; in left-skewness (tail to the left), outliers pull the mean to the left.
Practical implication:
- For skewed data or data with outliers, the median often provides a more representative central tendency than the mean.
- For symmetric data with no big outliers, the mean can be an informative descriptor of central tendency.
Visual intuition:
- A graph can show a dataset where the mean sits at a point not visually at the exact center when the data are skewed, illustrating the mean’s sensitivity to tail values.
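The ordering of the three measures under skew can be illustrated with Python's standard `statistics` module. The two small datasets below are made up for this sketch (the second is the mirror image of the first); they are not from the transcript, and real data need not follow the ordering this cleanly.

```python
import statistics

# Hypothetical right-skewed data: most values are small, one long tail value.
right_skewed = [3, 3, 3, 4, 5, 5, 6, 7, 8, 16]
# Mirror image of the same data, giving a tail to the left.
left_skewed = [20 - x for x in right_skewed]


def summarize(values):
    """Collect the three measures of central tendency in one place."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }


right = summarize(right_skewed)
left = summarize(left_skewed)
print(right)
print(left)

# Right skew: mode < median < mean (the mean is pulled toward the right tail).
assert right["mode"] < right["median"] < right["mean"]
# Left skew: mean < median < mode (the mean is pulled toward the left tail).
assert left["mean"] < left["median"] < left["mode"]
```

In both datasets the median falls between the mode and the mean, and the mean is the measure closest to the tail.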

Outliers, Robustness, and the Trimmed Mean

Outliers have a large influence on the mean but limited influence on the median.
Concept of trimming:
- Trimming (e.g., 5% trimming) means removing a percentage of data from the ends of the ordered dataset (the tails).
- After trimming, you compute the mean of the remaining data, often called a trimmed mean.
- Rationale: trimming reduces the influence of extreme values and can produce a measure of central tendency closer to the median in skewed data.
Effects of trimming:
- The trimmed mean moves closer to the center of the data (often more like the median).
- However, trimming alters the original dataset and is not the same as the true mean of the untrimmed data.
- Trimming should be used with awareness that you are creating a different descriptor; always preserve the original data if you may need it later.
Practical caution from the transcript:
- If you trim data in real-world work, do not erase values; create a new dataset with the trimmed values and keep the original dataset intact for accuracy and traceability.
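A trimmed mean can be sketched as below. Conventions differ on whether the stated percentage is removed from each end or in total; this sketch assumes it is removed from each end, and the name `trimmed_mean`, the parameter `percent_each_end`, and the dataset are all hypothetical.

```python
def trimmed_mean(values, percent_each_end):
    """Mean after dropping percent_each_end % of values from each tail.

    Assumption: the percentage applies to each end separately, and the
    number of values dropped per end is rounded down.
    """
    ordered = sorted(values)               # new list; original stays intact
    n = len(ordered)
    k = n * percent_each_end // 100        # values to drop from each end
    kept = ordered[k:n - k]
    if not kept:
        raise ValueError("trimming removed every value")
    return sum(kept) / len(kept)


# Hypothetical data: 20 values, one extreme outlier (250).
data = [12, 14, 15, 15, 16, 17, 18, 18, 19, 20,
        20, 21, 22, 22, 23, 24, 25, 26, 28, 250]
original = list(data)  # copy kept only to show the data are not modified

print(sum(data) / len(data))   # 31.25 (plain mean, pulled up by 250)
print(trimmed_mean(data, 5))   # about 20.17 (drops 12 and 250)
assert data == original        # the original dataset is preserved
```

With 5% trimmed from each end of 20 values, one value is dropped per tail. The trimmed mean lands close to this dataset's median of 20, while the untrimmed mean sits above all but one observation.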

Weighted Averages (Weighted Means)

Definition: a mean where each observation contributes according to a weight w_i, reflecting its importance, frequency, or relative size.
General formula:
- \bar{x}_w = \frac{\sum_i w_i x_i}{\sum_i w_i}
Key intuition:
- When observations carry different significance (e.g., different course credit hours or different numbers of units), the weighted mean provides a more meaningful average than the simple unweighted mean.
Common real-world examples:
- GPA, in which courses carry different credit hours (weights): grade points are A = 4, B = 3, C = 2, D = 1, F = 0, and the overall GPA is the weighted mean of grade points, weighted by course credits.
- Rent, illustrating unequal classes or units: the formula is the same (multiply each value by its weight, sum, and divide by the total weight). In the transcript, this example concluded with an average rent of about $950, illustrating how heavier weights can pull the average toward higher-value units.
Important takeaway:
- Weighted means account for differing contributions of observations and are essential when the data are not uniformly representative.
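The weighted-mean formula applied to a GPA might look like the sketch below. The grade-point scale is the one given above; the three courses, their credit hours, and the names `weighted_mean` and `GRADE_POINTS` are hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

# Hypothetical transcript: (letter grade, credit hours) per course.
courses = [("A", 1), ("B", 3), ("C", 4)]
points = [GRADE_POINTS[grade] for grade, _ in courses]
credits = [hours for _, hours in courses]

print(weighted_mean(points, credits))  # 2.625 (credit-weighted GPA)
print(sum(points) / len(points))       # 3.0 (unweighted, ignores credits)
```

Here the 4-credit C counts for more than the 1-credit A, so the weighted GPA (2.625) is lower than the simple average of the three grade points (3.0).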

Real-World Examples and Interpretations

GPA example (weighted mean interpretation):
- Grades carry points (A = 4, B = 3, C = 2, D = 1, F = 0).
- Each course also has a credit hour value (weight).
- The overall GPA is the weighted average of grade points across courses by their credit hours.
Housing/rent example (weighted mean interpretation):
- Rent values (x_i) and numbers of units or weights (w_i) determine the overall average rent when some rents are more common or represent more units.
- The weighted mean reflects the value of rents more accurately when rents are not uniformly distributed across units.
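When the weights are unit counts, the weighted mean is the same as the plain mean taken over every individual unit. The sketch below checks this with made-up rent levels and unit counts; these are not the transcript's figures and are not meant to reproduce its result.

```python
import math

# Hypothetical data: monthly rent level -> number of units at that rent.
units_by_rent = {800: 10, 1000: 30, 1500: 5}

# Weighted mean: sum of (rent * units) divided by the total number of units.
total_units = sum(units_by_rent.values())
weighted = sum(rent * n for rent, n in units_by_rent.items()) / total_units

# Equivalent view: list every unit's rent and take the ordinary mean.
every_unit = [rent for rent, n in units_by_rent.items() for _ in range(n)]
plain_over_units = sum(every_unit) / len(every_unit)

# Naive view: average the three rent levels, ignoring how common each is.
naive = sum(units_by_rent) / len(units_by_rent)

print(round(weighted, 2))          # 1011.11
print(round(plain_over_units, 2))  # 1011.11
print(naive)                       # 1100.0
assert math.isclose(weighted, plain_over_units)
```

The naive average treats the five expensive units as equal in importance to the thirty mid-priced ones, which is why it overstates the typical rent in this made-up example.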
Ethical and practical considerations:
- Trimmed means and weighted means are descriptive tools; they do not replace a full data analysis. They should be used with transparency about how data were handled and what was included or excluded.
- When presenting statistics to inform decisions, note how outliers, skewness, and weighting choices influence the conclusions drawn from these measures.

Connections to Foundational Principles and Real-World Relevance

Relationship to data distribution shapes:
- Symmetric distributions: mean, median, and mode align.
- Skewed distributions: the order of mean, median, and mode reflects tail direction and outliers.
Practical applications:
- Descriptive summaries in reports, dashboards, and initial data explorations.
- Selecting appropriate measures for decision making (e.g., use median for income data with outliers, use weighted mean for GPA or rent analyses).
Cautions and best practices:
- Always consider data distribution and presence of outliers before choosing a measure.
- When applying transformations such as trimming, preserve the original data so that results can be validated later.
- Use weighted means to reflect true contributions (e.g., course credits, units) rather than treating each observation equally when that assumption is invalid.

Quick Recap of Key Takeaways

Mean is the sum of values divided by the count: \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
Median is the middle value after ordering; rules depend on whether n is odd or even:
- If n is odd: \text{median} = x_{(\frac{n+1}{2})}
- If n is even: \text{median} = \frac{x_{(\frac{n}{2})} + x_{(\frac{n}{2}+1)}}{2}
Mode is the most frequent value; can be unimodal or bimodal; may be undefined in some datasets per the transcript’s convention.
Outliers affect the mean much more than the median; the median is more robust to extremes.
Trimming removes a portion of data tails to reduce outlier influence but changes the dataset; keep the original data for accuracy.
Weighted mean accounts for differing importance or frequency of observations: \bar{x}_w = \frac{\sum_i w_i x_i}{\sum_i w_i}
In practice, choose the measure that matches the data distribution and the decision context.