Measures of Central Tendency: Mean, Median, and Mode

Introduction

A fundamental task in many statistical analyses is to estimate a location parameter for a distribution, that is, to find a typical or central point that best describes the data.
Measures of central tendency are techniques that tell us where the values in a distribution are located. Each identifies a central point around which the other values tend to cluster.
The objective is to know which value lies at the center of a distribution. This value can be represented by the most common value, the expected value, or the middle value.
The mean, median, and mode are the three measures commonly used to describe the center of a data distribution. Each provides a single value that summarizes the entire distribution.
The three measures share a common purpose but take different approaches to finding the central point. Each is calculated differently, each describes the “typical” score in the dataset differently, and each has its own advantages and disadvantages.
The choice of measure depends on the type of data and how it was measured. As a rule of thumb, numerical data is summarized using the mean (or the median when extreme values are present), ordinal data using the median, and nominal data using the mode.

Mode

The mode is the value that appears most frequently in a distribution, that is, the value with the highest frequency in the dataset.
The mode is used when you want to report the most popular or most common value in a dataset. It is most useful when you want a quick and easy indicator of the central point. For instance, the mode answers everyday questions like “What is the most popular event on campus?”
The mode is simple and easy to find because it requires only counting, but it is the least powerful of the three measures of central tendency.
The mode can be found for both categorical and numerical data at all scales of measurement (nominal, ordinal, interval, and ratio), making it the most widely applicable of the three measures.
The mode is of limited use when the most common value is not necessarily the most typical value. For example, in the age distribution 13, 15, 14, 18, 12, 17, 16, 20, 15, 75, 75, both 15 and 75 occur twice, so 75 years counts as a mode even though it is not a typical age for the group.
The mode is also of limited use when a distribution has no mode, or has two or more modes, which makes it impossible or difficult to summarize the dataset with a single value.

Types of Mode

Uni-modal: A distribution with one mode (e.g., 08, 11, 12, 15, 18, 18, 18, 22, 33, 42, 44, 45, 53 - mode is 18).
Bi-modal: A distribution with two modes (e.g., 08, 11, 12, 15, 18, 18, 18, 22, 33, 42, 42, 42, 53 - modes are 18 and 42).
Multi-modal: A distribution with three or more modes (e.g., 08, 11, 11, 11, 15, 18, 18, 18, 19, 22, 33, 42, 42, 42, 45, 53 - modes are 11, 18 and 42).
Non-modal: A distribution with no mode (e.g., 08, 11, 40, 12, 15, 18, 21, 22, 23, 33, 42, 44, 45, 53).
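
The four types above can be told apart by counting how many values share the highest frequency. The following is a minimal Python sketch of that idea. The helper names `find_modes` and `classify` are illustrative, not part of the original notes, and the sketch assumes that a distribution in which every value occurs exactly once has no mode.

```python
from collections import Counter


def find_modes(values):
    """Return the values that share the highest frequency.

    Assumption: if no value repeats, the distribution has no mode,
    so an empty list is returned.
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        # Every value occurs once: treat as non-modal.
        return []
    return sorted(v for v, c in counts.items() if c == top)


def classify(values):
    """Label a distribution by its number of modes."""
    n_modes = len(find_modes(values))
    if n_modes == 0:
        return "non-modal"
    if n_modes == 1:
        return "uni-modal"
    if n_modes == 2:
        return "bi-modal"
    return "multi-modal"


# The four example distributions from the list above.
uni = [8, 11, 12, 15, 18, 18, 18, 22, 33, 42, 44, 45, 53]
bi = [8, 11, 12, 15, 18, 18, 18, 22, 33, 42, 42, 42, 53]
multi = [8, 11, 11, 11, 15, 18, 18, 18, 19, 22, 33, 42, 42, 42, 45, 53]
no_mode = [8, 11, 40, 12, 15, 18, 21, 22, 23, 33, 42, 44, 45, 53]

for name, data in [("uni", uni), ("bi", bi), ("multi", multi), ("no_mode", no_mode)]:
    print(name, find_modes(data), classify(data))
```

Python's standard library also offers `statistics.multimode`, which returns all tied values, but it reports every value as a mode when nothing repeats, so the explicit check above is kept to match the "non-modal" case.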

Mode Visualization

The mode is visualized as the highest point (peak) on a frequency graph, such as a bar chart or histogram.
Example: A grade distribution where thirty-five (35) students scored C+ makes C+ the mode.

Simple Mean

The mean, also known as the average, is the most commonly used measure of the central point in a dataset. It corresponds to the expected value of the data distribution.
You can calculate the mean only for numerical data measured at the interval or ratio scale. It is not possible to calculate the mean for categorical data.
The mean is the point in a distribution around which the variation of the values, measured as the sum of squared deviations, is minimized. In that sense it is closer to all scores, taken together, than any other measure of central tendency.
Because of this, the mean can be used as a simple model of the data: it is the single value that produces the lowest total squared error when used to predict every value in the dataset.
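
One way to make the error-minimizing property concrete is to measure error as the sum of squared deviations, which is an assumption about what "error" means here. The sketch below uses the employee ages from the worked example under Formula for Mean and compares a few candidate central values. The function name `sum_squared_error` and the list of candidates are illustrative choices.

```python
def sum_squared_error(values, center):
    """Total squared distance between each value and a candidate center."""
    return sum((x - center) ** 2 for x in values)


ages = [20, 26, 40, 36, 23, 18, 35, 24, 30]
mean_age = sum(ages) / len(ages)  # 252 / 9 = 28.0

# Compare the mean against nearby candidate values.
candidates = [26, 27, 28, 29, 30]
errors = {c: sum_squared_error(ages, c) for c in candidates}
best = min(errors, key=errors.get)

for c in candidates:
    print(c, errors[c])
print("lowest squared error at:", best)
```

Among these candidates the squared error is smallest at 28, the mean, and grows as the candidate moves away from it in either direction.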
The mean is affected by every value in the dataset, including any outlier or extreme value. Its usefulness is limited when a dataset contains an extreme value, as this usually misrepresents the central point of the dataset.
The calculation of the mean uses every value in the dataset. The median, by contrast, depends on only the one or two middle values, and the mode on only the most frequent value or values.
The mean (average) is calculated by summing all the values in a dataset and then dividing the total by the number of values.

Formula for Mean

The mean is calculated using the formula: $Mean (X) = \frac{\text{Total Value} \ (\Sigma X)}{\text{No. of Cases} \ (N)}$
Example: Ages of 9 employees: 20, 26, 40, 36, 23, 18, 35, 24, 30
- $Mean = \frac{20 + 26 + 40 + 36 + 23 + 18 + 35 + 24 + 30}{9} = \frac{252}{9} = 28$
- The mean age of employees is 28 years. Thus, the average employee is 28 years of age.
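
The same calculation can be sketched in Python. The helper `simple_mean` is a hypothetical name used for illustration. The standard library's `statistics.mean` is shown alongside it as a cross-check.

```python
from statistics import mean


def simple_mean(values):
    """Sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)


ages = [20, 26, 40, 36, 23, 18, 35, 24, 30]

manual = simple_mean(ages)   # 252 / 9
library = mean(ages)         # same calculation via the standard library

print(sum(ages), len(ages), manual, library)
```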

Weighted Mean

A weighted mean is calculated when the values in a dataset are assigned weights to indicate their relative importance. The weights can be frequencies, percentages, or proportions.
Example 1: A student took four separate tests and obtained the following grades: 60, 70, 60, and 90. The weights assigned to the tests are 10%, 10%, 30%, and 50%, respectively. What is the average grade?
- Simple (unweighted) mean: $Mean = \frac{60+70+60+90}{4} = 70$
- Weighted mean: $Mean = \frac{60 \times 0.1 + 70 \times 0.1 + 60 \times 0.3 + 90 \times 0.5}{1} = 76$
Example 2: To complete an online quiz, three students spent 40 minutes each, two students spent 60 minutes each, and one student spent 30 minutes. Find the average time spent by students to complete the online quiz. $Mean = \frac{3 \times 40 + 2 \times 60 + 1 \times 30}{6} = 45$
Example 3: The hourly wages from payroll show that 12 staff were paid $14 each, 20 staff were paid $18 each, 10 staff were paid $25 each, and 8 staff were paid $40 each. What was the mean hourly wage received by employees? $Mean = \frac{12 \times 14 + 20 \times 18 + 10 \times 25 + 8 \times 40}{50} = 21.96$
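
The three examples above follow one pattern: multiply each value by its weight, add the products, and divide by the sum of the weights. Below is a minimal Python sketch of that pattern. The function name `weighted_mean` is illustrative, and the percentage weights of Example 1 are entered as whole numbers (10, 10, 30, 50) rather than proportions, which gives the same result because the function divides by the total weight.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Example 1: test grades weighted by percentage.
grade = weighted_mean([60, 70, 60, 90], [10, 10, 30, 50])

# Example 2: minutes spent, weighted by number of students.
quiz_time = weighted_mean([40, 60, 30], [3, 2, 1])

# Example 3: hourly wage, weighted by number of staff.
wage = weighted_mean([14, 18, 25, 40], [12, 20, 10, 8])

print(grade, quiz_time, round(wage, 2))
```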

Grouped Mean

A mean can also be determined for grouped data, that is, data placed in intervals. Unlike listed data, the individual values in grouped data are not known, so we cannot calculate their sum directly.
To find the mean of grouped data, first determine the midpoint of each interval, or class. Then multiply each midpoint by the frequency of its class. The sum of these products divided by the total frequency (the total number of values) is the mean.

Formula for Grouped Mean

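The grouped mean can be calculated as: $Group \ Mean = \frac{\Sigma(FX)}{\Sigma F}$, where $X$ is the midpoint of each class and $F$ is its frequency.
Example: with $\Sigma(FX) = 873$ and $\Sigma F = 37$, $Group \ Mean = \frac{873}{37} = 23.59$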
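
The midpoint-times-frequency procedure can be sketched in Python as follows. The frequency table in this sketch is hypothetical and made up purely for illustration. It is not the data behind the 873/37 example, whose class intervals are not given in these notes. The function name `grouped_mean` is likewise an assumption.

```python
def grouped_mean(classes):
    """Estimate the mean of grouped data.

    classes: list of (lower_limit, upper_limit, frequency) tuples.
    Each class is represented by its midpoint X, and the result is
    sum(F * X) / sum(F).
    """
    total_fx = 0.0
    total_f = 0
    for lower, upper, freq in classes:
        midpoint = (lower + upper) / 2
        total_fx += freq * midpoint
        total_f += freq
    if total_f == 0:
        raise ValueError("total frequency must be greater than zero")
    return total_fx / total_f


# HYPOTHETICAL frequency table (illustration only).
classes = [
    (10, 19, 5),   # midpoint 14.5
    (20, 29, 8),   # midpoint 24.5
    (30, 39, 4),   # midpoint 34.5
    (40, 49, 3),   # midpoint 44.5
]

print(grouped_mean(classes))
```

Because each class is replaced by its midpoint, the result is an estimate of the mean rather than the exact mean of the underlying values.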

Limitations of the Mean

The usefulness of the mean is limited by extreme values, or outliers. A single extreme value can easily distort the picture the mean gives of the data.
It is not advisable to use the mean to summarize or describe a dataset that contains an outlier or extreme value, because the average may not accurately depict the central point of such a dataset.
The median provides a better way of summarizing a dataset when an outlier or extreme value is present.
Example: The ages of occupants in a room: 1, 1, 2, 3, 4, 7, 80
- Average age with extreme value = 14 years
- Average age without extreme value = 3 years
- Median age = 3 years
- A typical person in the room is a toddler and NOT a teenager
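
The figures in this example can be reproduced with Python's standard `statistics` module. This is a small sketch, and the variable names are illustrative.

```python
from statistics import mean, median

ages = [1, 1, 2, 3, 4, 7, 80]
ages_without_outlier = [a for a in ages if a != 80]

mean_with = mean(ages)                     # 98 / 7
mean_without = mean(ages_without_outlier)  # 18 / 6
median_with = median(ages)                 # 4th of the 7 ranked values

print("mean with extreme value:   ", mean_with)
print("mean without extreme value:", mean_without)
print("median with extreme value: ", median_with)
```

The single value of 80 moves the mean from 3 to 14, while the median stays at 3 with the extreme value included.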

Median

The median is the middle value of a set of ranked observations, that is, the value in the middle position of an ordered data distribution.
Because the median sits at the center of the ranked data, it splits the distribution into two equal parts: half (50%) of the values fall at or below the median and the other half (50%) fall at or above it.
Suppose a community has a median family income of $35,000. This implies that half of the family incomes are less than $35,000 and the other half are more than $35,000.
The median can be used for numerical data measured at the interval or ratio scale. It can also be used for categorical data measured at the ordinal scale. However, the median cannot be used for nominal data because nominal data cannot be ranked.
The median is the preferred measure when data is skewed because it is not affected by extreme values; it is resistant to the effect of outliers. This is often the case with house prices and annual incomes, where outliers or extreme values are more likely.

Steps to Find the Median

Rank the values from lowest to highest.
Apply the rank formula: $Rank = \frac{n + 1}{2}$
Count to the rank position. If the rank is not a whole number, average the two values on either side of it.
When you have an ODD number of observations:
- e.g. 15, 19, 11, 44, 12, 18, 42, 25, 33, 45, 53, 18, 08
- ranked: 08, 11, 12, 15, 18, 18, 19, 25, 33, 42, 44, 45, 53
- $Rank = \frac{13 + 1}{2} = 7$
- The 7th ranked observation is 19, so the median is 19.
When you have an EVEN number of observations:
- e.g. 15, 19, 11, 44, 12, 18, 42, 25, 33, 45, 53, 18, 08, 60
- ranked: 08, 11, 12, 15, 18, 18, 19, 25, 33, 42, 44, 45, 53, 60
- $Rank = \frac{14 + 1}{2} = 7.5$
- The 7.5th rank position lies between the 7th and 8th observations, 19 and 25, so you find the average of the two middle values.
- $Median = (19 + 25) / 2 = 22$
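
The rank-based steps above can be sketched in Python as follows. The function name `median_by_rank` is illustrative. The logic follows the $\frac{n + 1}{2}$ rank formula and averages the two neighbouring values when the rank falls between two positions.

```python
def median_by_rank(values):
    """Find the median using the (n + 1) / 2 rank formula."""
    ranked = sorted(values)            # Step 1: rank the values
    n = len(ranked)
    if n == 0:
        raise ValueError("the median of an empty dataset is undefined")
    rank = (n + 1) / 2                 # Step 2: apply the rank formula
    if rank.is_integer():
        # Odd n: the rank points at one observation (ranks start at 1).
        return ranked[int(rank) - 1]
    # Even n: the rank ends in .5, so average the observations on either side.
    lower = ranked[int(rank) - 1]
    upper = ranked[int(rank)]
    return (lower + upper) / 2


odd_data = [15, 19, 11, 44, 12, 18, 42, 25, 33, 45, 53, 18, 8]
even_data = [15, 19, 11, 44, 12, 18, 42, 25, 33, 45, 53, 18, 8, 60]

print(median_by_rank(odd_data))
print(median_by_rank(even_data))
```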

Which Method to Use?

Use the MODE when:
- You want to report the most common value in the dataset.
- You have categorical data measured at the nominal scale.
- You want a quick and easy way to summarize the dataset.
Use the MEDIAN when:
- You want to report the middle value in the dataset.
- You have data measured at the ordinal, interval, or ratio scale.
- You have a dataset skewed by an outlier or extreme value.
Use the MEAN when:
- You want to report the expected value in the dataset.
- You have numerical data measured at the interval or ratio scale.
- You have a dataset with no outlier or extreme value.

Summary Table

Type	Scale	Right Method
Categorical Data	Nominal	Mode
Categorical Data	Ordinal	Median
Numerical Data	Interval	Mean (no extreme value); Median (extreme value)
Numerical Data	Ratio	Mean (no extreme value); Median (extreme value)
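
The summary table can also be expressed as a small lookup function. This is only a sketch of the rule of thumb given above. The function name `recommend_measure` and its `has_extreme_values` flag are hypothetical, and deciding whether a dataset actually contains extreme values is left to the analyst.

```python
def recommend_measure(scale, has_extreme_values=False):
    """Suggest a measure of central tendency for a scale of measurement."""
    scale = scale.strip().lower()
    if scale == "nominal":
        return "mode"
    if scale == "ordinal":
        return "median"
    if scale in ("interval", "ratio"):
        # Numerical data: fall back to the median when outliers are present.
        return "median" if has_extreme_values else "mean"
    raise ValueError(f"unknown scale of measurement: {scale!r}")


print(recommend_measure("nominal"))
print(recommend_measure("ordinal"))
print(recommend_measure("ratio"))
print(recommend_measure("ratio", has_extreme_values=True))
```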