Descriptive Statistics Notes: Mean, Median, Mode in Excel
Overview: Descriptive statistics in Excel
Descriptive statistics describe data for an audience (clients, managers, etc.).
Focus in this course: measures of central tendency (location) and basic distribution ideas.
We use Excel to compute these instead of manual calculations; Excel has built-in functions for mean, median, mode, percentiles, quartiles, etc.
Data context in the transcript: a dataset of rental rates (70 values) used to illustrate how to obtain descriptive statistics.
Emphasis on practical use in business/data analytics: part of creating managerial reports and solving firm problems.
Percentiles and quartiles are mentioned as additional descriptive stats beyond the three main measures; quartiles are a type of percentile.
Excel advantages highlighted: automatic recalculation when data change; ignoring blank cells; simpler workflow for large data.
Notation and terminology preview: population vs. sample; population mean (µ) vs. sample mean (x̄); population size (N) vs. sample size (n).
Mean (Average)
The mean is the average value of a data set; in statistics, the terms mean and average are interchangeable.
Types of means mentioned:
Simple (unweighted) mean
Weighted mean
Exponential averages (briefly noted)
Key formulas (population vs. sample):
Population mean: \mu = \frac{\sum{i=1}^{N} Xi}{N}
Sample mean: \bar{X} = \frac{\sum{i=1}^{n} Xi}{n}
Common interpretation: add up all data values and divide by the number of values.
Notation distinction:
Population parameter uses Greek letters (e.g., µ, N).
Sample statistics use simple letters (e.g., \bar{X}, n).
How to compute in Excel:
Use the AVERAGE function; Excel calls this the mean (not literally the word "mean").
Steps: select the data range; Excel returns the mean; you can round as needed (e.g., to 2 decimals).
Example setup described:
A dataset of 70 rental rates (7 rows × 10 columns).
The mean (average) rental rate calculated with AVERAGE in Excel.
Excel ignores blank cells when computing AVERAGE.
Practical notes about updating:
If you add data within the selected range, the mean automatically recalculates (dynamic updating).
If you expand data beyond the initially selected cells, you can reference an entire column (or a larger range) to keep formulas updating.
Common data organization tip:
Typically, one variable per column is standard; you can set up templates to easily reuse formulas for different problems.
Notation reminder for exam prep:
Population mean: \mu = \frac{\sum{i=1}^{N} Xi}{N}
Sample mean: \bar{X} = \frac{\sum{i=1}^{n} Xi}{n}
Summation symbol: \sum, data values: X_i, population size: N, sample size: n
In the example, the mean value given is 590.80 (in dollars, rounded to two decimals).
Median
The median is the middle value of a sorted data set; half of the values fall above and half below.
How to determine by hand (conceptually): sort data from lowest to highest.
Rules for odd vs. even n (conceptual):
If n is odd, the median is the middle value.
If n is even, the median is the average of the two middle values.
Excel handling: Excel computes the median automatically without you manually sorting or deciding odd/even.
Excel usage:
Use the MEDIAN function; simply select the data range and Excel returns the median.
Example results (70 rental rates):
Median = 575.00 (half the values are above, half below this value).
When to prefer median over mean:
When the data contain extreme values/outliers (e.g., very high incomes or housing prices) that can distort the mean.
In income data or housing prices, the median often provides a better sense of central tendency.
Real-world relevance:
Incomes: median income is commonly reported because a few high earners can inflate the mean.
Housing prices: median home value is often reported for similar robustness reasons.
Mode
The mode is the data value that appears most often in the data set.
Important notes:
It is not a measure of central tendency in the same sense as the mean or median; it indicates frequency of a value.
It does not convey information about the distribution beyond the most frequent value.
In practice, the mode is rarely used in research reports compared to the mean or median.
Excel usage and functions (various versions):
Older Excel: MODE function exists but returns only a single mode value.
Newer Excel: MODE.SNGL (single mode) and MODE.MULT (multimodal) exist.
MODE.MULT can spill multiple values if there are multiple modes (bimodal, multimodal).
Practical notes:
If there is only one mode, MODE and MODE.MULT yield the same.
If there are multiple modes, MODE.MULT will list all of them (spills to adjacent cells).
The MODE function alone may report only one mode in some versions.
Caution:
The mode is rarely the best descriptor of a distribution and is not commonly reported in descriptive statistics papers.
Excel interaction tips:
If you don’t know whether data are unimodal or multimodal, MODE.MULT is safer to use because it reveals all modes when present.
Using Excel for Descriptive Statistics: Practical Tips
Function names to memorize:
Mean: AVERAGE
Median: MEDIAN
Mode: MODE.SNGL or MODE.MULT (depending on version)
Data layout strategies:
Put all observations for a single problem in one column for simplicity.
You can place several problems in separate columns and copy/paste formulas to each column.
Relative references adjust automatically when formulas are copied across columns.
Creating reusable templates:
Build a template where the data for a problem goes into a single column (e.g., Column A).
Pre-enter the formulas for mean, median, and mode in adjacent cells that reference that column.
When you paste new data into column A, the results automatically recalculate.
Copying and pasting formulas across problems:
You can copy the entire formula (mean, median, mode) to other columns with similarly structured data.
The relative references will adjust to the new column (e.g., A to B, C, etc.).
Important notes about absolute vs. relative references:
The examples show that you typically don’t need absolute references for this workflow; Excel adjusts when copied.
Absolute references (e.g., $A$1) lock a cell, which is not desirable when applying the same formula to multiple datasets.
Template approach alternative:
If you don’t want to pick data range each time, you can reference the entire column (e.g., A:A) so the formula always includes new data as you add observations.
Example: Rental rates dataset (70 values)
Dataset: 70 rental rates (7 rows × 10 columns in the slide).
Computed statistics (example results):
Mean (average): 590.80 (in dollars)
Median: 575.00 (in dollars)
Observations:
The mean is sensitive to extreme values; the median provides robustness in the presence of outliers.
Excel demonstration notes:
When you expand data in the range used by the function, the result updates automatically if the function references the entire column or a sufficiently broad range.
Blanks are ignored by AVERAGE (and by MEDIAN) in Excel, making it convenient for datasets with missing values.
Percentiles and Quartiles (brief note)
Percentiles indicate the value below which a certain percentage of observations fall.
Quartiles are a type of percentile (Q1, Q2, Q3 correspond to 25th, 50th, and 75th percentiles).
These measures help describe distribution shape and spread beyond central tendency, though they were not deeply covered in this session.
Summary: When to use each measure
Mean (average):
Use when data are roughly symmetric without extreme outliers.
Mathematical convenience and widely reported in many contexts.
Median:
Use when data are skewed or contain outliers (e.g., incomes, housing prices).
Provides a robust measure of central tendency that is not pulled by extreme values.
Mode:
Use to identify the most common value, or to detect multimodal distributions.
Less informative as a sole summary statistic for most reporting purposes.
Quick exam-minded takeaways
Know and be able to write both population and sample mean formulas:
\mu = \frac{\sum{i=1}^{N} Xi}{N}
\bar{X} = \frac{\sum{i=1}^{n} Xi}{n}
Be able to explain the difference between population size (N) and sample size (n).
Understand how to determine the median, including the odd/even case (or rely on Excel to handle it): sorted data, middle value or average of two middle values.
Recognize when the median is preferred over the mean due to outliers and skewness.
Know the concept of the mode and the difference between MODE (single value) and MODE.MULT (multivalue mode) in modern Excel.
Appreciate Excel as a tool for fast, error-reducing computation, especially for large datasets and repetitive tasks.
Remember practical template strategies for reusing formulas across multiple problems/datasets to minimize manual recalculation work.