Descriptive Statistics Notes: Mean, Median, Mode in Excel

Overview: Descriptive statistics in Excel

  • Descriptive statistics describe data for an audience (clients, managers, etc.).

  • Focus in this course: measures of central tendency (location) and basic distribution ideas.

  • We use Excel to compute these instead of manual calculations; Excel has built-in functions for mean, median, mode, percentiles, quartiles, etc.

  • Data context in the transcript: a dataset of rental rates (70 values) used to illustrate how to obtain descriptive statistics.

  • Emphasis on practical use in business/data analytics: part of creating managerial reports and solving firm problems.

  • Percentiles and quartiles are mentioned as additional descriptive stats beyond the three main measures; quartiles are a type of percentile.

  • Excel advantages highlighted: automatic recalculation when data change; ignoring blank cells; simpler workflow for large data.

  • Notation and terminology preview: population vs. sample; population mean (µ) vs. sample mean (x̄); population size (N) vs. sample size (n).

Mean (Average)

  • The mean is the average value of a data set; in statistics, the terms mean and average are interchangeable.

  • Types of means mentioned:

    • Simple (unweighted) mean

    • Weighted mean

    • Exponential averages (briefly noted)

  • Key formulas (population vs. sample):

    • Population mean: \mu = \frac{\sum{i=1}^{N} Xi}{N}

    • Sample mean: \bar{X} = \frac{\sum{i=1}^{n} Xi}{n}

  • Common interpretation: add up all data values and divide by the number of values.

  • Notation distinction:

    • Population parameter uses Greek letters (e.g., µ, N).

    • Sample statistics use simple letters (e.g., \bar{X}, n).

  • How to compute in Excel:

    • Use the AVERAGE function; Excel calls this the mean (not literally the word "mean").

    • Steps: select the data range; Excel returns the mean; you can round as needed (e.g., to 2 decimals).

  • Example setup described:

    • A dataset of 70 rental rates (7 rows × 10 columns).

    • The mean (average) rental rate calculated with AVERAGE in Excel.

    • Excel ignores blank cells when computing AVERAGE.

  • Practical notes about updating:

    • If you add data within the selected range, the mean automatically recalculates (dynamic updating).

    • If you expand data beyond the initially selected cells, you can reference an entire column (or a larger range) to keep formulas updating.

  • Common data organization tip:

    • Typically, one variable per column is standard; you can set up templates to easily reuse formulas for different problems.

  • Notation reminder for exam prep:

    • Population mean: \mu = \frac{\sum{i=1}^{N} Xi}{N}

    • Sample mean: \bar{X} = \frac{\sum{i=1}^{n} Xi}{n}

    • Summation symbol: \sum, data values: X_i, population size: N, sample size: n

  • In the example, the mean value given is 590.80 (in dollars, rounded to two decimals).

Median

  • The median is the middle value of a sorted data set; half of the values fall above and half below.

  • How to determine by hand (conceptually): sort data from lowest to highest.

  • Rules for odd vs. even n (conceptual):

    • If n is odd, the median is the middle value.

    • If n is even, the median is the average of the two middle values.

  • Excel handling: Excel computes the median automatically without you manually sorting or deciding odd/even.

  • Excel usage:

    • Use the MEDIAN function; simply select the data range and Excel returns the median.

  • Example results (70 rental rates):

    • Median = 575.00 (half the values are above, half below this value).

  • When to prefer median over mean:

    • When the data contain extreme values/outliers (e.g., very high incomes or housing prices) that can distort the mean.

    • In income data or housing prices, the median often provides a better sense of central tendency.

  • Real-world relevance:

    • Incomes: median income is commonly reported because a few high earners can inflate the mean.

    • Housing prices: median home value is often reported for similar robustness reasons.

Mode

  • The mode is the data value that appears most often in the data set.

  • Important notes:

    • It is not a measure of central tendency in the same sense as the mean or median; it indicates frequency of a value.

    • It does not convey information about the distribution beyond the most frequent value.

    • In practice, the mode is rarely used in research reports compared to the mean or median.

  • Excel usage and functions (various versions):

    • Older Excel: MODE function exists but returns only a single mode value.

    • Newer Excel: MODE.SNGL (single mode) and MODE.MULT (multimodal) exist.

    • MODE.MULT can spill multiple values if there are multiple modes (bimodal, multimodal).

  • Practical notes:

    • If there is only one mode, MODE and MODE.MULT yield the same.

    • If there are multiple modes, MODE.MULT will list all of them (spills to adjacent cells).

    • The MODE function alone may report only one mode in some versions.

  • Caution:

    • The mode is rarely the best descriptor of a distribution and is not commonly reported in descriptive statistics papers.

  • Excel interaction tips:

    • If you don’t know whether data are unimodal or multimodal, MODE.MULT is safer to use because it reveals all modes when present.

Using Excel for Descriptive Statistics: Practical Tips

  • Function names to memorize:

    • Mean: AVERAGE

    • Median: MEDIAN

    • Mode: MODE.SNGL or MODE.MULT (depending on version)

  • Data layout strategies:

    • Put all observations for a single problem in one column for simplicity.

    • You can place several problems in separate columns and copy/paste formulas to each column.

    • Relative references adjust automatically when formulas are copied across columns.

  • Creating reusable templates:

    • Build a template where the data for a problem goes into a single column (e.g., Column A).

    • Pre-enter the formulas for mean, median, and mode in adjacent cells that reference that column.

    • When you paste new data into column A, the results automatically recalculate.

  • Copying and pasting formulas across problems:

    • You can copy the entire formula (mean, median, mode) to other columns with similarly structured data.

    • The relative references will adjust to the new column (e.g., A to B, C, etc.).

  • Important notes about absolute vs. relative references:

    • The examples show that you typically don’t need absolute references for this workflow; Excel adjusts when copied.

    • Absolute references (e.g., $A$1) lock a cell, which is not desirable when applying the same formula to multiple datasets.

  • Template approach alternative:

    • If you don’t want to pick data range each time, you can reference the entire column (e.g., A:A) so the formula always includes new data as you add observations.

Example: Rental rates dataset (70 values)

  • Dataset: 70 rental rates (7 rows × 10 columns in the slide).

  • Computed statistics (example results):

    • Mean (average): 590.80 (in dollars)

    • Median: 575.00 (in dollars)

  • Observations:

    • The mean is sensitive to extreme values; the median provides robustness in the presence of outliers.

  • Excel demonstration notes:

    • When you expand data in the range used by the function, the result updates automatically if the function references the entire column or a sufficiently broad range.

    • Blanks are ignored by AVERAGE (and by MEDIAN) in Excel, making it convenient for datasets with missing values.

Percentiles and Quartiles (brief note)

  • Percentiles indicate the value below which a certain percentage of observations fall.

  • Quartiles are a type of percentile (Q1, Q2, Q3 correspond to 25th, 50th, and 75th percentiles).

  • These measures help describe distribution shape and spread beyond central tendency, though they were not deeply covered in this session.

Summary: When to use each measure

  • Mean (average):

    • Use when data are roughly symmetric without extreme outliers.

    • Mathematical convenience and widely reported in many contexts.

  • Median:

    • Use when data are skewed or contain outliers (e.g., incomes, housing prices).

    • Provides a robust measure of central tendency that is not pulled by extreme values.

  • Mode:

    • Use to identify the most common value, or to detect multimodal distributions.

    • Less informative as a sole summary statistic for most reporting purposes.

Quick exam-minded takeaways

  • Know and be able to write both population and sample mean formulas:

    • \mu = \frac{\sum{i=1}^{N} Xi}{N}

    • \bar{X} = \frac{\sum{i=1}^{n} Xi}{n}

  • Be able to explain the difference between population size (N) and sample size (n).

  • Understand how to determine the median, including the odd/even case (or rely on Excel to handle it): sorted data, middle value or average of two middle values.

  • Recognize when the median is preferred over the mean due to outliers and skewness.

  • Know the concept of the mode and the difference between MODE (single value) and MODE.MULT (multivalue mode) in modern Excel.

  • Appreciate Excel as a tool for fast, error-reducing computation, especially for large datasets and repetitive tasks.

  • Remember practical template strategies for reusing formulas across multiple problems/datasets to minimize manual recalculation work.