Descriptive Statistics: Percentiles, Quartiles, Range, IQR, and Variance (Excel)

Percentiles divide a dataset into 100 equal parts; each percentile represents a specific data point.
The kth percentile is the value below which k percent of the data falls. Equivalently, the kth percentile is a value such that at least k percent of the items are
<= that value and at least 100−k percent of the items are >= that value.
Interpretation example: the 25th percentile (P25) means that 25% of the data values are ≤ P25 and 75% are ≥ P25.
When data are sorted from smallest to largest, the percentile indicates the position of a data value in that ordered list.
Common mistake to avoid: percentile is not the percent of questions correct on a test. It is a location of your score within the distribution of all scores.
SAT example from the transcript:
- If a student scores 1000 and has an 80th percentile, it means that 80% of test-takers scored 1000 or less, and 20% scored 1000 or higher.
Percentile values can be any number from 0 to 100 (inclusive).
The percentile value is the data value at that position after sorting (not a percentage).
In practice, you are often asked for a percentile like the 25th, 50th, or 75th, which correspond to quartiles as well (see Quartiles section).

Built-in Excel function: percentile.EXC (uses the same logic as the textbook definition but is easier to use).
Arguments in the dialog:
- Array: the data range (the dataset).
- k: the percentile you are looking for, entered as a decimal between 0 and 1 (not as a percent).
Example inputs:
- For the 40th percentile, enter 0.40 as k.
- For the 35th percentile, enter 0.35 as k.
Important input rule: if you enter 40 (not 0.40), Excel will return an error (e.g., #NUM) because it expects a decimal between 0 and 1.
Practical setup: place the data range in one place, and reference a cell for k so you can reuse the same formula for multiple percentiles by simply changing that k cell.
Tip: put each problem on a separate worksheet/tab to keep work organized and easy to verify later.
Data and k example setup:
- Data range: e.g., A2:A101.
- Percentile cell: e.g., B1 contains 0.40 for the 40th percentile.
- Formula: =PERCENTILE.EXC(A2:A101, B1) (depending on Excel version, you may see =PERCENTILE.EXC or =PERCENTILE.EXC(A2:A101, B1)).
Relation to problem parts: you can reuse the same setup for multiple percentiles by changing the k value (e.g., 0.40, 0.65) without re-typing the data range.
Practical note: Excel sorts the data internally for percentile calculation; you do not need to manually sort the data.

Quartiles partition data into four equal parts; notation:
- Q1: first quartile (lower Q1 = 25th percentile)
- Q2: second quartile (median, 50th percentile)
- Q3: third quartile (75th percentile)
Relationship to percentiles:
- Q1 = P_{0.25}
- Q2 = P_{0.50}
- Q3 = P_{0.75}
In practice, textbooks use q1, q2, q3 notation, but they are computed using the same percentile function with k = 0.25, 0.50, and 0.75 respectively.
Example: to compute q1, use k = 0.25; to compute q2, use k = 0.50; to compute q3, use k = 0.75, all via the same percentile function.

Range: the simplest variability measure; defined as
- $\text{Range} = \max(X) - \min(X)$
Limitations of the range:
- It only uses the extreme values and tells nothing about the values in between, so it’s not a very sophisticated measure of spread.
Excel approach to range:
- No single built-in "range" function; instead use two functions:
- MAX(data) returns the largest value in the dataset.
- MIN(data) returns the smallest value in the dataset.
- Range can be computed in one step as: $\text{Range} = \max(X) - \min(X)$ using a single formula.
Interquartile Range (IQR): a more robust measure of spread that focuses on the middle 50% of the data.
- Definition:
- $\text{IQR} = Q3 - Q1$
- What it tells you: the range of the middle 50% of the data; less sensitive to extreme values (outliers).
- Real-world usage: universities reporting SAT/ACT scores often present Q1 and Q3 to describe the middle 50% of the class.
- Example from transcript: if Q3 = 625 and Q1 = 545, then $\text{IQR} = 625 - 545 = 80$ (in dollars for rental rates in the example).
Practical application:
- The IQR helps assess whether results are influenced by outliers by focusing on the central portion of the data.
- Researchers sometimes examine how results change when extreme values are trimmed (e.g., dropping bottom 5% and top 5%) to see if conclusions hold for the middle 90%.

Variance measures the average squared deviation from the mean and uses all data values.
Population variance (denoted by a Greek letter):
- $\sigma^2 = \frac{1}{N} \sum{i=1}^{N} (xi - \mu)^2$
- Here, $N$ is the population size and $\mu$ is the population mean.
Sample variance (denoted by an $s^2$):
- $s^2 = \frac{1}{n-1} \sum{i=1}^{n} (xi - \bar{x})^2$
- Here, $n$ is the sample size and $\bar{x}$ is the sample mean.
Key difference between the two formulas:
- Numerator is the same: sum of squared deviations $\sum (x_i - \text{mean})^2$.
- Denominator differs: population uses $N$, while sample uses $n-1$ (the degrees of freedom adjustment).
Practical interpretation issues:
- Variance uses squared units; if data are dollars, variance is in dollars$^2$, which can be hard to interpret.
Excel usage (conceptual):
- There are two built-in variance functions corresponding to population and sample data (one for each denominator choice).
- In practice, you need to identify whether your data represent a population or a sample to choose the correct function.
Note on the course plan:
- The instructor indicates that the next session will cover the Excel variance functions in detail.
Important reminders from the transcript:
- Always check whether your problem uses population data or sample data before computing variance.
- If unspecified, the problem typically uses sample data.

Percentile inputs must be decimal (0 to 1) in Excel; using a whole percent like 40 will produce an error.
When solving multiple percentile questions, link k to a cell and reference that cell in the percentile formula to reuse the same setup for different percentiles.
For organization, keep each problem on a separate worksheet/tab to easily verify results and track mistakes.
Remember the equivalences:
- Q1 = P_{0.25}
- Q2 = P_{0.50}
- Q3 = P_{0.75}
Recap of core formulas:
- Percentile: $Pk = \text{value such that at least } k\% \text{ of data are } \le Pk \text{ and at least } (100-k)\% \text{ are } \ge P_k.$
- Range: $\text{Range} = \max(X) - \min(X)$
- IQR: $\text{IQR} = Q3 - Q1$
- Population variance: $\sigma^2 = \frac{1}{N} \sum{i=1}^{N} (xi - \mu)^2$
- Sample variance: $s^2 = \frac{1}{n-1} \sum{i=1}^{n} (xi - \bar{x})^2$
These notes summarize the key ideas and steps described in the transcript and set up practical Excel workflows for percentile, quartile, range, IQR, and variance calculations.