Descriptive Statistics: Percentiles, Quartiles, Range, IQR, and Variance (Excel)
Percentiles and Their Interpretation
Percentiles divide a dataset into 100 equal parts; each percentile represents a specific data point.
The kth percentile is the value below which k percent of the data falls. Equivalently, the kth percentile is a value such that at least k percent of the items are
<= that value and at least 100−k percent of the items are >= that value.Interpretation example: the 25th percentile (P25) means that 25% of the data values are ≤ P25 and 75% are ≥ P25.
When data are sorted from smallest to largest, the percentile indicates the position of a data value in that ordered list.
Common mistake to avoid: percentile is not the percent of questions correct on a test. It is a location of your score within the distribution of all scores.
SAT example from the transcript:
If a student scores 1000 and has an 80th percentile, it means that 80% of test-takers scored 1000 or less, and 20% scored 1000 or higher.
Percentile values can be any number from 0 to 100 (inclusive).
The percentile value is the data value at that position after sorting (not a percentage).
In practice, you are often asked for a percentile like the 25th, 50th, or 75th, which correspond to quartiles as well (see Quartiles section).
Using Excel to Compute Percentiles
Built-in Excel function: percentile.EXC (uses the same logic as the textbook definition but is easier to use).
Arguments in the dialog:
Array: the data range (the dataset).
k: the percentile you are looking for, entered as a decimal between 0 and 1 (not as a percent).
Example inputs:
For the 40th percentile, enter 0.40 as k.
For the 35th percentile, enter 0.35 as k.
Important input rule: if you enter 40 (not 0.40), Excel will return an error (e.g., #NUM) because it expects a decimal between 0 and 1.
Practical setup: place the data range in one place, and reference a cell for k so you can reuse the same formula for multiple percentiles by simply changing that k cell.
Tip: put each problem on a separate worksheet/tab to keep work organized and easy to verify later.
Data and k example setup:
Data range: e.g., A2:A101.
Percentile cell: e.g., B1 contains 0.40 for the 40th percentile.
Formula: =PERCENTILE.EXC(A2:A101, B1) (depending on Excel version, you may see =PERCENTILE.EXC or =PERCENTILE.EXC(A2:A101, B1)).
Relation to problem parts: you can reuse the same setup for multiple percentiles by changing the k value (e.g., 0.40, 0.65) without re-typing the data range.
Practical note: Excel sorts the data internally for percentile calculation; you do not need to manually sort the data.
Quartiles and Their Relationship to Percentiles
Quartiles partition data into four equal parts; notation:
Q1: first quartile (lower Q1 = 25th percentile)
Q2: second quartile (median, 50th percentile)
Q3: third quartile (75th percentile)
Relationship to percentiles:
Q1 = P_{0.25}
Q2 = P_{0.50}
Q3 = P_{0.75}
In practice, textbooks use q1, q2, q3 notation, but they are computed using the same percentile function with k = 0.25, 0.50, and 0.75 respectively.
Example: to compute q1, use k = 0.25; to compute q2, use k = 0.50; to compute q3, use k = 0.75, all via the same percentile function.
Measures of Variability: Range and Interquartile Range (IQR)
Range: the simplest variability measure; defined as
Limitations of the range:
It only uses the extreme values and tells nothing about the values in between, so it’s not a very sophisticated measure of spread.
Excel approach to range:
No single built-in "range" function; instead use two functions:
MAX(data) returns the largest value in the dataset.
MIN(data) returns the smallest value in the dataset.
Range can be computed in one step as: using a single formula.
Interquartile Range (IQR): a more robust measure of spread that focuses on the middle 50% of the data.
Definition:
What it tells you: the range of the middle 50% of the data; less sensitive to extreme values (outliers).
Real-world usage: universities reporting SAT/ACT scores often present Q1 and Q3 to describe the middle 50% of the class.
Example from transcript: if Q3 = 625 and Q1 = 545, then (in dollars for rental rates in the example).
Practical application:
The IQR helps assess whether results are influenced by outliers by focusing on the central portion of the data.
Researchers sometimes examine how results change when extreme values are trimmed (e.g., dropping bottom 5% and top 5%) to see if conclusions hold for the middle 90%.
Variance: Population vs Sample
Variance measures the average squared deviation from the mean and uses all data values.
Population variance (denoted by a Greek letter):
Here, $N$ is the population size and $\mu$ is the population mean.
Sample variance (denoted by an $s^2$):
Here, $n$ is the sample size and $\bar{x}$ is the sample mean.
Key difference between the two formulas:
Numerator is the same: sum of squared deviations $\sum (x_i - \text{mean})^2$.
Denominator differs: population uses $N$, while sample uses $n-1$ (the degrees of freedom adjustment).
Practical interpretation issues:
Variance uses squared units; if data are dollars, variance is in dollars$^2$, which can be hard to interpret.
Excel usage (conceptual):
There are two built-in variance functions corresponding to population and sample data (one for each denominator choice).
In practice, you need to identify whether your data represent a population or a sample to choose the correct function.
Note on the course plan:
The instructor indicates that the next session will cover the Excel variance functions in detail.
Important reminders from the transcript:
Always check whether your problem uses population data or sample data before computing variance.
If unspecified, the problem typically uses sample data.
Quick references and implementation tips
Percentile inputs must be decimal (0 to 1) in Excel; using a whole percent like 40 will produce an error.
When solving multiple percentile questions, link k to a cell and reference that cell in the percentile formula to reuse the same setup for different percentiles.
For organization, keep each problem on a separate worksheet/tab to easily verify results and track mistakes.
Remember the equivalences:
Q1 = P_{0.25}
Q2 = P_{0.50}
Q3 = P_{0.75}
Recap of core formulas:
Percentile:
Range:
IQR:
Population variance:
Sample variance:
These notes summarize the key ideas and steps described in the transcript and set up practical Excel workflows for percentile, quartile, range, IQR, and variance calculations.