Calculating Percentiles in Normal Distributions
Understanding Percentiles and Their Calculation
Introduction to Percentiles
Definition: A percentile is the value below which a given percentage of the observations in a group falls. It describes what share of people or values fall below a specific value, not what share fall above it.
For example, if your height is at the 80^{th} percentile, 80\% of people are shorter than you.
While one could say you are in the top 20\% (for the 80^{th} percentile), the standard definition of percentile focuses on the proportion of values less than yours.
Relation to Probability: Percentiles are closely related to probability, as they represent the likelihood of a value falling below a certain point in a distribution.
Percentiles for Discrete Data
Easier to Calculate: Finding percentiles is simpler for discrete data, which can be clearly categorized.
Example: Commute time to work in hours (2 hours, 3 hours, 4 hours, 5 hours).
Method: A frequency table can be used.
Frequency Count: The number of people in each group.
Relative Frequency: The percentage of people in each group.
Cumulative Relative Frequency: The sum of relative frequencies as you move down the table, representing the percentage of people at or below a certain category.
Imperfect Example: A commute time of 3.5 hours does not match any category exactly, but if 65\% of people take 3 hours or less, that commuter is at least at the 65^{th} percentile. For ordered (ordinal) or clearly discrete datasets, counting the values smaller than yours is straightforward.
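The frequency-table method can be sketched in a few lines of Python. The counts below are hypothetical, chosen only so that 65\% of commuters fall at 3 hours or less; the function name cumulative_relative_frequency is likewise illustrative.

```python
# Hypothetical counts of commuters per commute time in hours (not from the source).
counts = {2: 25, 3: 40, 4: 20, 5: 15}

def cumulative_relative_frequency(counts):
    """Return {category: share of observations at or below that category}."""
    total = sum(counts.values())
    running = 0
    table = {}
    for category in sorted(counts):
        running += counts[category]        # frequency count so far
        table[category] = running / total  # cumulative relative frequency
    return table

table = cumulative_relative_frequency(counts)
for hours, share in table.items():
    print(f"{hours} h: {share:.0%} at or below")
```

With these made-up counts, the row for 3 hours shows 65\%, which is the figure a 3.5-hour commuter would read off as a lower bound on their percentile.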
Percentiles for Continuous Data
More Complex: Calculating percentiles becomes more complicated with continuous values (e.g., exact time values, exam scores).
Example: "How high does a score need to be to be in the 90^{th} percentile?"
Rephrasing the Question: This type of question can be rephrased in terms of probability for a standard normal distribution:
"At what value of y is the statement P(Z < y) = 0.90 true?"
This means finding the Z-score (y) such that 90\% of the area under the standard normal curve (probability) is to the left of that Z-score.
Tools to Answer: We can use established tools:
Standardization: Transforming any normal distribution into a standard normal distribution using the Z-score formula: Z = \frac{(X - \mu)}{\sigma}, where Z is the Z-score, X is the raw score, \mu is the mean, and \sigma is the standard deviation.
Probability Tables: Using standard normal (Z-score) tables to find probabilities associated with Z-scores.
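Both tools can be tried out in code. This is a minimal sketch that uses Python's standard-library statistics.NormalDist in place of a printed table; the helper name z_score and the raw score of 600 (with mean 500 and standard deviation 100) are illustrative assumptions.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardize a raw score: Z = (X - mu) / sigma."""
    return (x - mu) / sigma

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Illustrative values: a raw score of 600 with mean 500 and sd 100.
z = z_score(600, mu=500, sigma=100)
p = standard_normal.cdf(z)  # area to the left of z, i.e. P(Z < z)

print(f"Z = {z:.2f}, P(Z < z) = {p:.4f}")
```

The cdf call plays the role of the table lookup: it returns the area to the left of the given Z-score.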
Reverse Process: Instead of finding probability from a Z-score, we reverse the process:
Start with the desired probability (percentile).
Find the corresponding Z-score from the standard normal table.
Use the Z-score equation to solve for the raw score (X).
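The three reverse steps can be sketched as one small function. This assumes Python's standard-library statistics.NormalDist, whose inv_cdf method stands in for reading the table backwards; the function name raw_score_at_percentile and the exam values (mean 70, standard deviation 10) are hypothetical.

```python
from statistics import NormalDist

def raw_score_at_percentile(percentile, mu, sigma):
    """Return the raw score X at the given percentile of Normal(mu, sigma)."""
    if not 0 < percentile < 100:
        raise ValueError("percentile must be strictly between 0 and 100")
    # Steps 1-2: desired probability -> Z-score (inverse of the table lookup).
    z = NormalDist().inv_cdf(percentile / 100)
    # Step 3: solve Z = (X - mu) / sigma for X.
    return mu + z * sigma

# Hypothetical exam with mean 70 and standard deviation 10.
median_score = raw_score_at_percentile(50, mu=70, sigma=10)
high_score = raw_score_at_percentile(97.5, mu=70, sigma=10)
print(f"{median_score:.1f} {high_score:.1f}")
```

Because inv_cdf is not limited to two decimal places, its results can differ slightly from a value read off a printed table.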
Example 1: SAT Verbal Score at the 90^{th} Percentile
Scenario: What is the SAT verbal score at the 90^{th} percentile?
Mean \mu = 500
Standard Deviation \sigma = 100
This is visually represented as finding the score (X) or Z-value (Z) where 90\% of the area under the curve is less than that value.
Step 1: Find the Z-score for the 90^{th} percentile.
Locate the probability \approx 0.9000 in the middle of the standard normal table.
Table Reading: The values in the middle of the table are probabilities. The first column and first row represent the Z-score components.
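The table has no entry of exactly 0.9000; the two nearest entries are 0.8997 (Z = 1.28) and 0.9015 (Z = 1.29). Each Z-score is read by combining the left column (e.g., 1.2) with the top row (e.g., 0.09).
Result: This example uses Z = 1.29 (table probability 0.9015), so slightly more than 90\% of the probability is less than this Z-score. The entry 0.8997 (Z = 1.28) is numerically closer to 0.9000, and using it instead would change the final score by one point.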
Step 2: Plug the Z-score into the Z-score equation and solve for X.
Equation: Z = \frac{(X - \mu)}{\sigma}
Substitute known values: 1.29 = \frac{(X - 500)}{100}
Rearrange and solve for X (algebra):
1.29 \times 100 = X - 500
129 = X - 500
X = 129 + 500
X = 629
Conclusion: An SAT verbal score of 629 is at the 90^{th} percentile (higher than 90\% of other scores).
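As a cross-check on the table lookup, the sketch below rebuilds a four-decimal Z-table with Python's standard-library statistics.NormalDist and compares three ways of choosing the Z-score for a target of 0.90. The names z_table, closest_z, and first_at_or_above are illustrative and not part of the original example.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal: mean 0, sd 1

# Rebuild the positive half of a printed Z-table: z = 0.00 ... 3.99,
# with each probability rounded to four decimals as in a typical table.
z_table = {i / 100: round(std.cdf(i / 100), 4) for i in range(400)}

target = 0.90
# Entry whose tabulated probability is nearest to the target.
closest_z = min(z_table, key=lambda z: abs(z_table[z] - target))
# Smallest z whose tabulated probability reaches the target.
first_at_or_above = min(z for z, p in z_table.items() if p >= target)
# Z-score computed directly, without rounding to two decimals.
exact_z = std.inv_cdf(target)

mu, sigma = 500, 100  # SAT verbal mean and standard deviation from the example
for label, z in [("closest entry", closest_z),
                 ("first entry >= 0.90", first_at_or_above),
                 ("exact", exact_z)]:
    print(f"{label}: Z = {z:.4f}, X = {mu + z * sigma:.1f}")
```

Under these assumptions the closest entry is Z = 1.28 and the first entry at or above 0.90 is Z = 1.29, so the resulting scores differ by about one point.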
Steps to Finding Percentiles (Summary)
Use the percentile (probability) to find the corresponding Z-score on your standard normal table.
Plug that Z-score, the mean (\mu), and the standard deviation (\sigma) into your Z-score equation: Z = \frac{(X - \mu)}{\sigma}.
Rearrange the equation to solve for the actual value (X) that represents that percentile.
Example 2: SAT Verbal Score at the 20^{th} Percentile
Scenario: What is the SAT verbal score at the 20^{th} percentile?
Mean \mu = 500
Standard Deviation \sigma = 100
Step 1: Find the Z-score for the 20^{th} percentile.
Since 20\% is less than 50\% (the percentile at the mean, the center of the distribution), we will be looking at the negative side of the Z-score table.
Locate the probability \approx 0.2000 in the middle of the negative Z-score table.
Result: The corresponding Z-score is Z = -0.84. This indicates that 20\% of the area under the curve (probability) lies below this Z-score, and 80\% lies above it.
Step 2: Plug the Z-score into the Z-score equation and solve for X.
Equation: Z = \frac{(X - \mu)}{\sigma}
Substitute known values: -0.84 = \frac{(X - 500)}{100}
Rearrange and solve for X (algebra):
-0.84 \times 100 = X - 500
-84 = X - 500
X = -84 + 500
X = 416
Conclusion: An SAT verbal score of 416 is at the 20^{th} percentile.
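One way to check this result is a round trip: feed 416 back into the distribution and confirm that roughly 20\% of scores fall below it. The sketch assumes Python's standard-library statistics.NormalDist; the variable names are illustrative.

```python
from statistics import NormalDist

# SAT verbal distribution from the example: mean 500, sd 100.
sat_verbal = NormalDist(mu=500, sigma=100)

# Forward direction: share of scores below 416.
share_below_416 = sat_verbal.cdf(416)

# Reverse direction without table rounding: score at the 20th percentile.
exact_score = sat_verbal.inv_cdf(0.20)

print(f"P(X < 416) = {share_below_416:.4f}")
print(f"20th percentile score = {exact_score:.1f}")
```

The small gap between 416 and the directly computed score comes from rounding the Z-score to two decimals when reading the table.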
Tips and Tricks for Z-scores and Percentiles (Sanity Checks)
Normal Distributions Are Symmetrical: This is a convenient property.
For a percentile P below 50, the negative Z-score for the P^{th} percentile has the same magnitude as the positive Z-score for the (100 - P)^{th} percentile.
Example: The Z-score for the 20^{th} percentile (-0.84) is the negative equivalent of the Z-score for the 80^{th} percentile (+0.84). This means the area below -0.84 (20\%) is equal to the area above +0.84 (20\%).
You can use this to work with just the positive or negative Z-table if preferred, by finding the 'mirror' percentile.
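The mirror-percentile property is easy to confirm numerically. This is a small sketch using Python's standard-library statistics.NormalDist; the percentiles chosen and the name pairs are illustrative.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal: mean 0, sd 1

pairs = []
for p in (0.05, 0.20, 0.35):
    lower = std.inv_cdf(p)      # Z-score for the P-th percentile (negative)
    upper = std.inv_cdf(1 - p)  # Z-score for the (100 - P)-th percentile
    pairs.append((lower, upper))
    print(f"P = {p:.2f}: Z = {lower:+.2f}, mirror Z = {upper:+.2f}")
```

Each pair has equal magnitude and opposite sign, which is what allows a single half of the Z-table to serve for both sides.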
Mean as Center (Median, Mode):
A Z-score of 0 is at the center of the graph, representing the mean and the 50^{th} percentile.
If you're finding a percentile less than 50^{th} (e.g., 20^{th} percentile), your calculated raw score (X) must be less than the mean (\mu), and your Z-score must be negative.
If you're finding a percentile greater than 50^{th} (e.g., 90^{th} percentile), your calculated raw score (X) must be greater than the mean (\mu), and your Z-score must be positive.
These checks help ensure you are looking in the correct general region of the distribution.
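These checks can be written down as a short helper. The function name passes_sanity_check is hypothetical; it encodes only the sign and side-of-the-mean rules listed above and does not verify the arithmetic itself.

```python
def passes_sanity_check(percentile, z, x, mu):
    """Check that z and x fall on the side of the mean implied by the percentile."""
    if percentile < 50:
        return z < 0 and x < mu
    if percentile > 50:
        return z > 0 and x > mu
    # Exactly the 50th percentile: Z is 0 and X equals the mean.
    return z == 0 and x == mu

# The two worked examples, plus a deliberate sign mistake for the 20th percentile.
example_90 = passes_sanity_check(90, z=1.29, x=629, mu=500)
example_20 = passes_sanity_check(20, z=-0.84, x=416, mu=500)
sign_error = passes_sanity_check(20, z=0.84, x=584, mu=500)
print(example_90, example_20, sign_error)
```

The third call models the common slip of reading the positive table for a percentile below 50, which the check flags.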
Outliers on a Normal Distribution
Definition: An outlier is an extreme value that is highly unlikely or atypical of the distribution, though it could still be a real observation.
Z-score Table Limits: Standard normal tables typically only extend to a certain range (e.g., Z-scores from -3.99 to +3.99).
Extreme Cases: A Z-score greater than +4 or less than -4 is possible but extremely rare. Absence from the table does not mean these values don't exist; it means their probability of occurrence is extremely low (roughly 3 in 100,000 for each tail beyond 4 standard deviations). This is important to remember for future assignments where such extreme Z-scores might appear.
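To see how small these probabilities are, the tail areas beyond the edge of a typical table can be computed directly. This sketch assumes Python's standard-library statistics.NormalDist; the variable names are illustrative.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal: mean 0, sd 1

# Area below Z = -4; by symmetry this equals the area above Z = +4.
one_tail = std.cdf(-4)
both_tails = 2 * one_tail

print(f"P(Z < -4) = {one_tail:.3e}")
print(f"P(|Z| > 4) = {both_tails:.3e}")
```

The values are small but not zero, which matches the point that such scores are rare rather than impossible.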