Calculating Percentiles in Normal Distributions

Understanding Percentiles and Their Calculation

Introduction to Percentiles

  • Definition: A percentile is the value below which a given percentage of the observations in a group falls. It describes how many people or values are smaller than a specific value, not how many are greater.

    • For example, if your height is at the 80^{th} percentile, 80\% of people are shorter than you.

    • While one could say you are in the top 20\% (for the 80^{th} percentile), the standard definition of percentile focuses on the proportion of values less than yours.

  • Relation to Probability: Percentiles are closely related to probability, as they represent the likelihood of a value falling below a certain point in a distribution.

Percentiles for Discrete Data

  • Easier to Calculate: Finding percentiles is simpler for discrete data, which can be clearly categorized.

    • Example: Commute time to work in hours (2 hours, 3 hours, 4 hours, 5 hours).

  • Method: A frequency table can be used.

    • Parts of a frequency table:

      • Frequency Count: The number of people in each group.

      • Relative Frequency: The percentage of people in each group.

      • Cumulative Relative Frequency: The sum of relative frequencies as you move down the table, representing the percentage of people at or below a certain category.

  • Imperfect Example: Suppose your commute is 3.5 hours and 65\% of people take 3 hours or less. Everyone at or below 3 hours has a shorter commute than you, so you are at least at the 65^{th} percentile. For ordered (ordinal) or clearly discrete data, counting the values smaller than yours is straightforward.
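
The frequency-table method can be sketched in a few lines of Python. The counts below are hypothetical (they are not from the notes) and were chosen only so that 65\% of people fall at 3 hours or less; `percent_at_or_below` is an illustrative helper name.

```python
from itertools import accumulate

# Hypothetical commute-time counts (hours -> number of people); illustration only.
counts = {2: 5, 3: 8, 4: 4, 5: 3}

total = sum(counts.values())
hours = sorted(counts)

# Relative frequency: share of people in each group.
relative = [counts[h] / total for h in hours]

# Cumulative relative frequency: running sum down the table.
cumulative = list(accumulate(relative))

for h, r, c in zip(hours, relative, cumulative):
    print(f"{h} hours: count={counts[h]:2d}  relative={r:.0%}  cumulative={c:.0%}")


def percent_at_or_below(value):
    """Percentage of people whose commute time is at most `value` hours."""
    return 100 * sum(counts[h] for h in hours if h <= value) / total


# A 3.5-hour commute sits above everyone in the 2- and 3-hour groups.
print(percent_at_or_below(3.5))
```

With these made-up counts, the cumulative column reaches 65\% at the 3-hour row, which is the figure the 3.5-hour commuter reads off.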

Percentiles for Continuous Data

  • More Complex: Calculating percentiles becomes more complicated with continuous values (e.g., exact time values, exam scores).

    • Example: "How high does a score need to be to be in the 90^{th} percentile?"

  • Rephrasing the Question: This type of question can be rephrased in terms of probability for a standard normal distribution:

    • "At what value of y is the statement P(Z < y) = 0.90 true?"

    • This means finding the Z-score (y) such that 90\% of the area under the standard normal curve (probability) is to the left of that Z-score.

  • Tools to Answer: We can use established tools:

    • Standardization: Transforming any normal distribution into a standard normal distribution using the Z-score formula: Z = \frac{(X - \mu)}{\sigma}, where Z is the Z-score, X is the raw score, \mu is the mean, and \sigma is the standard deviation.

    • Probability Tables: Using standard normal (Z-score) tables to find probabilities associated with Z-scores.

  • Reverse Process: Instead of finding probability from a Z-score, we reverse the process:

    1. Start with the desired probability (percentile).

    2. Find the corresponding Z-score from the standard normal table.

    3. Use the Z-score equation to solve for the raw score (X).
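
As a rough sketch, the three-step reverse process can be written as one Python function. It assumes Python 3.8 or later, where the standard library's `statistics.NormalDist` provides `inv_cdf` (the software equivalent of reading the table backwards); the function name `raw_score_at_percentile` is made up for this illustration.

```python
from statistics import NormalDist


def raw_score_at_percentile(percentile, mu, sigma):
    """Return the raw score X at a given percentile of a normal distribution.

    `percentile` must be strictly between 0 and 100.
    """
    # Steps 1 and 2: turn the percentile into a probability and find its Z-score.
    z = NormalDist().inv_cdf(percentile / 100)
    # Step 3: rearrange Z = (X - mu) / sigma into X = mu + Z * sigma.
    return mu + z * sigma


print(raw_score_at_percentile(50, mu=500, sigma=100))
```

Because `inv_cdf` computes the Z-score to many decimal places, results can differ by a point or so from answers based on a two-decimal table lookup.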

Example 1: SAT Verbal Score at the 90^{th} Percentile

  • Scenario: What is the SAT verbal score at the 90^{th} percentile?

    • Mean \mu = 500

    • Standard Deviation \sigma = 100

    • This is visually represented as finding the score (X) or Z-value (Z) where 90\% of the area under the curve is less than that value.

  • Step 1: Find the Z-score for the 90^{th} percentile.

    • Locate the probability \approx 0.9000 in the middle of the standard normal table.

    • Table Reading: The values in the middle of the table are probabilities. The first column and first row represent the Z-score components.

    • The target 0.9000 falls between two table entries: 0.8997 (Z = 1.28) and 0.9015 (Z = 1.29). Each Z-score is read by combining the left column (e.g., 1.2) with the top row (e.g., 0.09).

    • Result: This example uses Z = 1.29, the first entry at or above 0.9000, so about 90\% of the probability lies below a Z-score of 1.29. Choosing the nearer entry, Z = 1.28, would lower the final score by one point.

  • Step 2: Plug the Z-score into the Z-score equation and solve for X.

    • Equation: Z = \frac{(X - \mu)}{\sigma}

    • Substitute known values: 1.29 = \frac{(X - 500)}{100}

    • Rearrange and solve for X (algebra):

      • 1.29 \times 100 = X - 500

      • 129 = X - 500

      • X = 129 + 500

      • X = 629

    • Conclusion: An SAT verbal score of about 629 is at the 90^{th} percentile (higher than roughly 90\% of other scores).
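
One way to double-check this example is to reproduce the table lookups and the algebra in Python. This is only a sketch using the standard library's `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal: mean 0, standard deviation 1

# The two table entries on either side of 0.9000.
for z in (1.28, 1.29):
    print(z, round(std.cdf(z), 4))

mu, sigma = 500, 100

# Table-based answer from the example: X = mu + Z * sigma with Z = 1.29.
x_table = mu + 1.29 * sigma

# Answer from the exact inverse CDF, with no rounding of Z.
x_exact = NormalDist(mu, sigma).inv_cdf(0.90)

print(round(x_table), round(x_exact, 2))
```

The exact calculation lands slightly below 629 because the unrounded Z-score for 0.90 lies between 1.28 and 1.29.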

Steps to Finding Percentiles (Summary)

  1. Use the percentile (probability) to find the corresponding Z-score on your standard normal table.

  2. Plug that Z-score, the mean (\mu), and the standard deviation (\sigma) into your Z-score equation: Z = \frac{(X - \mu)}{\sigma}.

  3. Rearrange the equation to solve for the actual value (X) that represents that percentile.

Example 2: SAT Verbal Score at the 20^{th} Percentile

  • Scenario: What is the SAT verbal score at the 20^{th} percentile?

    • Mean \mu = 500

    • Standard Deviation \sigma = 100

  • Step 1: Find the Z-score for the 20^{th} percentile.

    • Since the 20^{th} percentile is below the 50^{th} (the mean/center), the Z-score is negative, so we look at the negative side of the Z-score table.

    • Locate the probability \approx 0.2000 in the middle of the negative Z-score table.

    • Result: The corresponding Z-score is Z = -0.84 (table value 0.2005). This indicates that about 20\% of the area under the curve lies below this Z-score and about 80\% lies above it.

  • Step 2: Plug the Z-score into the Z-score equation and solve for X.

    • Equation: Z = \frac{(X - \mu)}{\sigma}

    • Substitute known values: -0.84 = \frac{(X - 500)}{100}

    • Rearrange and solve for X (algebra):

      • -0.84 \times 100 = X - 500

      • -84 = X - 500

      • X = -84 + 500

      • X = 416

    • Conclusion: An SAT verbal score of 416 is at the 20^{th} percentile.
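
A quick check in the forward direction: starting from the answer 416, standardize it and confirm that roughly 20\% of scores fall below it. This is a sketch using the standard library's `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)

# Standardize the answer: Z = (X - mu) / sigma.
z = (416 - 500) / 100

# Share of scores below 416; should be close to 0.20.
share_below = sat.cdf(416)

# Exact 20th-percentile score, without rounding Z to two decimals.
x_exact = sat.inv_cdf(0.20)

print(z, round(share_below, 4), round(x_exact, 2))
```

The share below 416 comes out just above 0.20, which matches the nearest table entry rather than exactly 0.2000.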

Tips and Tricks for Z-scores and Percentiles (Sanity Checks)

  • Normal Distributions Are Symmetrical: This is a convenient property.

    • For a percentile P below 50, the Z-score is negative and has the same magnitude as the positive Z-score for the (100 - P)^{th} percentile.

    • Example: The Z-score for the 20^{th} percentile (-0.84) is the negative equivalent of the Z-score for the 80^{th} percentile (+0.84). This means the area below -0.84 (20\%) is equal to the area above +0.84 (20\%).

    • You can use this to work with just the positive or negative Z-table if preferred, by finding the 'mirror' percentile.

  • Mean as Center (Median, Mode):

    • A Z-score of 0 is at the center of the graph, representing the mean and the 50^{th} percentile.

    • If you're finding a percentile less than 50^{th} (e.g., 20^{th} percentile), your calculated raw score (X) must be less than the mean (\mu), and your Z-score must be negative.

    • If you're finding a percentile greater than 50^{th} (e.g., 90^{th} percentile), your calculated raw score (X) must be greater than the mean (\mu), and your Z-score must be positive.

    • These checks help ensure you are looking in the correct general region of the distribution.
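
Both sanity checks can be sketched in Python. The helper names `z_for` and `looks_consistent` are made up for this illustration, and the Z-scores come from the standard library's `statistics.NormalDist` rather than a printed table.

```python
from statistics import NormalDist

std = NormalDist()


def z_for(percentile):
    """Z-score at a percentile (strictly between 0 and 100) of the standard normal."""
    return std.inv_cdf(percentile / 100)


# Symmetry: the 20th and 80th percentiles mirror each other around Z = 0.
z20, z80 = z_for(20), z_for(80)
print(round(z20, 2), round(z80, 2))  # -0.84 0.84


def looks_consistent(percentile, x, mu):
    """Check that X falls on the expected side of the mean for this percentile."""
    if percentile < 50:
        return x < mu
    if percentile > 50:
        return x > mu
    return x == mu


print(looks_consistent(20, 416, 500))  # True: below the 50th, X below the mean
print(looks_consistent(20, 584, 500))  # False: a sign error on Z would give 584
```

The second call shows the kind of mistake the check is meant to catch: using +0.84 instead of -0.84 for the 20^{th} percentile puts the answer on the wrong side of the mean.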

Outliers on a Normal Distribution

  • Definition: An outlier is an extreme value that is highly unlikely or atypical of the distribution, though it could still be a real observation.

  • Z-score Table Limits: Standard normal tables typically only extend to a certain range (e.g., Z-scores from -3.99 to +3.99).

  • Extreme Cases: A Z-score greater than +4 or less than -4 is possible but extremely rare. Absence from the table does not mean these values don't exist; it means their probability is very low (roughly 0.003\% of the area lies beyond Z = 4 on each side). This is important to remember for future assignments where such extreme Z-scores might appear.
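
To see how rare such values are, the tail areas beyond the usual table limits can be computed directly. This is a minimal sketch using the standard library's `statistics.NormalDist`; the cutoff of 4 is taken from the bullet above and the variable names are illustrative.

```python
from statistics import NormalDist

std = NormalDist()

# Area to the left of Z = -4 and to the right of Z = +4.
below_minus_4 = std.cdf(-4.0)
above_plus_4 = 1 - std.cdf(4.0)

# Total probability of landing outside the range -4 to +4.
outside = below_minus_4 + above_plus_4

print(f"P(Z < -4) = {below_minus_4:.2e}")
print(f"P(Z > +4) = {above_plus_4:.2e}")
print(f"P(|Z| > 4) = {outside:.2e}")
```

The two tails match because of symmetry, and together they cover well under one hundredth of a percent of the distribution.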