Modeling Distributions of Quantitative Data

Chapter 2: Modeling Distributions of Quantitative Data

Introduction to Percentiles and Location in a Distribution

  • A test score is easier to interpret when it is compared with other students' scores than when it is viewed on its own.
    • Example: Emily scores 43 out of 50 on a statistics test. Whether she should be pleased depends on how her score compares with her classmates' scores.
  • Key Concepts Addressed in Section 2.1:
    • Describing location using percentiles.
    • New graphical representation for percentiles (cumulative relative frequency graph).
    • Describing an individual's location using the mean and standard deviation (z-scores).
  • Density curves: model the overall shape of a distribution and can be used to estimate an individual's location, especially when the data are approximately Normal.

Activity for Understanding Height Distribution

  1. Teacher marks a floor number line (height scale from 58 to 78 inches).
  2. Class stands according to their height, creating a human dot plot.
  3. Teacher records height distribution and displays for reference.
  4. Use the dot plot to discuss each student's percentile, along with the mean and standard deviation of the class's heights:
    • How many students have heights below their own?
  5. Calculate the mean and standard deviation of the class's heights, and check the results with classmates.
  6. Discuss where each student's height lies relative to the mean:
    • Standardized scores (z-scores) provide insights into how far above or below the mean a height lies.
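  7. Discuss the effect of converting heights from inches to centimeters (1 inch = 2.54 cm):
    • Multiplying every height by 2.54 multiplies the mean, median, and standard deviation by 2.54, but leaves the shape of the distribution, each student's percentile, and each student's z-score unchanged.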

Section 2.1: Describing Location in a Distribution

Learning Targets for Section 2.1

  • Locate an individual value within a distribution using percentiles.
  • Use cumulative relative frequency graphs for estimating percentiles.
  • Understand and calculate standardized scores (z-scores).
  • Analyze how adding, subtracting, or scaling data affects distribution characteristics.

Measuring Location: Percentiles

  • Definition of Percentile:
    • The p-th percentile of a distribution is the value with p% of the observations less than or equal to it.
    • Example: Emily's score of 43 is at the 84th percentile because 21 of the 25 scores in her class (84%), including her own, are less than or equal to 43.
  • Caution:
    • Say that an observation is at a percentile, not in it (e.g., Emily's score is at the 84th percentile, not in the 84th percentile).

Example Calculation of Percentiles

  • Data Set of 25 Test Scores from Mr. Tabor's Class:
    • Scores: 35, 18, 37, 38, 42, 41, 25, 37, 36, 32, 12, 43, 31, 29, 32, 48, 44, 45, 38, 40, 45, 38, 38, 40, 22.
    • Total Students = 25
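    • Calculation examples:
      • Jacob's score of 18: 2 of the 25 scores (12 and 18) are less than or equal to 18, so 2/25 = 0.08, the 8th percentile.
      • Maria is at the 48th percentile: 48% of the scores (12 of 25) are less than or equal to hers. In this data set, that places her score at 37.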
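The percentile calculations above can be checked with a short Python sketch. It assumes the "less than or equal to" definition used in this section; the names `scores` and `percentile_of` are illustrative. Maria's score is not stated in the notes, so 37 is used because it is the only score in the data set at the 48th percentile under this definition.

```python
# Mr. Tabor's 25 test scores, copied from the data set above
scores = [35, 18, 37, 38, 42, 41, 25, 37, 36, 32, 12, 43, 31,
          29, 32, 48, 44, 45, 38, 40, 45, 38, 38, 40, 22]


def percentile_of(value, data):
    """Percent of observations less than or equal to `value`."""
    at_or_below = sum(1 for x in data if x <= value)
    return 100 * at_or_below / len(data)


# Maria's score (37) is inferred from her percentile, not given in the notes
for name, score in [("Jacob", 18), ("Maria", 37), ("Emily", 43)]:
    print(f"{name}: score {score} is at the {percentile_of(score, scores):.0f}th percentile")
```

This prints the 8th, 48th, and 84th percentiles, matching the hand calculations.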

Understanding Quartiles

  • The three quartiles divide the ordered data into four groups of roughly equal size:
    • Q1 is at about the 25th percentile, Q2 (the median) at the 50th percentile, and Q3 at the 75th percentile.
  • Percentiles are most informative in large data sets. With only 25 scores, each observation moves a percentile by 4 percentage points (1/25 = 0.04).
  • Caution: A high percentile isn't necessarily good (e.g., a cholesterol level at the 90th percentile is a health concern).
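Quartiles can be computed in more than one way. The sketch below assumes the common textbook rule of taking the median of each half of the sorted data, leaving the overall median out of both halves when n is odd; statistical software may use other interpolation rules and report slightly different values. It relies only on Python's standard `statistics.median`, and `quartiles` is an illustrative name.

```python
from statistics import median

# Mr. Tabor's 25 test scores, copied from the data set above
scores = [35, 18, 37, 38, 42, 41, 25, 37, 36, 32, 12, 43, 31,
          29, 32, 48, 44, 45, 38, 40, 45, 38, 38, 40, 22]


def quartiles(data):
    """Return (Q1, median, Q3) using the 'median of each half' rule."""
    s = sorted(data)
    n = len(s)
    half = n // 2
    lower = s[:half]
    # For odd n, skip the middle value so it belongs to neither half
    upper = s[half + 1:] if n % 2 else s[half:]
    return median(lower), median(s), median(upper)


q1, q2, q3 = quartiles(scores)
print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}")
```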

Cumulative Relative Frequency Graphs

  • A cumulative relative frequency graph shows, for each point on the horizontal axis, the proportion (or percent) of observations at or below that value.
    • Percentiles can be read directly from the graph, and so can the value at a given percentile.
  • Example: The age of U.S. Presidents when inaugurated illustrates age distribution:
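    • Frequency table of inauguration ages (first three classes shown):
      | Age at inauguration (years) | Frequency | Cumulative frequency |
      | --------------------------- | --------- | -------------------- |
      | 40 to under 45 | 2 | 2 |
      | 45 to under 50 | 7 | 9 |
      | 50 to under 55 | 13 | 22 |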
    • Cumulative relative frequency helps to find percentiles visually in the graph.
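  • Definition of a cumulative relative frequency graph: for each class, plot a point at that class's cumulative relative frequency above the smallest value of the next class (the upper boundary of the class), starting with a point at 0% at the smallest value of the first class; then connect consecutive points with line segments.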
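The cumulative frequency column is a running total of the frequency column. A minimal sketch with the standard `itertools.accumulate` reproduces it for the three classes shown. Dividing each running total by the total number of presidents (not listed in the partial table) gives the cumulative relative frequencies that the graph plots, so the illustrative helper `cumulative_relative` takes that total as a parameter.

```python
from itertools import accumulate

# Lower bounds of the 5-year age classes and their frequencies, from the table above
class_starts = [40, 45, 50]
frequencies = [2, 7, 13]

cumulative = list(accumulate(frequencies))  # running totals
for start, freq, cum in zip(class_starts, frequencies, cumulative):
    print(f"ages {start} to under {start + 5}: frequency {freq}, cumulative {cum}")


def cumulative_relative(freqs, total):
    """Cumulative relative frequencies, given the total number of observations."""
    return [c / total for c in accumulate(freqs)]
```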

Example Analysis Using Cumulative Graph

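  1. Locate President Obama in the distribution:
    • Find his inauguration age (47.463 years) on the horizontal axis, go up to the graph, and read the cumulative relative frequency on the vertical axis.
    • The reading is about 12%, so he was at about the 12th percentile: only about 12% of presidents were his age or younger when inaugurated, which makes him relatively young.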
  2. Estimate the 65th percentile:
    • Go across from 65% on the vertical axis to the graph, then down to the horizontal axis: the 65th percentile is about age 58.
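Reading a cumulative relative frequency graph amounts to linear interpolation between its plotted points: go up from a value to find its percentile, or go across from a percentile and down to find the value. The presidents' data are only partly tabulated here, so this sketch instead groups Mr. Tabor's 25 scores into hypothetical classes of width 10 (10 to 19, 20 to 29, 30 to 39, 40 to 49), builds the graph's points following the definition of the graph, and reads it in both directions. The function names are illustrative. Estimates from grouped data can differ from exact counts: the graph puts Emily's 43 near the 75th percentile, while the exact calculation gives the 84th.

```python
# Mr. Tabor's 25 test scores, copied from the data set above
scores = [35, 18, 37, 38, 42, 41, 25, 37, 36, 32, 12, 43, 31,
          29, 32, 48, 44, 45, 38, 40, 45, 38, 38, 40, 22]


def ogive_points(data, start, width, n_classes):
    """Points of a cumulative relative frequency graph: (start, 0), then each
    class's cumulative proportion plotted at the smallest value of the next class."""
    xs = [start + k * width for k in range(n_classes + 1)]
    ys = [0.0]
    running = 0
    for lo in xs[:-1]:
        running += sum(1 for v in data if lo <= v < lo + width)
        ys.append(running / len(data))
    return xs, ys


def percentile_at(x, xs, ys):
    """Go up from x to the graph: interpolated cumulative proportion at x."""
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x is outside the graph")


def value_at(p, xs, ys):
    """Go across from proportion p to the graph, then down to the x-axis."""
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if y0 <= p <= y1 and y1 > y0:
            return x0 + (x1 - x0) * (p - y0) / (y1 - y0)
    raise ValueError("p must be between 0 and 1")


# Hypothetical grouping of the scores into four classes of width 10
xs, ys = ogive_points(scores, start=10, width=10, n_classes=4)
print(f"Score 43: about the {percentile_at(43, xs, ys) * 100:.0f}th percentile")
print(f"65th percentile: about {value_at(0.65, xs, ys):.1f}")
```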

Standardized Scores (z-scores)

  • Definition: A z-score describes how many standard deviations a value is from the mean.
  • Formula:
    z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}
  • Example: Emily scored 43; the class mean is 35.44 and the standard deviation is 8.77, so
    z = \frac{43 - 35.44}{8.77} \approx 0.86
  • Interpretation: Indicates Emily's score is 0.86 standard deviations above the mean.
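A minimal Python sketch of the z-score formula, using the standard `statistics` module. It assumes the standard deviation is the sample standard deviation (divisor n - 1), which reproduces the 8.77 quoted above for Mr. Tabor's scores; `z_score` is an illustrative name.

```python
from statistics import mean, stdev

# Mr. Tabor's 25 test scores, copied from the data set above
scores = [35, 18, 37, 38, 42, 41, 25, 37, 36, 32, 12, 43, 31,
          29, 32, 48, 44, 45, 38, 40, 45, 38, 38, 40, 22]


def z_score(value, data):
    """How many (sample) standard deviations `value` lies from the mean of `data`."""
    return (value - mean(data)) / stdev(data)


print(f"mean = {mean(scores):.2f}, SD = {stdev(scores):.2f}")
print(f"Emily's z-score: {z_score(43, scores):.2f}")  # positive: above the mean
print(f"Jacob's z-score: {z_score(18, scores):.2f}")  # negative: below the mean
```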

Data Transformation Effects

  1. Addition/Subtraction:
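    • Adding a constant to (or subtracting it from) every observation adds it to (or subtracts it from) measures of center and location (mean, median, quartiles, percentiles), but doesn't change shape or variability (range, IQR, standard deviation).
    • Example: Adding 5 points to every score raises the mean and median by 5, while the shape and spread stay the same.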
  2. Multiplication/Division:
    • Multiplying (or dividing) every observation by a positive constant b multiplies (or divides) measures of center, location, and variability by b, but doesn't change shape.
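  • Neither kind of transformation changes the order of the observations, so each individual's percentile stays the same. z-scores are also unchanged, because the value, the mean, and the standard deviation all shift or scale together.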
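The transformation rules can be checked numerically. This sketch applies two illustrative transformations to Mr. Tabor's scores: adding 5 points, and doubling each score (as if rescaling the 50-point test to 100 points, a hypothetical example). It then compares the mean, median, and standard deviation, along with Emily's z-score and percentile, before and after. The helper names are illustrative.

```python
from statistics import mean, median, stdev

# Mr. Tabor's 25 test scores, copied from the data set above
scores = [35, 18, 37, 38, 42, 41, 25, 37, 36, 32, 12, 43, 31,
          29, 32, 48, 44, 45, 38, 40, 45, 38, 38, 40, 22]


def z_score(value, data):
    """Standardized score using the sample standard deviation."""
    return (value - mean(data)) / stdev(data)


def percentile_of(value, data):
    """Percent of observations less than or equal to `value`."""
    return 100 * sum(1 for x in data if x <= value) / len(data)


transforms = {
    "original": lambda x: x,
    "add 5": lambda x: x + 5,
    "times 2": lambda x: 2 * x,  # hypothetical rescaling to a 100-point test
}

for label, f in transforms.items():
    data = [f(x) for x in scores]
    emily = f(43)  # Emily's transformed score
    print(f"{label:>8}: mean {mean(data):6.2f}  median {median(data):5}  "
          f"SD {stdev(data):5.2f}  Emily z {z_score(emily, data):.2f}  "
          f"percentile {percentile_of(emily, data):.0f}")
```

Only the mean, median, and standard deviation columns change from row to row; Emily's z-score and percentile stay the same.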

Section Summary

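  • Percentiles and z-scores describe where an individual value lies within a distribution.
  • Cumulative relative frequency graphs let you estimate an individual's percentile and the value at a given percentile.
  • Adding, subtracting, multiplying, or dividing every observation by a constant (positive, for multiplication and division) changes center, location, and/or variability in predictable ways, but never changes the shape of the distribution.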

Exercises and Practice

  1. Calculation of z-scores and understanding percentiles in various contexts (e.g., height, scores, income).
  2. Applying cumulative relative frequency graphs and the effects of transformations to data distributions from various disciplines.