AP Statistics 2.1 Notes

Percentile

  • The p-th percentile of a distribution is the value x_p at or below which p percent of observations fall. Formally, for a random variable X with CDF F, it satisfies F(x_p) = \frac{p}{100}.
  • To find it: Arrange data numerically, then identify the value where p% of observations are at or below it.
  • Interpretation: Percentiles describe the location of a value within the data. The quartiles are the 25th, 50th (median), and 75th percentiles. Graphically, the percentile/CDF plot shows data values on the x-axis and percentiles (0% to 100%) on the y-axis; steeper segments mark intervals that contain more observations.
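
The two directions described above (value to percentile, and percentile to value) can be sketched in Python. This is a minimal illustration using the "at or below" convention; the function names and the quiz scores are made up for the example, and other textbooks or software may use slightly different percentile rules.

```python
import math

def percentile_of(value, data):
    """Percent of observations in data that are at or below value."""
    at_or_below = sum(1 for x in data if x <= value)
    return 100 * at_or_below / len(data)

def value_at_percentile(p, data):
    """Smallest observation with at least p percent of the data at or below it."""
    ordered = sorted(data)
    # 1-based position of the first value whose cumulative percent reaches p.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

scores = [62, 70, 71, 75, 78, 80, 84, 88, 91, 95]  # hypothetical quiz scores
print(percentile_of(80, scores))        # 60.0
print(value_at_percentile(60, scores))  # 80
```

Here 6 of the 10 scores are at or below 80, so a score of 80 sits at the 60th percentile.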

Cumulative Relative Frequency / Ogive

  • The cumulative relative frequency (CRF) at a value x is the proportion of observations less than or equal to x: \text{CRF}(x) = \frac{\#\{X \le x\}}{n}.
  • The p-th percentile can be read directly from an ogive (the plot of CRF against x) as the x-value where CRF(x) = p/100.
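
A cumulative relative frequency table can be built as in the sketch below, which then reads a percentile off it. The helper names and the ages are assumptions for illustration. The table is a step version of the ogive; an ogive drawn from grouped data usually connects the points with straight lines, so a value read from a graph may differ slightly.

```python
from collections import Counter

def cumulative_relative_frequency(data):
    """Return (value, CRF) pairs: proportion of observations <= each distinct value."""
    n = len(data)
    counts = Counter(data)
    running = 0
    table = []
    for value in sorted(counts):
        running += counts[value]           # count of observations <= value
        table.append((value, running / n))
    return table

def read_percentile(p, table):
    """First x-value in the table whose CRF reaches p/100."""
    for value, crf in table:
        if crf >= p / 100:
            return value
    return table[-1][0]

ages = [18, 19, 19, 20, 20, 20, 21, 22]  # hypothetical data
table = cumulative_relative_frequency(ages)
for value, crf in table:
    print(value, crf)
print(read_percentile(75, table))  # 20
```

Six of the eight ages are 20 or less, so CRF(20) = 0.75 and the 75th percentile is 20.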

Measuring Position: z-Scores

  • Definition: A z-score measures how many standard deviations an observation (X) is from the mean (\mu).
  • Formula: z = \frac{X - \mu}{\sigma}, where \mu is the mean and \sigma is the standard deviation.
  • Interpretation: Positive z means above the mean, negative z means below the mean.
  • Properties:
    • The distribution of z-scores has a mean of \mu_z = 0 and a standard deviation of \sigma_z = 1.
    • Z-scores standardize data, putting different datasets on a common scale. Standardizing preserves the shape of the distribution and the relative position of each observation.
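
The formula and properties above can be checked numerically. The sketch below assumes the data are treated as a whole population (so it uses the population standard deviation, matching \sigma); the heights are made-up values.

```python
import statistics

def z_scores(data):
    """Standardize each observation: z = (x - mu) / sigma."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)  # population SD, matching sigma in the formula
    return [(x - mu) / sigma for x in data]

heights = [150, 160, 170, 180, 190]  # hypothetical heights in cm
z = z_scores(heights)

print([round(v, 2) for v in z])   # negative below the mean, positive above it
print(statistics.mean(z))         # approximately 0
print(statistics.pstdev(z))       # approximately 1
```

The middle height equals the mean, so its z-score is 0; the order of the observations is unchanged by standardizing.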

Linear Transformations

  • For a transformation X' = a + bX:
    • Center/Location (mean, median, quartiles): Multiplied by b, then shifted by a (e.g., \mu' = a + b\mu).
    • Spread (standard deviation, IQR, range): Unaffected by a, but scaled by |b| (e.g., \sigma' = |b|\sigma).
    • Shape: Preserved, with a possible reflection if b < 0.
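
These rules can be verified on a small example. The sketch below converts made-up Celsius temperatures to Fahrenheit (a = 32, b = 1.8) and also tries a negative b; the function name `transform` is hypothetical, and the population standard deviation is assumed.

```python
import statistics

def transform(data, a, b):
    """Apply the linear transformation x' = a + b*x to every observation."""
    return [a + b * x for x in data]

celsius = [10, 15, 20, 25, 30]              # hypothetical temperatures
fahrenheit = transform(celsius, a=32, b=1.8)

# Center: shifted by a and scaled by b.
print(statistics.mean(fahrenheit), 32 + 1.8 * statistics.mean(celsius))

# Spread: unaffected by a, scaled by |b|.
print(statistics.pstdev(fahrenheit), 1.8 * statistics.pstdev(celsius))

# Negative b reverses the order (a reflection), but the spread still scales by |b|.
flipped = transform(celsius, a=0, b=-2)
print(statistics.pstdev(flipped), 2 * statistics.pstdev(celsius))
```

In each printed pair the two numbers should agree up to floating-point rounding.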