Z-tables and percentile calculations

Acknowledgement and context

  • Acknowledgement of country at the start: paying respect to traditional owners and connections to country as a foundation for education and research in Australia and globally.

Week-to-week structure and key build-up

  • Recap: last week focused on distributions, their characterisation, and the idea that many real-world variables form distributions when measured (shape, central tendency, spread).

  • Normal distribution (Gaussian) emerges when distributions are symmetric around their central tendency (mean).

  • This week builds on standard deviations to introduce z-scores (normal scores in units of standard deviations).

  • Next week: correlations, which use z-scores in their calculation; correlations are central in research and underpin the upcoming assignment.

  • Core theme: many statistical methods rely on the assumption that data follow a normal distribution, enabling a broad, generalizable set of procedures.

Visualizing and interpreting data distributions

  • Tables of numbers are not enough to understand data; we organize data with frequency distributions to capture shape, center, and spread.

  • Heights example (American students): distribution shows a central tendency (mean height) and spread; tails are rare compared to center; males and females show slightly different distributions.

  • Centred around the mean; for a normal distribution, mean \approx median \approx mode.

  • With larger data samples, the distribution more closely resembles the idealized bell curve (normal distribution); with smaller samples, more wiggles appear.

  • Purpose of the idealized curve: enables mathematical descriptions and derivations, allowing robust statistical procedures without needing calculus for every variable.

The normal distribution and its features

  • A mathematical construction: probability density is symmetric around the mean; tails never reach zero but approach it asymptotically.

  • The area under the curve equals 100% of scores; mean, median, and mode align at the center.

  • The normal distribution is a “family” with parameters: mean (center) and standard deviation (spread). Different means and spreads still follow the same general shape when standardized.

  • Central Limit Theorem (brief): means of large samples are normally distributed around the true mean, which underpins the use of parametric tests (e.g., t-tests) that assume normality; tests are robust to modest violations of normality with large samples.
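The Central Limit Theorem claim above can be checked with a quick simulation: draw many samples from a decidedly non-normal population and watch the sample means cluster around the true mean. A minimal Python sketch (sample sizes and seed are illustrative, not from the lecture):

```python
import random
import statistics

random.seed(0)

# Population: uniform on [0, 1) — flat, not bell-shaped at all.
# Draw 2000 samples of size 50 and record each sample's mean.
sample_means = [
    statistics.mean(random.random() for _ in range(50))
    for _ in range(2000)
]

centre = statistics.mean(sample_means)
spread = statistics.stdev(sample_means)

print(round(centre, 2))   # close to 0.5, the population mean
print(round(spread, 3))   # close to sigma/sqrt(n), far tighter than the population SD
```

Even though the individual draws are uniform, the distribution of sample means is approximately normal and centred on the population mean, which is what licenses parametric tests on means.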

Standard deviation and unit conversion intuition

  • Standard deviation measures spread; it is, roughly, the average distance of scores from the mean, expressed in the original units.

  • Calculation intuition: for a score x, the deviation from the mean is (x - \mu); the standard deviation \sigma (or s for a sample) rescales this deviation into standard-deviation units.

  • Analogy: converting height between cm and inches is a unit change; the underlying value doesn’t change, only how it is labeled. The same applies to converting to z-scores: values stay in the same relative positions, just rescaled.

  • Example conversion: 1 inch = 2.54 cm; height remains same when converted between units; similarly, a z-score transformation does not change the distribution, only the units.

  • Key takeaway: z-scores are standard scores—scores expressed in units of standard deviations from the mean.
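The unit-change analogy can be made concrete: cm-to-inches and raw-to-z are both linear transformations, so every score keeps its relative position. A minimal Python sketch with made-up heights:

```python
import statistics

heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]  # illustrative data

# Linear unit change: cm -> inches (1 inch = 2.54 cm).
heights_in = [h / 2.54 for h in heights_cm]

# Another linear change: raw scores -> z-scores.
mu = statistics.mean(heights_cm)
sigma = statistics.pstdev(heights_cm)   # population SD
z_scores = [(h - mu) / sigma for h in heights_cm]

# The tallest person is tallest on every scale; only the labels change.
print(z_scores)
```

After standardizing, the scores have mean 0 and SD 1, but their ordering and relative spacing are exactly what they were in centimetres or inches.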

Z-scores: definition, notation, and purpose

  • A z-score tells you how many standard deviations a score is from the mean.

  • Positive z: above the mean; negative z: below the mean.

  • Notation varies:

    • For a sample: the mean is often denoted m or \bar{x}, and the standard deviation s.

    • For a population: the mean is \mu and the standard deviation \sigma.

  • Z-score formula (standardization):
    z = \frac{x - \mu}{\sigma}

  • In the sample context, the same formula uses the sample mean and SD:
    z = \frac{x - \bar{x}}{s}

  • Inverse transformation (back to original units):
    x = \mu + z\sigma

  • Practical use: turning different measures (apples and oranges) into a common scale to compare across distributions; and to estimate how common a raw score is within its distribution.
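The standardization and its inverse translate directly into code; a minimal Python sketch (function names are illustrative):

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies from the mean."""
    return (x - mean) / sd

def raw_score(z, mean, sd):
    """Inverse transformation: from a z-score back to original units."""
    return mean + z * sd

print(z_score(110, 100, 10))   # 1.0  (one SD above the mean)
print(raw_score(1.5, 55, 3))   # 59.5 (1.5 SDs above a mean of 55)
```

Because the two functions are inverses, `raw_score(z_score(x, m, s), m, s)` returns the original x.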

Worked intuition: what z-scores do across distributions

  • Three distributions with the same mean but different standard deviations illustrate how z-scores capture relative position:

    • Distribution A: mean 100, SD 10, score 110 \rightarrow z = (110 - 100)/10 = 1.

    • Distribution B: mean 100, SD 15, score 110 \rightarrow z = (110 - 100)/15 \approx 0.67.

    • Distribution C: mean 100, SD 25, score 110 \rightarrow z = (110 - 100)/25 = 0.4.
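The three calculations above can be reproduced in a few lines of Python:

```python
mean, score = 100, 110
spreads = {"A": 10, "B": 15, "C": 25}  # same mean, different SDs

# Same raw score against each distribution: wider spread -> smaller z.
zs = {name: (score - mean) / sd for name, sd in spreads.items()}

for name, z in zs.items():
    print(name, round(z, 2))   # A 1.0, B 0.67, C 0.4
```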

  • Same raw score (110) can be equally unusual relative to different distributions; the z-score captures that relative position.

  • If we keep the raw score fixed but increase the spread, the z-score decreases: the same raw value is less exceptional within a wider distribution.

  • This underpins cross-domain comparisons (e.g., calculus vs. a lighter subject): a high raw score in a tough distribution can still be more impressive when expressed as a z-score relative to its peers.

  • Because z-scores standardize, relationships across different measures become interpretable within the same standard-deviation-based scale.

Practical examples: comparing performances across subjects and sports

  • Marissa’s two exams (music theory vs music practice):

    • Music theory: mean 50, SD 10, score 65 \rightarrow z = (65 - 50)/10 = 1.5.

    • Music practice: mean 60, SD 15, score 75 \rightarrow z = (75 - 60)/15 = 1.0.

    • Conclusion: Marissa scored relatively higher (further above the mean) in music theory, so she did better in theory when each score is judged against its own distribution.
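Marissa's comparison can be sketched in Python (the dictionary layout is illustrative):

```python
# Marissa's two exams, compared on the common z-score scale.
exams = {
    "music theory":   {"mean": 50, "sd": 10, "score": 65},
    "music practice": {"mean": 60, "sd": 15, "score": 75},
}

z = {name: (e["score"] - e["mean"]) / e["sd"] for name, e in exams.items()}

best = max(z, key=z.get)
print(z["music theory"], z["music practice"])   # 1.5 1.0
print(best)                                     # music theory
```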

  • Don Bradman vs Ted Williams (cross-sport comparison):

    • Bradman: batting-average distribution with mean 27.49, SD 14; Bradman's average 99.94 \rightarrow z_{\text{Bradman}} = (99.94 - 27.49)/14 \approx 5.2.

    • Williams: batting-average distribution with mean 0.284, SD 0.014; Williams's average 0.406 \rightarrow z_{\text{Williams}} = (0.406 - 0.284)/0.014 \approx 8.7. (Note: the transcript's exact SDs may vary; the key point is that both players stand far above the mean of their own sport, and z-scores make that cross-domain comparison interpretable.)

    • Conclusion: When compared to their peers within their own distributions, Bradman and Williams can be ranked in terms of how far above the mean their performances were; z-scores enable a meaningful cross-domain comparison.

  • Einstein IQ example:

    • IQ distribution: mean 100, SD 15; Einstein's IQ \approx 180 \rightarrow z = (180 - 100)/15 \approx 5.33.

    • Interpretation: extremely rare, highlighting the utility of z-scores for assessing rarity within a distribution.

  • Summary takeaway: z-scores let us quantify how unusual a score is within its distribution and allow direct comparison across different measures with different scales.

Converting back and forth and reverse problems

  • Given a z-score, compute the original score: x = \mu + z\sigma (population) or x = \bar{x} + zs (sample).

  • Example: for z = 1.5, mean \mu = 55, SD \sigma = 3 \rightarrow x = 55 + 1.5 \times 3 = 59.5.

  • Conversely, given a raw score, compute z: z = (x - \mu)/\sigma.

  • If you want a target percentile, you can reverse-engineer the raw score from a percentile using the z-table and the same mean/SD parameters.

  • Example procedure (IQ scenario used in lecture):

    • Convert the target percentile to a z-score via the z-table; then solve for x: x = \mu + z\sigma.

  • If you want to sit a certain z-score above the mean in multiple classes with different means/SDs, compute each class's required raw score with that class's \mu and \sigma.
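The reverse-engineering step above can be done without a printed z-table: Python's standard library exposes the inverse normal CDF via `statistics.NormalDist.inv_cdf`. A sketch for the IQ scenario used in the lecture (mean 100, SD 15); the target percentile (95th) is an illustrative choice:

```python
from statistics import NormalDist

# IQ-style distribution: mean 100, SD 15.
iq = NormalDist(mu=100, sigma=15)

# inv_cdf plays the role of reading the z-table backwards:
# percentile -> raw score.
score_95th = iq.inv_cdf(0.95)      # raw IQ at the 95th percentile
z_95th = (score_95th - 100) / 15   # the corresponding z-score

print(round(score_95th, 1))   # ≈ 124.7
print(round(z_95th, 2))       # ≈ 1.64, the familiar one-tailed cutoff
```

This is exactly x = \mu + z\sigma with z looked up from the target percentile.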

Z-tables, percentiles, and p-values: how to read and use them

  • Z-table structure (typical):

    • Left column: z-score values (0, 0.01, 0.02, …).

    • Middle: percentage of scores between the mean and that z-score (area from the mean to z).

    • Right: percentage of scores beyond that z-score (the tail beyond z).

  • Example: z = 1.00 corresponds to 34.13% of scores between the mean and +1 SD, and 15.87% beyond +1 SD. These two areas add to 50%, one half of the distribution; doubling them describes the full two-sided picture.

  • Important shorthand values: z = 1.96 corresponds to two-tailed p = 0.05, and z \approx 1.645 to one-tailed p = 0.05.

  • Cumulative approach: to find percentile rank (area below a given score), compute 50% (below the mean) + area between the mean and the z-score (from the table).

  • Percentiles and p-values are closely related. A percentile indicates the percentage of scores that fall below a given z-score. A p-value typically represents the probability of observing a score as extreme as, or more extreme than, a given z-score, often corresponding to the area in one or both tails of the distribution. Understanding these relationships is crucial for hypothesis testing and statistical inference.
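The table lookups described above can be reproduced with `statistics.NormalDist.cdf` from the Python standard library; a minimal sketch:

```python
from statistics import NormalDist

std = NormalDist()   # standard normal: mean 0, SD 1

z = 1.00
below = std.cdf(z)        # percentile rank: area below z
between = below - 0.5     # area between the mean and z (the table's middle column)
beyond = 1 - below        # one-tailed area beyond z (the table's right column)

print(round(between * 100, 2))   # 34.13
print(round(beyond * 100, 2))    # 15.87

# Two-tailed p-value for the classic cutoff z = 1.96.
p_two_tailed = 2 * (1 - std.cdf(1.96))
print(round(p_two_tailed, 3))    # 0.05
```

The "middle" and "right" table columns are just `cdf(z) - 0.5` and `1 - cdf(z)`; the percentile rank is `cdf(z)` itself, and a two-tailed p-value doubles the one-tailed area.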