Z scores and standardization — study notes

What is a z score?

  • A z score provides a concise way to describe exactly where an individual score falls within its distribution.

  • It represents the number of standard deviations a score is above or below the mean of that distribution.

  • A z score is a standardized score: it converts the original unit of measurement (e.g., exam points, temperature) into units of standard deviations.

  • Intuition: knowing a score (x) and the mean and spread (SD) of the distribution lets you say how unusual or typical that score is.

  • If distributions are roughly normal, z scores help compare scores across different distributions.

Why we care about z scores

  • With just a raw score, you often don’t know how extreme it is without the distribution’s mean and spread.

  • Two students could have the same raw score (e.g., 78) in different distributions with the same mean but different spreads; z scores reveal relative standing.

  • A small standard deviation means scores are tightly clustered around the mean, so a given raw score is more extreme; a large SD means scores are more spread out, so the same raw score is less extreme.

  • Z scores summarize three pieces of information in one number: the raw score, the mean (location), and the standard deviation (spread).
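
The same-score, different-spread point above can be checked with a little arithmetic. This is a minimal sketch: the function name `sds_from_mean` and the numbers (mean 70, SDs of 4 and 16) are assumptions chosen for illustration and do not come from the notes.

```python
def sds_from_mean(score, mean, sd):
    """Return how many standard deviations `score` lies from `mean`."""
    return (score - mean) / sd

# Hypothetical classes: both have a mean of 70, only the spread differs.
tight_class = sds_from_mean(78, mean=70, sd=4)    # small SD: scores cluster near 70
spread_class = sds_from_mean(78, mean=70, sd=16)  # large SD: scores vary widely

print(tight_class)   # 2.0 -> the same 78 is fairly extreme here
print(spread_class)  # 0.5 -> the same 78 is unremarkable here
```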

Notation: samples vs populations

  • Statisticians distinguish between describing the sample you actually collected (descriptive statistics) and drawing conclusions about the population it came from (inferential statistics).

  • Sample (descriptive): represented with Roman (English) letters (e.g., X, \bar{X}, s); these describe the data you actually collected.

  • Population (inferential): represented with Greek letters (e.g., \mu, \sigma, \sigma^2); these describe the full population when its values are known.

  • A population parameter (e.g., population mean \mu, population SD \sigma) is often unknown and estimated from sample data.

  • Common sample statistics:

    • Mean: (\overline{X}) (often written as M in some contexts)

    • Variance: (s^2)

    • Standard deviation: (s)

  • Common population parameters:

    • Mean: (\mu) (mu pronounced like "mew")

    • Variance: (\sigma^2)

    • Standard deviation: (\sigma)

  • Pronunciation note: (\mu) is pronounced "mew", not "moo" (the cow sound).

  • One-to-one mapping (sample ↔ population): mean (\overline{X}) ↔ (\mu), variance (s^2) ↔ (\sigma^2), standard deviation (s) ↔ (\sigma).

Formulae: how to compute z scores

  • For a score within a sample (descriptive z score):

    • z = \frac{x - \overline{X}}{s}

  • For a score within a population (inferential z score):

    • z = \frac{x - \mu}{\sigma}

  • Converting back from a z score to a raw score:

    • Within a population: x = z\,\sigma + \mu

    • Within a sample: x = z\,s + \overline{X}

  • Quick example (population): mean 100, SD 10, raw score 120:

    • z = \frac{120 - 100}{10} = 2

  • Another example (population): mean 100, SD 10, raw score 85:

    • z = \frac{85 - 100}{10} = -1.5

  • Another example (sample): mean 4 cups/day, SD 1.5, raw score 6 cups:

    • z = \frac{6 - 4}{1.5} \approx 1.33

  • Inverse example (to raw from z): z = -2, mean 4, SD 1.5:

    • x = (-2)(1.5) + 4 = 1
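
The two formulas above, and the four quick examples, can be reproduced in a few lines. This is only a sketch: the function names `z_score` and `raw_score` are illustrative, and the same code serves for a sample or a population because only the notation for the mean and SD differs.

```python
def z_score(x, mean, sd):
    """Convert a raw score to a z score: (x - mean) / sd."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (x - mean) / sd

def raw_score(z, mean, sd):
    """Convert a z score back to a raw score: z * sd + mean."""
    return z * sd + mean

print(z_score(120, 100, 10))         # 2.0
print(z_score(85, 100, 10))          # -1.5
print(round(z_score(6, 4, 1.5), 2))  # 1.33
print(raw_score(-2, 4, 1.5))         # 1.0
```

Applying `raw_score` to the output of `z_score` (with the same mean and SD) returns the original raw score, up to floating-point rounding.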

Worked practice problems from the transcript

  • Practice 1 (population parameters):

    • Population mean (\mu = 100), population SD (\sigma = 10), raw score (x = 120):

    • z = \frac{120 - 100}{10} = 2

  • Practice 2 (population):

    • Mean (\mu = 100), SD (\sigma = 10), raw score (x = 85):

    • z = \frac{85 - 100}{10} = -1.5

  • Practice 3 (coffee):

    • Mean daily cups = 4, SD = 1.5, raw score = 6:

    • z = \frac{6 - 4}{1.5} = \frac{2}{1.5} \approx 1.33

  • Practice 4 (inverse):

    • If z = -2, mean = 4, SD = 1.5:

    • x = (-2)(1.5) + 4 = 1

  • Conceptual practice: interpreting a z score rather than calculating a number

    • A z score of -2.5 (e.g., rainfall) means the value lies 2.5 SDs below the mean, i.e., in the far left tail.

    • In a roughly normal distribution, a value 2.5 SDs below the mean is well below average: fewer than 1% of values fall below it.

Unit conversion intuition: z scores as unit conversions

  • Conceptual idea: converting original units to standard deviation units is just unit conversion.

  • Examples from the transcript (illustrative):

    • 27 feet = 9 yards

    • 72°F ≈ 22.22°C (Fahrenheit to Celsius)

    • 350 liters ≈ 1.43 hogsheads (a hogshead is a unit of beer volume)

    • 128 cubic feet of firewood = 1 cord (roughly 3 ricks of firewood)

  • Takeaway: z scores are simply converting the measurement to the number of standard deviations away from the mean.
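
One way to see the parallel is that Fahrenheit to Celsius and raw score to z score share the same shape: subtract an offset, then divide by a scale. The sketch below uses a hypothetical helper, `shift_and_scale`, to do both; the helper name is an assumption for illustration, and the z-score numbers reuse the mean 100, SD 10 example from these notes.

```python
def shift_and_scale(value, offset, scale):
    """Generic unit conversion of the form (value - offset) / scale."""
    return (value - offset) / scale

# Fahrenheit -> Celsius: subtract 32, divide by 1.8
celsius = shift_and_scale(72, offset=32, scale=1.8)

# Raw score -> SD units (z score): subtract the mean, divide by the SD
z = shift_and_scale(120, offset=100, scale=10)

print(round(celsius, 2))  # 22.22
print(z)                  # 2.0
```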

Area under the normal curve and z-score interpretation

  • Core idea: in a standard normal curve (mean 0, SD 1), you can estimate percentages of scores between or beyond z scores using known segments:

    • From mean to +1 SD: ~34%

    • From +1 SD to +2 SD: ~14%

    • Above +2 SD: ~2%

    • By symmetry, from the mean to -1 SD: ~34%; from -1 SD to -2 SD: ~14%; below -2 SD: ~2%

  • Estimating percent below a positive z (e.g., z = +1):

    • Draw the standard normal curve, mark z = +1, shade below it.

    • Area from 0 to +1 is ~34%; area below mean is 50%; total below +1 is ~84%

  • Estimating percent above a negative z (e.g., z = -1.5):

    • Shade above z = -1.5 and split the shaded area into regions: above 0 is 50%; between -1 and 0 is ~34%; between -1.5 and -1 is part of the ~14% segment from -2 to -1 (the transcript estimates this piece as ~8%).

    • Sum: 50% + 34% + ~8% ≈ 92%

  • Estimating percent below a negative z (e.g., z = -1.8):

    • Area below -1.8 ≈ 2% (below -2) + a small piece between -2 and -1.8 (estimated ~1–2%) ≈ 3–4%

  • Estimating percent above a small positive z (e.g., z = +0.6):

    • Area above +0.6: split the 0 to +1 segment (~34%) at 0.6. The transcript estimates ~10% between 0.6 and 1, then adds ~14% (between 1 and 2) and ~2% (above 2), giving about 26% above +0.6.

  • Practical note: these are estimates to emphasize understanding; exact values can be obtained with tools.
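
To see how close the hand estimates are, the same areas can be computed from the standard normal CDF. This sketch assumes Python 3.8+ and uses `statistics.NormalDist` from the standard library; the helper names `area_below`, `area_above`, and `area_between` are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # standard normal: mean 0, SD 1

def area_below(z):
    """Proportion of the curve to the left of z."""
    return std_normal.cdf(z)

def area_above(z):
    """Proportion of the curve to the right of z."""
    return 1 - std_normal.cdf(z)

def area_between(lo, hi):
    """Proportion of the curve between two z scores."""
    return std_normal.cdf(hi) - std_normal.cdf(lo)

# The three building-block segments (hand values: ~34%, ~14%, ~2%)
print(f"mean to +1: {area_between(0, 1):.1%}")  # 34.1%
print(f"+1 to +2:   {area_between(1, 2):.1%}")  # 13.6%
print(f"above +2:   {area_above(2):.1%}")       # 2.3%

# The four worked estimates (hand values: ~84%, ~92%, ~3-4%, ~26%)
print(f"below +1.0: {area_below(1.0):.1%}")     # 84.1%
print(f"above -1.5: {area_above(-1.5):.1%}")    # 93.3%
print(f"below -1.8: {area_below(-1.8):.1%}")    # 3.6%
print(f"above +0.6: {area_above(0.6):.1%}")     # 27.4%
```

The hand estimates land within about one to two percentage points of the computed values.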

Exact percentages with calculators and software

  • Excel and other tools can compute exact percentages below/above a given z score.

  • In Excel (standard normal):

    • Area below z: \Phi(z) = \text{NORM.DIST}(z, 0, 1, \text{TRUE})

    • Area above z: 1 - \Phi(z) = 1 - \text{NORM.DIST}(z, 0, 1, \text{TRUE})

  • You can also use websites or other statistical software for the same results.

  • Important concept: If you know the area below z, you can get the area above by subtracting from 1 (since total area under the curve is 100%).
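
For readers working outside Excel, here is a rough Python counterpart to the cumulative form of NORM.DIST, built on the error function in the standard library. It is a sketch, not Excel's implementation: the function name `norm_cdf` and its argument order (x, mean, SD) are assumptions chosen to mirror the spreadsheet call.

```python
import math

def norm_cdf(x, mean=0.0, sd=1.0):
    """Area below x under a normal curve with the given mean and SD."""
    z = (x - mean) / sd                       # standardize first
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

below = norm_cdf(2)   # area below z = 2 on the standard normal
above = 1 - below     # area above is the complement

print(round(below, 4))  # 0.9772
print(round(above, 4))  # 0.0228

# A raw score of 120 with mean 100 and SD 10 standardizes to z = 2,
# so it has the same area below it.
print(math.isclose(norm_cdf(120, 100, 10), below))  # True
```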

Standardization and cross-distribution comparisons

  • Z scores are called standardized scores because they convert raw scores to a common scale (in SD units).

  • This standardization allows direct comparisons across distributions with different means and spreads.

  • Example: comparing SAT vs ACT performance:

    • SAT: score 680, mean 500, SD 100 → z = \frac{680-500}{100} = 1.8

    • ACT: score 28, mean 18, SD 6 → z = \frac{28-18}{6} \approx 1.67

    • Since 1.8 > 1.67, the SAT score is further above its mean in SD units, indicating relatively better performance on the SAT than the ACT.

  • Another example from the transcript:

    • Drinking: mean 3 drinks/week, SD 2 → last week 6 drinks → z = \frac{6-3}{2} = 1.5

    • Lottery tickets: mean 0, SD 0.5 → last week 2 tickets → z = \frac{2-0}{0.5} = 4

    • The lottery behavior is far more extreme (z = 4) than the drinking (z = 1.5), illustrating cross-distribution comparison.
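
Both comparisons above come down to standardizing each value against its own distribution and then comparing the z scores. A minimal sketch, with the helper name `z_score` and the dictionary layout as illustrative choices:

```python
def z_score(x, mean, sd):
    """Convert a raw score to a z score: (x - mean) / sd."""
    return (x - mean) / sd

# Test scores: (raw score, mean, SD) taken from the example above
tests = {"SAT": (680, 500, 100), "ACT": (28, 18, 6)}
test_z = {name: z_score(*values) for name, values in tests.items()}

for name, z in test_z.items():
    print(name, round(z, 2))  # SAT 1.8, then ACT 1.67

# The larger z score marks the relatively stronger performance
print(max(test_z, key=test_z.get))  # SAT

# Behaviors: drinks per week vs lottery tickets per week
drinks_z = z_score(6, 3, 2)     # 1.5
lottery_z = z_score(2, 0, 0.5)  # 4.0
print(lottery_z > drinks_z)     # True: the lottery purchase is more unusual
```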

Summary of key takeaways

  • A z score tells you how many standard deviations a value is from the mean:

    • Positive z: above the mean; Negative z: below the mean.

  • The z-score formula has the same form for a sample and a population; only the symbols for the mean and SD differ:

    • Sample: z = \frac{x - \overline{X}}{s}

    • Population: z = \frac{x - \mu}{\sigma}

  • The inverse transformation to get a raw score from a z score is:

    • Population: x = z\,\sigma + \mu

    • Sample: x = z\,s + \overline{X}

  • Z scores enable comparisons across distributions and help interpret how unusual a value is within its distribution.

  • You can estimate percentile areas by hand using the standard normal areas; for precise values, use tools like Excel or online calculators.

  • Always consider the distribution shape; z-score interpretation assumes approximate normality for the referenced percentiles.

Real-world practice prompts (quick recap)

  • If a z score is -2.5, the value is two and a half standard deviations below the mean (far left tail).

  • If a z score is +1, about 84% of values fall below that z score (rough estimate: 50% below the mean + 34% from mean to +1).

  • If a z score is +0.6, about 26% of the distribution lies above that z score (illustrative estimate using area breakdowns).

  • For exact percentages, use the standard normal CDF (\Phi(z)) in software; for the area above z, use (1-\Phi(z)).

Quick reference formulas

  • Z score (sample): z = \frac{x - \overline{X}}{s}

  • Z score (population): z = \frac{x - \mu}{\sigma}

  • Raw score from z (sample): x = z\,s + \overline{X}

  • Raw score from z (population): x = z\,\sigma + \mu

  • Area below z (standard normal): \Phi(z) = \text{NORM.DIST}(z, 0, 1, \text{TRUE})

  • Area above z: 1 - \Phi(z)