Z scores and standardization — study notes

What is a z score?

A z score provides a concise way to describe exactly where an individual score falls within its distribution.
It represents the number of standard deviations a score is above or below the mean of that distribution.
A z score is a standardized score: it converts the original unit of measurement (e.g., exam points, temperature) into units of standard deviations.
Intuition: knowing a score (x) and the mean and spread (SD) of the distribution lets you say how unusual or typical that score is.
If distributions are roughly normal, z scores help compare scores across different distributions.

Why we care about z scores

With just a raw score, you often don’t know how extreme it is without the distribution’s mean and spread.
Two students could have the same raw score (e.g., 78) in distributions with the same mean but different spreads; their z scores reveal that the two scores have different relative standings.
A small standard deviation means scores are tightly clustered around the mean, so a raw score at a given distance from the mean is more extreme; a large SD means scores are more spread out, so a score at that same distance is less extreme.
Z scores summarize three pieces of information in one number: the raw score, the mean of its distribution, and the spread of that distribution (SD).

Notation: samples vs populations

Statisticians distinguish between describing a sample (descriptive statistics) and drawing conclusions about a population (inferential statistics).
Sample (descriptive): represented with English/Roman letters (e.g., $X$, $\overline{X}$, $s$); these describe the data you actually collected.
Population (inferential): represented with Greek letters (e.g., $\mu$, $\sigma$, $\sigma^2$); these describe the full population when its values are known.
A population parameter (e.g., population mean $\mu$, population SD $\sigma$) is often unknown and estimated from sample data.
Common sample statistics:
- Mean: $\overline{X}$ (often written as M in some contexts)
- Variance: $s^2$
- Standard deviation: $s$
Common population parameters:
- Mean: $\mu$ (mu, pronounced like "mew")
- Variance: $\sigma^2$
- Standard deviation: $\sigma$
Pronunciation note: $\mu$ (mu) is pronounced "mew", not like the cow sound "moo".
One-to-one mapping (sample ↔ population): mean $\overline{X}$ ↔ $\mu$, variance $s^2$ ↔ $\sigma^2$, standard deviation $s$ ↔ $\sigma$.
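
The sample/population distinction also shows up in software. The sketch below uses Python's standard `statistics` module on a small made-up data set (the `cups` values are hypothetical, not from the transcript). The notes do not spell out how $s$ and $\sigma$ are computed; the sketch assumes the usual convention that the sample variance divides by n - 1 while the population variance divides by n.

```python
import statistics

# Hypothetical data: cups of coffee per day for five people (made up for illustration).
cups = [2, 3, 4, 5, 6]

# Treating the values as a sample (Roman letters).
sample_mean = statistics.mean(cups)       # X-bar
sample_var = statistics.variance(cups)    # s^2, divides by n - 1 (assumed convention)
sample_sd = statistics.stdev(cups)        # s

# Treating the same values as an entire population (Greek letters).
pop_mean = statistics.mean(cups)          # mu
pop_var = statistics.pvariance(cups)      # sigma^2, divides by n
pop_sd = statistics.pstdev(cups)          # sigma

print(f"sample:     mean={sample_mean}, variance={sample_var}, sd={sample_sd:.3f}")
print(f"population: mean={pop_mean}, variance={pop_var}, sd={pop_sd:.3f}")
```

The mean is the same either way; only the variance and standard deviation differ, and the gap shrinks as the number of scores grows.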

Formulae: how to compute z scores

For a score within a sample (descriptive z score):
- $z = \frac{x - \overline{X}}{s}$
For a score within a population (inferential z score):
- $z = \frac{x - \mu}{\sigma}$
Converting back from a z score to a raw score:
- Within a population: $x = z\,\sigma + \mu$
- Within a sample: $x = z\,s + \overline{X}$
Quick example (population): mean 100, SD 10, raw score 120:
- $z = \frac{120 - 100}{10} = 2$
Another example (population): mean 100, SD 10, raw score 85:
- $z = \frac{85 - 100}{10} = -1.5$
Another example (sample): mean 4 cups/day, SD 1.5, raw score 6 cups:
- $z = \frac{6 - 4}{1.5} \approx 1.33$
Inverse example (to raw from z): z = -2, mean 4, SD 1.5:
- $x = (-2)(1.5) + 4 = 1$
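
The formulas and examples above can be sketched as two small Python functions. The names `z_score` and `raw_score` are illustrative choices, not from the transcript; the same functions serve for samples and populations because only the notation differs.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (x - mean) / sd


def raw_score(z, mean, sd):
    """Invert the z-score formula: x = z * sd + mean."""
    return z * sd + mean


print(z_score(120, 100, 10))          # 2.0
print(z_score(85, 100, 10))           # -1.5
print(round(z_score(6, 4, 1.5), 2))   # 1.33
print(raw_score(-2, 4, 1.5))          # 1.0
```

Converting to z and back again returns the original raw score, which is a quick way to check a hand calculation.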

Worked practice problems from the transcript

Practice 1 (population parameters):
- Population mean $\mu = 100$, population SD $\sigma = 10$, raw score $x = 120$:
- $z = \frac{120 - 100}{10} = 2$
Practice 2 (population):
- Mean $\mu = 100$, SD $\sigma = 10$, raw score $x = 85$:
- $z = \frac{85 - 100}{10} = -1.5$
Practice 3 (coffee):
- Mean daily cups = 4, SD = 1.5, raw score = 6:
- $z = \frac{6 - 4}{1.5} = \frac{2}{1.5} \approx 1.33$
Practice 4 (inverse):
- If z = -2, mean = 4, SD = 1.5:
- $x = (-2)(1.5) + 4 = 1$
Conceptual practice: interpreting a z score rather than calculating a number
- A z score of -2.5 (e.g., for rainfall) means the value lies 2.5 SDs below the mean.
- In a roughly symmetric, normal distribution, -2.5 SD is in the far left tail, indicating a well-below-average value.

Unit conversion intuition: z scores as unit conversions

Conceptual idea: converting original units to standard deviation units is just unit conversion.
Examples from the transcript (illustrative):
- 27 feet = 9 yards (unit conversion: divide by 3)
- 72°F ≈ 22.22°C (unit conversion: subtract 32, then multiply by 5/9)
- 350 liters ≈ 1.43 hogsheads (hogshead is a unit of beer volume)
- 128 cubic feet of firewood = 1 cord (or about 3 ricks of firewood)
Takeaway: a z score simply converts a measurement into the number of standard deviations it lies from the mean.
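
To make the analogy concrete, here is a minimal sketch that puts a temperature conversion next to a z-score conversion. Both subtract an offset and then rescale; the function names are illustrative, and the z-score numbers are borrowed from the population example in these notes (mean 100, SD 10).

```python
def fahrenheit_to_celsius(f):
    # Subtract the offset (32), then rescale by 5/9.
    return (f - 32) * 5 / 9


def to_sd_units(x, mean, sd):
    # Subtract the offset (the mean), then rescale by 1/sd.
    return (x - mean) / sd


print(27 / 3)                                # 9.0 (feet to yards)
print(round(fahrenheit_to_celsius(72), 2))   # 22.22
print(to_sd_units(120, 100, 10))             # 2.0 (raw score to SD units)
```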

Area under the normal curve and z-score interpretation

Core idea: in a standard normal curve (mean 0, SD 1), you can estimate percentages of scores between or beyond z scores using known segments:
- From mean to +1 SD: ~34%
- From +1 SD to +2 SD: ~14%
- Above +2 SD: ~2%
- By symmetry, from the mean to -1 SD: ~34%; between -1 SD and -2 SD: ~14%; below -2 SD: ~2%
Estimating percent below a positive z (e.g., z = +1):
- Draw the standard normal curve, mark z = +1, shade below it.
- Area from 0 to +1 is ~34%; area below mean is 50%; total below +1 is ~84%
Estimating percent above a negative z (e.g., z = -1.5):
- Shade above z = -1.5 and split the shaded area into regions: above 0 is 50%, and from -1 to 0 is ~34%. The remaining piece, from -1.5 to -1, is part of the ~14% segment between -2 and -1; the transcript estimates it as ~8%.
- Sum: 50% + 34% + ~8% ≈ 92%
Estimating percent below a negative z (e.g., z = -1.8):
- Area below -1.8 ≈ 2% (below -2) + a small piece between -2 and -1.8 (estimated ~1–2%) ≈ 3–4%
Estimating percent above a small positive z (e.g., z = +0.6):
- The area above +0.6 is built from known segments: the piece between +0.6 and +1 is part of the ~34% segment from 0 to +1 (the transcript estimates it as ~10%), plus ~14% (between +1 and +2), plus ~2% (above +2), for about 26% above +0.6.
Practical note: these are estimates to emphasize understanding; exact values can be obtained with tools.
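
As a check on the hand estimates above, the sketch below computes the corresponding exact areas with `statistics.NormalDist` from Python's standard library (available in Python 3.8 and later). The helper `area_between` and the labels are illustrative, not from the transcript.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, SD 1.
std_normal = NormalDist(mu=0, sigma=1)


def area_between(lo, hi):
    """Proportion of the standard normal curve lying between z = lo and z = hi."""
    return std_normal.cdf(hi) - std_normal.cdf(lo)


# The three reference segments used for hand estimates.
print(f"mean to +1 SD:  {area_between(0, 1):.1%}")      # rule of thumb: ~34%
print(f"+1 SD to +2 SD: {area_between(1, 2):.1%}")      # rule of thumb: ~14%
print(f"above +2 SD:    {1 - std_normal.cdf(2):.1%}")   # rule of thumb: ~2%

# The four hand estimates from the notes, paired with the exact area.
checks = {
    "below z = +1.0 (estimate ~84%)": std_normal.cdf(1.0),
    "above z = -1.5 (estimate ~92%)": 1 - std_normal.cdf(-1.5),
    "below z = -1.8 (estimate ~3-4%)": std_normal.cdf(-1.8),
    "above z = +0.6 (estimate ~26%)": 1 - std_normal.cdf(0.6),
}
for label, area in checks.items():
    print(f"{label}: exact {area:.1%}")
```

The exact areas land within a percentage point or two of the hand estimates, which is all the estimation method is meant to deliver.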

Exact percentages with calculators and software

Excel and other tools can compute exact percentages below/above a given z score.
In Excel (standard normal):
- Area below z: $\Phi(z) = \text{NORM.DIST}(z, 0, 1, \text{TRUE})$
- Area above z: $1 - \Phi(z) = 1 - \text{NORM.DIST}(z, 0, 1, \text{TRUE})$
You can also use websites or other statistical software for the same results.
Important concept: if you know the area below z, you can get the area above z by subtracting it from 1 (the total area under the curve is 1, i.e., 100%).
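
For readers working outside Excel, the same two quantities can be sketched in plain Python. This is one possible approach, not the transcript's method: it relies on the standard identity $\Phi(z) = \frac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$ and on `math.erf` from the standard library. The function names `area_below` and `area_above` are illustrative.

```python
import math


def area_below(z):
    """Standard normal CDF, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def area_above(z):
    """Complement of the CDF: the total area is 1, so above = 1 - below."""
    return 1.0 - area_below(z)


for z in (-1.8, -1.5, 0.6, 1.0):
    print(f"z = {z:+.1f}: below = {area_below(z):.4f}, above = {area_above(z):.4f}")
```

For any z the two areas sum to 1, mirroring the subtraction rule stated above.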

Standardization and cross-distribution comparisons

Z scores are called standardized scores because they convert raw scores to a common scale (in SD units).
This standardization allows direct comparisons across distributions with different means and spreads.
Example: comparing SAT vs ACT performance:
- SAT: score 680, mean 500, SD 100 → $z = \frac{680 - 500}{100} = 1.8$
- ACT: score 28, mean 18, SD 6 → $z = \frac{28 - 18}{6} \approx 1.67$
- Since 1.8 > 1.67, the SAT score is further above its mean in SD units, indicating relatively better performance on the SAT than the ACT.
Another example from the transcript:
- Drinking: mean 3 drinks/week, SD 2 → last week 6 drinks → $z = \frac{6 - 3}{2} = 1.5$
- Lottery tickets: mean 0, SD 0.5 → last week 2 tickets → $z = \frac{2 - 0}{0.5} = 4$
- The lottery behavior is far more extreme (z = 4) than the drinking (z = 1.5), illustrating cross-distribution comparison.
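
The two comparisons above can be reproduced in a few lines. This is a minimal sketch: the dictionary keys and the helper `z_score` are illustrative names, while the scores, means, and SDs are the ones given in the examples.

```python
def z_score(x, mean, sd):
    """Standardize a raw score: (x - mean) / sd."""
    return (x - mean) / sd


# Each entry: raw score, mean, SD, taken from the examples in these notes.
z_scores = {
    "SAT": z_score(680, 500, 100),
    "ACT": z_score(28, 18, 6),
    "drinks": z_score(6, 3, 2),
    "lottery": z_score(2, 0, 0.5),
}

for name, z in z_scores.items():
    print(f"{name}: z = {z:.2f}")

# Once standardized, scores from different distributions can be compared directly.
better_test = max(("SAT", "ACT"), key=z_scores.get)
more_extreme = max(("drinks", "lottery"), key=lambda k: abs(z_scores[k]))
print(better_test)    # SAT
print(more_extreme)   # lottery
```

Taking the absolute value in the second comparison treats "extreme" as distance from the mean in either direction, which matters whenever one of the z scores is negative.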

Summary of key takeaways

A z score tells you how many standard deviations a value is from the mean:
- Positive z: above the mean; Negative z: below the mean.
Z scores are computed the same way for a sample and for a population; only the notation differs:
- Sample: $z = \frac{x - \overline{X}}{s}$
- Population: $z = \frac{x - \mu}{\sigma}$
The inverse transformation to get a raw score from a z score is:
- Population: $x = z\,\sigma + \mu$
- Sample: $x = z\,s + \overline{X}$
Z scores enable comparisons across distributions and help interpret how unusual a value is within its distribution.
You can estimate percentile areas by hand using the standard normal areas; for precise values, use tools like Excel or online calculators.
Always consider the distribution's shape: the percentages quoted in these notes assume an approximately normal distribution.

Real-world practice prompts (quick recap)

If a z score is -2.5, the value is two and a half standard deviations below the mean (far left tail).
If a z score is +1, about 84% of values fall below that z score (rough estimate: 50% below the mean + 34% from mean to +1).
If a z score is +0.6, about 26% of the distribution lies above that z score (illustrative estimate using area breakdowns).
For exact percentages, compute the standard normal CDF, $\Phi(z)$, with software; for the area above z, use $1 - \Phi(z)$.

Quick reference formulas

Z score (sample): $z = \frac{x - \overline{X}}{s}$
Z score (population): $z = \frac{x - \mu}{\sigma}$
Raw score from z (sample): $x = z\,s + \overline{X}$
Raw score from z (population): $x = z\,\sigma + \mu$
Area below z (standard normal): $\Phi(z) = \text{NORM.DIST}(z, 0, 1, \text{TRUE})$
Area above z: $1 - \Phi(z)$