Distribution & Standardization Cheatsheet
Empirical Rule
For a normal distribution: within 1 standard deviation (SD): $P(|X-\mu| \le \sigma) \approx 0.68$
Within 2 SD: $P(|X-\mu| \le 2\sigma) \approx 0.95$
Within 3 SD: $P(|X-\mu| \le 3\sigma) \approx 0.997$
Example (IQ): mean , SD
1 SD range:
2 SD range:
3 SD range:
Score between 67 and 133 corresponds to about of the population.
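The 68% / 95% / 99.7% figures can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `prob_within` is only illustrative and not part of the original notes.

```python
from statistics import NormalDist

def prob_within(k: float) -> float:
    """P(|X - mu| <= k * sigma) for a normal X, computed on the standard normal."""
    std = NormalDist()  # mean 0, SD 1
    return std.cdf(k) - std.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {prob_within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```

The result does not depend on the mean or SD, because standardizing any normal variable gives the same standard normal.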
Z-scores and Standardization
Definition: $z = \frac{x - \mu}{\sigma}$
If X is normal, the z-scores follow the standard normal: $Z \sim N(0, 1)$
Interpretations:
Positive z: above mean; Negative z: below mean
Z-scores provide a standardized way to compare different distributions
Uses:
Compare an individual's position within its population
Compare across populations (e.g., different exams) by converting to z-scores
If exams are the same, you can compare raw scores directly; otherwise use z-scores
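As a sketch of the cross-exam comparison, the example below standardizes two scores from different exams. The exam means, SDs, and scores are made-up illustrative numbers, and `z_score` is a hypothetical helper name.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical exams: the raw scores are not directly comparable.
z_a = z_score(85, mean=70, sd=10)  # exam A
z_b = z_score(70, mean=60, sd=5)   # exam B

print(z_a, z_b)  # 1.5 2.0
print("B is the stronger relative result" if z_b > z_a else "A is the stronger relative result")
```

Here the lower raw score (70 on exam B) is the better relative performance, because it sits 2 SD above its own mean.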
From x to z and back
Given mean $\mu$ and SD $\sigma$:
To convert: $z = \frac{x - \mu}{\sigma}$
To recover x: $x = \mu + z\sigma$
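The two conversions are inverses of each other, which a short round trip can illustrate. The function names `to_z` and `to_x` and the values mean 100, SD 15 are assumptions chosen for the example only.

```python
def to_z(x: float, mu: float, sigma: float) -> float:
    """Standardize: z = (x - mu) / sigma."""
    return (x - mu) / sigma

def to_x(z: float, mu: float, sigma: float) -> float:
    """Undo the standardization: x = mu + z * sigma."""
    return mu + z * sigma

mu, sigma = 100, 15  # assumed values for illustration

z = to_z(130, mu, sigma)
print(z)                    # 2.0
print(to_x(z, mu, sigma))   # 130.0
print(to_x(-1, mu, sigma))  # 85
```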
Percentiles and quartiles
Percentile: the value x such that $P(X \le x) = p$, where p is the percentile expressed as a decimal (e.g., 0.90 for the 90th percentile)
Area to the left defines the percentile
Five-number summary (for raw data): min, Q1, median (Q2), Q3, max
Interquartile range: $\text{IQR} = Q_3 - Q_1$
Use quartiles when data may be skewed; quartiles are robust to outliers
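A minimal sketch of the five-number summary and IQR, using `statistics.quantiles` from the Python standard library. The data values are made up, and the `method="inclusive"` setting is one of several quartile conventions; textbooks and software may give slightly different Q1 and Q3 for the same data.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the 'inclusive' quartile convention."""
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)

data = [2, 4, 4, 5, 7, 8, 9, 11, 12]  # hypothetical sample

lo, q1, med, q3, hi = five_number_summary(data)
iqr = q3 - q1
print(lo, q1, med, q3, hi)  # 2 4.0 7.0 9.0 12
print("IQR =", iqr)         # IQR = 5.0
```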
Outliers and fences
Box/whisker interpretation: whiskers indicate spread; one whisker much longer than the other suggests skewness in that direction
Outlier detection (IQR method):
Lower fence: $Q_1 - 1.5 \times \text{IQR}$
Upper fence: $Q_3 + 1.5 \times \text{IQR}$
Values beyond fences are potential outliers
Note: 1.5 IQR is the usual multiplier; some textbooks or software use other rules (e.g., 3 IQR to flag only extreme outliers)
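The fence rule can be sketched as follows. The multiplier is exposed as a parameter `k` with the common default of 1.5, which is an assumption about the convention in use; the quartiles and data are made-up values, and the function names are illustrative.

```python
def iqr_fences(q1: float, q3: float, k: float = 1.5):
    """Return (lower_fence, upper_fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def potential_outliers(data, q1, q3, k=1.5):
    """Values strictly outside the fences."""
    lower, upper = iqr_fences(q1, q3, k)
    return [x for x in data if x < lower or x > upper]

data = [2, 4, 4, 5, 7, 8, 9, 11, 30]  # hypothetical sample
q1, q3 = 4, 9                         # quartiles assumed already computed

print(iqr_fences(q1, q3))                # (-3.5, 16.5)
print(potential_outliers(data, q1, q3))  # [30]
```

Passing `k=3` gives the wider fences some sources use for extreme outliers.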
When to use mean vs median; role of skewness
If distribution is skewed or has outliers: use median and IQR (robust)
If roughly symmetric and no major outliers: mean and SD are informative
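To see why the median is called robust, compare how the mean and median react to a single extreme value. The numbers below are invented for illustration.

```python
from statistics import mean, median

base = [30, 32, 35, 38, 40]   # roughly symmetric, no outliers
with_outlier = base + [400]   # one extreme value added

print(mean(base), median(base))  # 35 35
print(round(mean(with_outlier), 2), median(with_outlier))  # 95.83 36.5
```

One outlier pulls the mean far from the bulk of the data, while the median barely moves.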
Practical problem-solving approach
Step 1: Write down relevant information (mean, SD, sample size, x values)
Step 2: Decide whether to compare using x (raw) or z (standardized)
Step 3: If asked for a z-score: use $z = \frac{x - \mu}{\sigma}$
Step 4: If asked for x from z: use $x = \mu + z\sigma$
Step 5: For comparing across populations, use z-scores; for same exam, raw x can be compared
Quick notes on interpretation and examples
Normal assumption matters: many methods assume normality; if X is not normal, consider medians/percentiles or transform data
Percentiles give position relative to the population: e.g., 5th percentile means 5% are at or below that value; 95th percentile means 95% are at or below that value
Skewness and outliers affect which measures you report (box plots help visualize)
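When the normal assumption is reasonable, the percentile definition (area to the left) maps directly onto the normal CDF and its inverse. The sketch below uses `statistics.NormalDist`; the mean of 100 and SD of 15 are assumed values for illustration only.

```python
from statistics import NormalDist

dist = NormalDist(mu=100, sigma=15)  # assumed population parameters

# Value at the 90th percentile: x such that P(X <= x) = 0.90
x90 = dist.inv_cdf(0.90)
print(round(x90, 2))  # 119.22

# Percentile of a given value: area to the left of x = 130
p130 = dist.cdf(130)
print(round(p130, 4))  # 0.9772
```

For clearly non-normal data, an empirical percentile computed from the sorted sample is the safer choice.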
Five-number summary in practice
Minimum, Q1, Median, Q3, Maximum
Box plot interpretation: roughly symmetrical if whiskers are similar length; skew to the right if right whisker longer
Example workflow for a problem: compute five-number summary from data (min, Q1, median, Q3, max) and check IQR for potential outliers
Quick reference formulas
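Z-score: $z = \frac{x - \mu}{\sigma}$
From Z to x: $x = \mu + z\sigma$
Standard normal: $Z \sim N(0, 1)$
Ranges within 1/2/3 SD of the mean contain about 68% / 95% / 99.7% of a normal distribution
IQR: $\text{IQR} = Q_3 - Q_1$
Outlier fences: $Q_1 - 1.5 \times \text{IQR}$ (lower), $Q_3 + 1.5 \times \text{IQR}$ (upper)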