Distribution & Standardization Cheatsheet

Empirical Rule

  • For a normal distribution, within 1 standard deviation (SD) of the mean: $P(|X-\mu|\le \sigma) \approx 0.68$

  • Within 2 SD: $P(|X-\mu|\le 2\sigma) \approx 0.95$

  • Within 3 SD: $P(|X-\mu|\le 3\sigma) \approx 0.997$

  • Example (IQ): mean $\mu=100$, SD $\sigma=11$

    • 1 SD range: $[89, 111]$

    • 2 SD range: $[78, 122]$

    • 3 SD range: $[67, 133]$

    • Scores between 67 and 133 cover about $99.7\%$ of the population.
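
The IQ ranges above can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper names `sd_range` and `prob_within` are illustrative, not part of the original notes.

```python
from statistics import NormalDist

# IQ example from the notes: mean 100, SD 11.
iq = NormalDist(mu=100, sigma=11)

def sd_range(dist, k):
    """Return the interval [mu - k*sigma, mu + k*sigma]."""
    return (dist.mean - k * dist.stdev, dist.mean + k * dist.stdev)

def prob_within(dist, k):
    """Probability that a normal variable lies within k SDs of its mean."""
    low, high = sd_range(dist, k)
    return dist.cdf(high) - dist.cdf(low)

for k in (1, 2, 3):
    low, high = sd_range(iq, k)
    print(f"{k} SD: [{low:.0f}, {high:.0f}] covers {prob_within(iq, k):.4f}")
# 1 SD: [89, 111] covers 0.6827
# 2 SD: [78, 122] covers 0.9545
# 3 SD: [67, 133] covers 0.9973
```

The 68% / 95% / 99.7% figures are rounded versions of these probabilities, and they hold only when the distribution is normal.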

Z-scores and Standardization

  • Definition: $z = \frac{x-\mu}{\sigma}$

  • Standardizing shifts the mean to 0 and the SD to 1; if X is normal, the z-scores follow the standard normal: $Z \sim N(0,1)$

  • Interpretations:

    • Positive z: above mean; Negative z: below mean

    • Z-scores provide a standardized way to compare different distributions

  • Uses:

    • Compare an individual's position within its population

    • Compare across populations (e.g., different exams) by converting to z-scores

    • If exams are the same, you can compare raw scores directly; otherwise use z-scores
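
As an illustration of comparing across populations, here is a short sketch. The exam means, SDs, and scores are made-up numbers for demonstration, and `z_score` is a hypothetical helper name.

```python
def z_score(x, mu, sigma):
    """Standardize x given the population mean and SD."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Hypothetical exams with different scales (assumed values, not from the notes).
z_a = z_score(82, mu=70, sigma=8)    # student A on exam A
z_b = z_score(88, mu=80, sigma=10)   # student B on exam B

print(z_a, z_b)  # 1.5 0.8
better = "A" if z_a > z_b else "B"
print(f"Student {better} did better relative to their own exam")
```

Student B has the higher raw score (88 vs 82), but student A sits further above their exam's mean in SD units, which is the comparison z-scores are meant to support.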

From x to z and back

  • Given mean and SD: $\mu,\; \sigma$

  • To convert: $z = \dfrac{x-\mu}{\sigma}$

  • To recover x: $x = \mu + z\,\sigma$
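
The second formula is just the first one solved for x. A short worked version, reusing the IQ example from the Empirical Rule section ($\mu=100$, $\sigma=11$); the particular values $x=122$ and $z=-1.5$ are chosen here only for illustration:

```latex
\begin{gather*}
z = \frac{x-\mu}{\sigma}
  \;\Longrightarrow\; z\,\sigma = x - \mu
  \;\Longrightarrow\; x = \mu + z\,\sigma \\[4pt]
x = 122:\quad z = \frac{122-100}{11} = 2 \\
z = -1.5:\quad x = 100 + (-1.5)(11) = 83.5
\end{gather*}
```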

Percentiles and quartiles

  • Percentile: the value x such that $P(X\le x) = p$, where p is the percentile expressed as a decimal (e.g., 0.90 for the 90th percentile)

  • Area to the left defines the percentile

  • Five-number summary (for raw data): min, Q1, median (Q2), Q3, max

  • Interquartile range: $\text{IQR} = Q3 - Q1$

  • Use quartiles when data may be skewed; quartiles are robust to outliers
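
A five-number summary and IQR can be computed from raw data as sketched below. This uses `statistics.quantiles` with `method="inclusive"`; quartile conventions differ between textbooks and software, so other tools may give slightly different Q1 and Q3. The data values and the name `five_number_summary` are assumptions for the example.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the inclusive quartile method."""
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)

# Hypothetical raw data (not from the notes).
values = [1, 3, 4, 6, 7, 8, 10, 12, 30]

lo, q1, med, q3, hi = five_number_summary(values)
iqr = q3 - q1
print(lo, q1, med, q3, hi)  # 1 4.0 7.0 10.0 30
print("IQR =", iqr)         # IQR = 6.0
```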

Outliers and fences

  • Box/whisker interpretation: whiskers show the spread outside the middle 50%; one whisker much longer than the other suggests skew toward that side

  • Outlier detection (IQR method):

    • Lower fence: $\text{LFence} = Q1 - 1.5\cdot \text{IQR}$

    • Upper fence: $\text{UFence} = Q3 + 1.5\cdot \text{IQR}$

  • Values beyond fences are potential outliers

  • Note: the 1.5 multiplier is a convention; some textbooks and software use a different one (e.g., 3 IQR to flag only extreme outliers)
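
The fence rule can be sketched in a few lines. The multiplier is exposed as a parameter `k` so the 1.5 and 3 variants can be compared; the function names, the data, and the choice of the inclusive quartile method are assumptions for illustration.

```python
from statistics import quantiles

def iqr_fences(data, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def potential_outliers(data, k=1.5):
    """Values strictly beyond the fences."""
    low, high = iqr_fences(data, k)
    return [x for x in data if x < low or x > high]

# Hypothetical data: Q1 = 4, Q3 = 10, so IQR = 6.
scores = [1, 3, 4, 6, 7, 8, 10, 12, 25]

print(iqr_fences(scores))               # (-5.0, 19.0)
print(potential_outliers(scores))       # [25]
print(potential_outliers(scores, k=3))  # []
```

With the wider fences (k = 3) the value 25 is no longer flagged, which shows how the choice of multiplier changes what counts as a potential outlier.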

When to use mean vs median; role of skewness

  • If distribution is skewed or has outliers: use median and IQR (robust)

  • If roughly symmetric and no major outliers: mean and SD are informative
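
A quick way to see why the median is called robust: add one extreme value to a small data set and compare how far the mean and the median move. The numbers below are invented for the example.

```python
from statistics import mean, median

# Hypothetical, roughly symmetric data.
typical = [48, 50, 51, 52, 49]
# The same data with one extreme value appended.
with_outlier = typical + [250]

print(mean(typical), median(typical))  # 50 50
print(round(mean(with_outlier), 2), median(with_outlier))  # 83.33 50.5
```

The single outlier pulls the mean from 50 to about 83, while the median barely moves, which is the reason to report median and IQR for skewed or outlier-prone data.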

Practical problem-solving approach

  • Step 1: Write down relevant information (mean, SD, sample size, x values)

  • Step 2: Decide whether to compare using x (raw) or z (standardized)

  • Step 3: If asked for a z-score: use $z = \frac{x-\mu}{\sigma}$

  • Step 4: If asked for x from z: use $x = \mu + z\sigma$

  • Step 5: For comparing across populations, use z-scores; for same exam, raw x can be compared

Quick notes on interpretation and examples

  • Normal assumption matters: many methods assume normality; if X is not normal, consider medians/percentiles or transform data

  • Percentiles give position relative to the population: e.g., 5th percentile means 5% are at or below that value; 95th percentile means 95% are at or below that value

  • Skewness and outliers affect which measures you report (box plots help visualize)
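
When the normal assumption is reasonable, percentiles can be moved back and forth with the normal CDF and its inverse. The sketch below uses `statistics.NormalDist` with the IQ example from the Empirical Rule section (mean 100, SD 11); the helper names are illustrative.

```python
from statistics import NormalDist

# IQ example from the notes, assuming scores are normally distributed.
iq = NormalDist(mu=100, sigma=11)

def percentile_value(dist, p):
    """Value x with P(X <= x) = p, for 0 < p < 1."""
    return dist.inv_cdf(p)

def percentile_rank(dist, x):
    """Fraction of the population at or below x (area to the left)."""
    return dist.cdf(x)

print(round(percentile_value(iq, 0.90), 1))  # 90th percentile, about 114.1
print(round(percentile_rank(iq, 111), 4))    # one SD above the mean, about 0.8413
```

For data that are clearly not normal, the same questions are better answered from the empirical quantiles of the raw data.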

Five-number summary in practice

  • Minimum, Q1, Median, Q3, Maximum

  • Box plot interpretation: roughly symmetrical if whiskers are similar length; skew to the right if right whisker longer

  • Example workflow for a problem: compute five-number summary from data (min, Q1, median, Q3, max) and check IQR for potential outliers
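
The whisker comparison can be approximated in code. This is a rough sketch under two stated assumptions: whiskers are taken to run from the quartiles to the minimum and maximum (no trimming at the fences), and the threshold `ratio=1.5` for calling one whisker "longer" is an arbitrary choice made here, not a standard rule.

```python
from statistics import quantiles

def whisker_lengths(data):
    """Return (left, right) whisker lengths: Q1 - min and max - Q3."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return q1 - min(data), max(data) - q3

def skew_hint(data, ratio=1.5):
    """Crude skew direction from whisker lengths (ratio is an assumed cutoff)."""
    left, right = whisker_lengths(data)
    if right > ratio * left:
        return "right"
    if left > ratio * right:
        return "left"
    return "roughly symmetric"

# Hypothetical data sets.
print(skew_hint([1, 3, 4, 6, 7, 8, 10, 12, 25]))  # right
print(skew_hint([1, 2, 3, 4, 5, 6, 7, 8, 9]))     # roughly symmetric
```

This is only a hint; a histogram or the box plot itself remains the better check.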

Quick reference formulas

  • Z-score: $z = \dfrac{x-\mu}{\sigma}$

  • From Z to x: $x = \mu + z\sigma$

  • Standard normal: $Z \sim N(0,1)$

  • Within 1/2/3 SD ranges correspond to 68% / 95% / 99.7%

  • IQR: $\text{IQR} = Q3 - Q1$

  • Outlier fences: $\text{LFence} = Q1 - 1.5\cdot \text{IQR}, \quad \text{UFence} = Q3 + 1.5\cdot \text{IQR}$