Last-minute Notes: Variance, Empirical Rule, Chebyshev

Population vs. Sample

Variance comes in two forms: population variance and sample variance. Population quantities are written with Greek letters (the mean is $\mu$); sample quantities are written with Latin letters (the mean is $\bar{x}$). Parameters describe populations; statistics describe samples.

Parameters vs. Statistics

Population descriptors are called parameters (e.g., population mean $\mu$, population variance $\sigma^2$, population size $N$). Sample descriptors are called statistics (e.g., sample mean $\bar{x}$, sample variance $s^2$, sample size $n$).

Variance and Standard Deviation

Population variance: $\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$
Sample variance: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
Population size is $N$; sample size is $n$. In many calculators, the symbol $n$ is used for both, but conceptually they differ.

Standard deviation: $\sigma = \sqrt{\sigma^2}$ and $s = \sqrt{s^2}$.
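
As a quick check of the two denominators, here is a minimal sketch using Python's standard `statistics` module. The data list is made up for illustration and does not come from these notes.

```python
import statistics

# Hypothetical data, chosen only so the arithmetic is easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = statistics.pvariance(data)   # population variance: divides by N
samp_var = statistics.variance(data)   # sample variance: divides by n - 1
pop_sd = statistics.pstdev(data)       # square root of the population variance
samp_sd = statistics.stdev(data)       # square root of the sample variance

print(pop_var, samp_var)
print(pop_sd, samp_sd)
```

The sample variance comes out larger than the population variance for the same numbers, because the same sum of squared deviations is divided by $n-1$ instead of $N$.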

Manual Computation Steps

1) Compute the mean (population: $\mu$, sample: $\bar{x}$).
2) Compute the deviations $x_i - \text{mean}$.
3) Square the deviations.
4) Sum the squared deviations.
5) Divide by $N$ (population) or by $n-1$ (sample).
The steps are identical for both cases; only the divisor in the final step differs.
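
The manual steps above can be sketched in plain Python. The function name `variance_by_hand` and its `sample` flag are illustrative choices, not standard names.

```python
import math

def variance_by_hand(values, sample=True):
    """Follow the five manual steps; `sample` selects the divisor."""
    n = len(values)
    if n < 2 and sample:
        raise ValueError("sample variance needs at least two values")
    mean = sum(values) / n                     # step 1: mean
    deviations = [x - mean for x in values]    # step 2: deviations from the mean
    squared = [d ** 2 for d in deviations]     # step 3: square each deviation
    total = sum(squared)                       # step 4: sum of squared deviations
    divisor = n - 1 if sample else n           # step 5: n - 1 (sample) or N (population)
    return total / divisor

# Hypothetical data for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
population_variance = variance_by_hand(data, sample=False)
sample_variance = variance_by_hand(data, sample=True)
sample_sd = math.sqrt(sample_variance)         # standard deviation is the square root
```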

Calculator and Relationships

If you are given the standard deviation, square it to obtain the variance. If you are given the variance, take the square root to obtain the standard deviation. Calculators often present the standard deviation directly.

Empirical Rule (Normal/Bell-shaped Data)

For approximately normal data with mean $\mu$ and standard deviation $\sigma$:

About 68% lie within $[\mu-\sigma, \mu+\sigma]$.
About 95% lie within $[\mu-2\sigma, \mu+2\sigma]$.
About 99.7% lie within $[\mu-3\sigma, \mu+3\sigma]$.

Example: Suppose $\mu=100$ and $\sigma=15$. Then:

About 68% lie in $[85, 115]$.
About 95% lie in $[70, 130]$.
About 99.7% lie in $[55, 145]$.

For interpretation, phrase it as: approximately 68% of the data lie between the two bounds, and similarly for 95% and 99.7%.
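
The interval arithmetic in the example can be reproduced with a short sketch. The helper name `empirical_intervals` is hypothetical, and the percentages are the rule's approximate values, not exact normal probabilities.

```python
def empirical_intervals(mu, sigma):
    """Return {k: (lower, upper, approx_share)} for k = 1, 2, 3."""
    # Approximate shares stated by the empirical rule.
    approx_share = {1: 0.68, 2: 0.95, 3: 0.997}
    return {
        k: (mu - k * sigma, mu + k * sigma, share)
        for k, share in approx_share.items()
    }

# Same numbers as the example: mean 100, standard deviation 15.
for k, (lower, upper, share) in empirical_intervals(100, 15).items():
    print(f"about {share:.1%} of the data lie between {lower} and {upper}")
```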

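Note on tails: because the normal curve is symmetric about the mean, the areas in the two tails are equal for the same deviation from the mean. For example, about 95% lie within two standard deviations, so about $(100\% - 95\%)/2 = 2.5\%$ lie below $\mu-2\sigma$ and about 2.5% lie above $\mu+2\sigma$.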

Chebyshev's Inequality (Non-normal Data)

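If the data are not assumed to be normal, use Chebyshev's inequality. For any distribution with a finite mean and variance, and any $k > 1$,
$P(|X-\mu| \le k\sigma) \ge 1 - \frac{1}{k^2}$
Equivalently, at least $1 - \frac{1}{k^2}$ of the data lie within $k$ standard deviations of the mean. This is a guaranteed, though conservative, lower bound: for $k=2$ it gives at least 75%, and for $k=3$ at least about 88.9%, compared with roughly 95% and 99.7% under the empirical rule.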
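
To see how conservative the bound is, the sketch below computes the Chebyshev lower bound and compares it with the observed share in a small skewed data set. The function names and the data are assumptions made for illustration. The population standard deviation is used so that the inequality applies exactly to the data set itself.

```python
import statistics

def chebyshev_lower_bound(k):
    """Minimum share of data within k standard deviations (requires k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

def fraction_within(values, k):
    """Observed share of values within k population SDs of the mean."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    inside = sum(1 for x in values if abs(x - mu) <= k * sigma)
    return inside / len(values)

# Hypothetical right-skewed data, clearly not bell-shaped.
skewed = [1, 1, 1, 2, 2, 3, 5, 8, 13, 40]

for k in (2, 3):
    print(k, chebyshev_lower_bound(k), fraction_within(skewed, k))
```

The observed share can never fall below the bound, but it is often well above it, which is why Chebyshev is described as conservative.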

Quick Takeaways

Distinguish between population (parameters) and sample (statistics).
Use the formulas with the appropriate denominator: $\sigma^2 = \frac{1}{N}\sum (x_i-\mu)^2$ vs. $s^2 = \frac{1}{n-1}\sum (x_i-\bar{x})^2$.
The empirical rule applies to approximately normal (bell-shaped) data; Chebyshev applies to any distribution with a finite mean and variance.
Practice: compute mean, deviations, squares, sums, and the final division to obtain variance; square root for SD.