Last-minute Notes: Variance, Empirical Rule, Chebyshev

Population vs. Sample

Variance comes in two forms: population variance and sample variance. Population quantities use Greek letters (the population mean is $\mu$); sample quantities use Latin letters (the sample mean is $\bar{x}$). Parameters describe populations; statistics describe samples.

Parameters vs. Statistics

Population descriptors are called parameters (e.g., population mean $\mu$, population variance $\sigma^2$, population size $N$). Sample descriptors are called statistics (e.g., sample mean $\bar{x}$, sample variance $s^2$, sample size $n$).

Variance and Standard Deviation

Population variance: $\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$
Sample variance: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
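Population size is $N$; sample size is $n$. Many calculators use the symbol $n$ for both, but conceptually they differ. Dividing by $n-1$ rather than $n$ (Bessel's correction) makes $s^2$ an unbiased estimator of $\sigma^2$.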
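As a quick check of the two denominators, here is a minimal Python sketch using the standard-library `statistics` module, where `pvariance` divides by $N$ and `variance` divides by $n-1$. The data list is hypothetical. Treating the same eight values first as a population and then as a sample is done purely for comparison.

```python
from statistics import pvariance, variance

# Hypothetical values, used only to compare the two formulas.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

pop_var = pvariance(data)  # sum of squared deviations divided by N
samp_var = variance(data)  # same sum divided by n - 1

# Same numerator, different denominators, so s^2 = sigma^2 * n / (n - 1).
rescaled = pop_var * n / (n - 1)

print("population variance:", pop_var)
print("sample variance:", samp_var)
print("population variance * n/(n-1):", rescaled)
```

Here the mean is 5 and the squared deviations sum to 32, so $\sigma^2 = 32/8 = 4$ while $s^2 = 32/7 \approx 4.57$. With the same data, the sample version is never the smaller of the two.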

Standard deviation: $\sigma = \sqrt{\sigma^2}$ and $s = \sqrt{s^2}$.

Manual Computation Steps

1) Compute the mean (population: $\mu$; sample: $\bar{x}$).

2) Compute the deviations $x_i - \text{mean}$.

3) Square the deviations.

4) Sum the squared deviations.

5) Divide by $N$ (population) or by $n-1$ (sample).

The only difference between the two versions is this final division.
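The five manual steps map directly onto a short Python function. This is a minimal sketch. The function name `variance_by_steps`, its `sample` flag, and the data values are hypothetical, chosen to mirror the steps one by one.

```python
def variance_by_steps(xs, sample=True):
    """Hypothetical helper: variance via the five manual steps.

    sample=True divides by n - 1; sample=False divides by N.
    """
    count = len(xs)
    if count == 0 or (sample and count < 2):
        raise ValueError("not enough values for this variance")
    # 1) Compute the mean.
    mean = sum(xs) / count
    # 2) Compute the deviations x_i - mean.
    deviations = [x - mean for x in xs]
    # 3) Square the deviations.
    squared = [d ** 2 for d in deviations]
    # 4) Sum the squared deviations.
    total = sum(squared)
    # 5) Divide by N (population) or n - 1 (sample): the only difference.
    divisor = count - 1 if sample else count
    return total / divisor


scores = [6, 8, 10]  # hypothetical data
print(variance_by_steps(scores, sample=False))  # 8/3, about 2.667
print(variance_by_steps(scores, sample=True))   # 4.0
```

For these three values the mean is 8, the deviations are $-2, 0, 2$, and the squared deviations sum to 8. Only the divisor (3 or 2) changes the result.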

Calculator and Relationships

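If you are given the standard deviation, square it to obtain the variance. If you are given the variance, take the square root to obtain the standard deviation. Calculators usually report the standard deviation rather than the variance. Many show both the sample and the population version, so check which one the problem calls for.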
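The square/square-root relationship fits in a few lines of Python. The given values (a standard deviation of 15, a variance of 225) and the data list are hypothetical. The last line checks that the standard library's `stdev` agrees with the square root of `variance`.

```python
import math
from statistics import stdev, variance

# Given a standard deviation, square it to get the variance.
given_sd = 15.0  # hypothetical value
var_from_sd = given_sd ** 2  # 225.0

# Given a variance, take the square root to get the standard deviation.
given_var = 225.0  # hypothetical value
sd_from_var = math.sqrt(given_var)  # 15.0

# The same relationship holds for values computed from data.
data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
print(var_from_sd, sd_from_var)
print(stdev(data), math.sqrt(variance(data)))
```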

Empirical Rule (Normal/Bell-shaped Data)

For approximately normal data with mean $\mu$ and standard deviation $\sigma$:

  • About 68% lie within $[\mu-\sigma, \mu+\sigma]$.

  • About 95% lie within $[\mu-2\sigma, \mu+2\sigma]$.

  • About 99.7% lie within $[\mu-3\sigma, \mu+3\sigma]$.
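The three percentages are rounded areas under the normal curve. This is a minimal check with Python's `statistics.NormalDist` (Python 3.8+). It works in standard units, so $\mu = 0$ and $\sigma = 1$.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Probability of falling within k standard deviations of the mean.
coverage = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"within {k} SD: {p:.4f}")
```

This prints about 0.6827, 0.9545, and 0.9973, which round to the rule's 68%, 95%, and 99.7%.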

Example: Suppose $\mu=100$ and $\sigma=15$. Then:

  • About 68% lie within $[85, 115]$.

  • About 95% lie within $[70, 130]$.

  • About 99.7% lie within $[55, 145]$.

When interpreting, phrase it as: approximately 68% of the data lie between the two bounds (here, between 85 and 115). Interpret the 95% and 99.7% intervals the same way.
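The example intervals are $\mu \pm k\sigma$ for $k = 1, 2, 3$. The sketch below computes them and prints each in the interpretation wording. The helper name `empirical_intervals` is hypothetical.

```python
def empirical_intervals(mu, sigma):
    """Hypothetical helper: the 1-, 2- and 3-SD intervals of the empirical rule."""
    return {
        "68%": (mu - sigma, mu + sigma),
        "95%": (mu - 2 * sigma, mu + 2 * sigma),
        "99.7%": (mu - 3 * sigma, mu + 3 * sigma),
    }


# The example from the notes: mu = 100, sigma = 15.
for share, (low, high) in empirical_intervals(100, 15).items():
    print(f"approximately {share} of the data lie between {low} and {high}")
```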

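Note on tails: because the normal curve is symmetric about the mean, the two tails beyond the same distance from the mean have equal area. For example, the roughly 5% outside $[\mu-2\sigma, \mu+2\sigma]$ splits into about 2.5% in each tail.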
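The sketch below checks tail symmetry for the example distribution ($\mu = 100$, $\sigma = 15$) using `statistics.NormalDist`. It compares the area below $\mu - 2\sigma = 70$ with the area above $\mu + 2\sigma = 130$.

```python
from statistics import NormalDist

dist = NormalDist(mu=100, sigma=15)  # the example distribution

# Area below mu - 2*sigma and area above mu + 2*sigma.
lower_tail = dist.cdf(70)
upper_tail = 1 - dist.cdf(130)

print(f"below 70: {lower_tail:.4f}, above 130: {upper_tail:.4f}")
```

Both tails come out at about 0.0228, equal as expected. Their exact total of about 4.6% is what the empirical rule rounds to 5%.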

Chebyshev's Inequality (Non-normal Data)

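When the data cannot be assumed to be normal, use Chebyshev's inequality. For any distribution with a finite mean and variance, and any $k > 1$,
$$P(|X-\mu| \le k\sigma) \ge 1 - \frac{1}{k^2}.$$
Equivalently, at least $1 - \frac{1}{k^2}$ of the data lie within $k$ standard deviations of the mean. This is a guaranteed, though conservative, lower bound. For $k = 2$, at least 75% of the data lie within $[\mu-2\sigma, \mu+2\sigma]$. For $k = 3$, at least $8/9 \approx 88.9\%$ lie within $[\mu-3\sigma, \mu+3\sigma]$.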
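This is a minimal sketch of the bound, followed by a check on deliberately non-normal data. The helper name `chebyshev_bound`, the exponential data, and the seed are assumptions made for illustration. Chebyshev's inequality holds for every distribution, so it also holds for the data set itself, provided $\mu$ and $\sigma$ are that data set's own mean and population standard deviation.

```python
import random
from statistics import fmean, pstdev


def chebyshev_bound(k):
    """Hypothetical helper: minimum fraction within k SDs, for any distribution."""
    if k <= 1:
        return 0.0  # the bound gives no useful guarantee for k <= 1
    return 1 - 1 / k**2


# Hypothetical skewed (non-normal) data: exponential draws with a fixed seed.
rng = random.Random(0)
data = [rng.expovariate(1.0) for _ in range(10_000)]
mu, sigma = fmean(data), pstdev(data)

for k in (1.5, 2, 3):
    inside = sum(abs(x - mu) <= k * sigma for x in data) / len(data)
    print(f"k={k}: guaranteed at least {chebyshev_bound(k):.3f}, observed {inside:.3f}")
```

The observed fractions for this skewed data set sit above the guarantees. This shows why the bound is called conservative: it holds for every shape, so it cannot be tight for most of them.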

Quick Takeaways

  • Distinguish between population (parameters) and sample (statistics).

  • Use the formula with the appropriate denominator: $\sigma^2 = \frac{1}{N}\sum (x_i-\mu)^2$ vs. $s^2 = \frac{1}{n-1}\sum (x_i-\bar{x})^2$.

  • The empirical rule applies to approximately normal (bell-shaped) data; Chebyshev applies to any data with a finite variance.

  • Practice: compute the mean, deviations, squares, their sum, and the final division to obtain the variance; take the square root for the standard deviation.