Focus on measures of dispersion
Goals for the lesson:
Compute range, variance, and standard deviation
Understand standard deviation and variance
Calculate coefficient of variation and compare variations
Use the empirical rule and Chebyshev's theorem to describe data
Calculate variance and standard deviation of grouped data
Measures of center determine where data is centered on a number line.
Measures of variability describe how widely the data are spread around that center.
Example: Average wait time at a doctor’s office is 20 minutes.
Key question: Is the wait time consistent for all patients?
Definition (range): Difference between the largest and smallest values in a dataset.
Formula:
Range = Maximum Data Value - Minimum Data Value.
Limitation: Not as descriptive as variance and standard deviation; it depends only on the two extreme values, so it does not reflect how data is spread around the mean.
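A minimal sketch of the range calculation and its limitation. The two lists of wait times are hypothetical, chosen so that both have a mean of 20 minutes; `data_range` is an illustrative helper name.

```python
def data_range(values):
    """Return the maximum value minus the minimum value."""
    if not values:
        raise ValueError("values must be non-empty")
    return max(values) - min(values)

# Hypothetical wait times in minutes for two offices (both average 20).
office_a = [18, 19, 20, 21, 22]
office_b = [5, 10, 20, 30, 35]

range_a = data_range(office_a)  # 22 - 18
range_b = data_range(office_b)  # 35 - 5

print(range_a)  # 4
print(range_b)  # 30
```

Same mean, very different ranges. Note that only the two extreme values enter the calculation, so the values in between have no effect on the result.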
Definition (variance): Average of the squared deviations of the data values from the mean; expressed in squared units.
Population Variance Formula:
( \sigma^2 = \frac{\Sigma (X_i - \mu)^2}{N} )
Where:
(X_i) = ith value in the population
(\mu) = population mean
N = number of values in the population
Sample Variance Formula:
( s^2 = \frac{\Sigma (X_i - \bar{x})^2}{n - 1} )
Where:
(X_i) = ith value in the sample
(\bar{x}) = sample mean
n = number of data values in the sample
Rounding Rule: Round to 1 more decimal place than the largest number of decimal places in the data.
Variance units are squared, which can make interpretation less straightforward.
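The two variance formulas can be sketched as follows. The wait times are hypothetical; the only difference between the two results is the divisor (N for a population, n - 1 for a sample).

```python
import math
import statistics

# Hypothetical wait times in minutes (illustrative data only).
wait_times = [18, 19, 20, 21, 22]

n = len(wait_times)
mean = sum(wait_times) / n

# Sum of squared deviations from the mean: 4 + 1 + 0 + 1 + 4 = 10.
sum_sq_dev = sum((x - mean) ** 2 for x in wait_times)

population_variance = sum_sq_dev / n    # data treated as the whole population
sample_variance = sum_sq_dev / (n - 1)  # data treated as a sample

# The standard library functions use the same divisors.
assert math.isclose(population_variance, statistics.pvariance(wait_times))
assert math.isclose(sample_variance, statistics.variance(wait_times))

print(population_variance)  # 2.0 (minutes squared)
print(sample_variance)      # 2.5 (minutes squared)
```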
Definition (standard deviation): Typical distance of data values from the mean, equal to the square root of the variance; expressed in the same units as the data.
Population Standard Deviation Formula:
( \sigma = \sqrt{\frac{\Sigma (X_i - \mu)^2}{N}} )
Sample Standard Deviation Formula:
( s = \sqrt{\frac{\Sigma (X_i - \bar{x})^2}{n - 1}} )
Rounding: Same rule as variance.
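A short sketch of both standard deviation formulas on hypothetical wait times. The data have no decimal places, so the rounding rule gives one decimal place.

```python
import math
import statistics

# Hypothetical wait times in minutes (illustrative data only).
wait_times = [5, 10, 20, 30, 35]

n = len(wait_times)
mean = sum(wait_times) / n  # 20.0

# Squared deviations: 225 + 100 + 0 + 100 + 225 = 650.
sum_sq_dev = sum((x - mean) ** 2 for x in wait_times)

population_sd = math.sqrt(sum_sq_dev / n)    # square root of 130
sample_sd = math.sqrt(sum_sq_dev / (n - 1))  # square root of 162.5

print(round(population_sd, 1))  # 11.4 (minutes)
print(round(sample_sd, 1))      # 12.7 (minutes)
```

Unlike the variance, both results are in minutes, the same unit as the data.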
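The formulas differ because the sample version divides by n - 1 instead of n (Bessel's correction): dividing by n would make the sample variance a biased estimator that tends to underestimate the population variance.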
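The effect of dividing by n - 1 can be checked exactly on a tiny case. This sketch assumes a hypothetical population of three values and enumerates every ordered sample of size 2 drawn with replacement; `avg_sample_variance` is an illustrative helper, and fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population (illustrative values only).
population = [2, 4, 6]
N = len(population)
mu = Fraction(sum(population), N)
pop_var = sum((x - mu) ** 2 for x in population) / N

def avg_sample_variance(offset):
    """Average variance over all ordered size-2 samples (with replacement).

    offset=1 divides each sum of squares by n - 1; offset=0 divides by n.
    """
    n = 2
    samples = list(product(population, repeat=n))
    total = Fraction(0)
    for sample in samples:
        xbar = Fraction(sum(sample), n)
        sum_sq_dev = sum((x - xbar) ** 2 for x in sample)
        total += sum_sq_dev / (n - offset)
    return total / len(samples)

print(pop_var)                 # 8/3
print(avg_sample_variance(1))  # 8/3, matches the population variance
print(avg_sample_variance(0))  # 4/3, underestimates it
```

Averaged over all possible samples, the n - 1 version recovers the population variance, while the n version comes out too small.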
Definition (coefficient of variation): Ratio of standard deviation to mean, expressed as a percentage.
Population Formula:
( CV = \frac{\sigma}{\mu} \times 100\% )
Sample Formula:
( CV = \frac{s}{\bar{x}} \times 100\% )
Purpose: Allows comparison of relative spread between datasets that have different units or different means.
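A sketch of a CV comparison between two hypothetical samples measured in different units. `coefficient_of_variation` is an illustrative helper built on the standard `statistics` module.

```python
import statistics

def coefficient_of_variation(sample):
    """Sample CV as a percentage: (s / x-bar) * 100."""
    mean = statistics.mean(sample)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return statistics.stdev(sample) / mean * 100

# Hypothetical samples (illustrative data only).
wait_minutes = [18, 19, 20, 21, 22]       # mean 20 minutes
weights_grams = [480, 490, 500, 510, 520]  # mean 500 grams

cv_wait = coefficient_of_variation(wait_minutes)
cv_weight = coefficient_of_variation(weights_grams)

print(round(cv_wait, 2))    # 7.91 (percent)
print(round(cv_weight, 2))  # 3.16 (percent)
```

The weights have the larger standard deviation (about 15.8 g versus about 1.6 min), but relative to their mean they vary less than the wait times.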
Empirical rule: applicable to bell-shaped (approximately normal) distributions:
Approximately 68% of data within 1 standard deviation from the mean.
Approximately 95% of data within 2 standard deviations from the mean.
Approximately 99.7% of data within 3 standard deviations from the mean.
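A sketch of how the empirical rule turns a mean and standard deviation into intervals. The mean of 20 and standard deviation of 2 are hypothetical, and `normal_within` shows where the three percentages come from, assuming an exactly normal distribution.

```python
import math

def empirical_intervals(mean, sd):
    """Intervals of 1, 2, and 3 standard deviations around the mean."""
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}

def normal_within(k):
    """Proportion of a normal distribution within k standard deviations."""
    return math.erf(k / math.sqrt(2))

# Hypothetical bell-shaped wait times: mean 20 minutes, sd 2 minutes.
intervals = empirical_intervals(20, 2)

for k, (low, high) in intervals.items():
    print(k, low, high, round(normal_within(k), 4))
# 1 18 22 0.6827
# 2 16 24 0.9545
# 3 14 26 0.9973
```

Under these assumptions, about 95% of patients would wait between 16 and 24 minutes.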
Chebyshev's theorem: useful for all distributions, not just bell-shaped.
Provides a minimum estimate of data within k standard deviations of the mean:
Formula: Minimum proportion = 1 - (1/k^2) for k > 1.
For k = 2: At least 75% of data within 2 standard deviations.
For k = 3: At least 88.9% of data within 3 standard deviations.
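A sketch of the Chebyshev bound, checked against a hypothetical right-skewed dataset. Both function names are illustrative; the observed proportion uses the population mean and standard deviation of the data itself.

```python
import statistics

def chebyshev_min_proportion(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

def observed_proportion_within(data, k):
    """Share of values lying within k population standard deviations."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)
    inside = sum(1 for x in data if abs(x - mu) <= k * sigma)
    return inside / len(data)

# Hypothetical right-skewed data (illustrative values only).
skewed = [1, 1, 2, 2, 3, 3, 4, 5, 9, 30]

print(chebyshev_min_proportion(2))            # 0.75
print(round(chebyshev_min_proportion(3), 3))  # 0.889
print(observed_proportion_within(skewed, 2))  # 0.9
```

The observed share (0.9) is above the guaranteed minimum (0.75); the theorem gives a floor, not an estimate of the actual proportion.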
Grouped data: when given a frequency distribution without the original data:
Use class midpoints as representative values.
Formula for Standard Deviation of Grouped Data:
( s = \sqrt{\frac{n \sum f_i x_i^2 - (\sum f_i x_i)^2}{n(n - 1)}} )
Where:
n = sample size (the sum of the class frequencies)
f_i = frequency of class i
x_i = midpoint of class i.
Estimate the variance by squaring the grouped standard deviation, which is the same as omitting the square root in the formula above.
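A sketch of the grouped-data formula applied to a hypothetical frequency distribution. The class limits, frequencies, and function names are all illustrative; each class is represented by its midpoint.

```python
import math

# Hypothetical frequency distribution of wait times in minutes:
# (lower class limit, upper class limit, frequency)
classes = [(0, 10, 1), (10, 20, 4), (20, 30, 4), (30, 40, 1)]

def grouped_sample_variance(classes):
    """Approximate sample variance from classes, using class midpoints."""
    n = sum(f for _, _, f in classes)
    if n < 2:
        raise ValueError("need at least two observations")
    sum_fx = 0.0
    sum_fx2 = 0.0
    for lower, upper, f in classes:
        midpoint = (lower + upper) / 2
        sum_fx += f * midpoint
        sum_fx2 += f * midpoint**2
    # (n * sum(f x^2) - (sum(f x))^2) / (n (n - 1))
    return (n * sum_fx2 - sum_fx**2) / (n * (n - 1))

def grouped_sample_sd(classes):
    """Approximate sample standard deviation: square root of the variance."""
    return math.sqrt(grouped_sample_variance(classes))

variance = grouped_sample_variance(classes)
sd = grouped_sample_sd(classes)

print(round(variance, 2))  # 72.22
print(round(sd, 2))        # 8.5
```

Because midpoints stand in for the unknown original values, the result is an approximation of the standard deviation of the underlying data.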
Understanding measures of dispersion helps in accurately describing data distributions.