Measures of Dispersion

Lesson Overview

  • Focus on measures of dispersion

  • Goals for the lesson:

    • Compute range, variance, and standard deviation

    • Understand standard deviation and variance

    • Calculate the coefficient of variation and use it to compare variability

    • Use the empirical rule and Chebyshev's theorem to describe data

    • Calculate variance and standard deviation of grouped data

Importance of Measures of Dispersion

  • Measures of center locate where the data are centered on a number line.

  • Measures of dispersion describe how spread out the data are around that center, which gives insight into the shape of the distribution.

    • Example: The average wait time at a doctor’s office is 20 minutes.

    • Key question: Is the wait time consistent for all patients?

Range

  • Definition: Difference between the largest and smallest values in a dataset.

  • Formula:

    • Range = Maximum Data Value - Minimum Data Value.

  • Limitation: Uses only the two extreme values, so a single outlier can change it sharply. Unlike variance and standard deviation, it does not reflect how the data are spread around the mean.
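A minimal Python sketch of the range formula. The function name `data_range` and the wait-time values are hypothetical, chosen only for illustration. The second call shows the limitation above: changing one extreme value changes the range a lot, even though the other values stay the same.

```python
def data_range(values):
    """Range = maximum data value - minimum data value."""
    if not values:
        raise ValueError("the range is undefined for an empty dataset")
    return max(values) - min(values)


# Hypothetical wait times in minutes (illustrative only, not real data).
waits = [12, 18, 20, 22, 28]
print(data_range(waits))  # 16

# Replacing the largest value with an outlier changes the range sharply,
# even though the other four values are unchanged.
print(data_range([12, 18, 20, 22, 58]))  # 46
```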

Variance

  • Definition: Measure of how far data values are spread from the mean, computed from the squared deviations of the values from the mean.

  • Population Variance Formula:

    • \( \sigma^2 = \frac{\sum (X_i - \mu)^2}{N} \)

      • Where:

        • \(X_i\) = ith value in the population

        • \(\mu\) = population mean

        • \(N\) = number of values in the population

  • Sample Variance Formula:

    • \( s^2 = \frac{\sum (X_i - \bar{x})^2}{n - 1} \)

      • Where:

        • \(X_i\) = ith value in the sample

        • \(\bar{x}\) = sample mean

        • \(n\) = number of data values in the sample

  • Rounding Rule: Round to 1 more decimal place than the largest number of decimal places in the data.

  • Variance units are squared, which can make interpretation less straightforward.
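The two variance formulas can be sketched in Python as follows. The function names and the wait-time data are hypothetical. The results are rounded to one decimal place because the data are whole numbers, following the rounding rule above. Because the data are in minutes, both variances are in squared minutes. The standard library's `statistics.pvariance` and `statistics.variance` are used only as a cross-check.

```python
import math
import statistics


def population_variance(values):
    """sigma^2 = sum((X_i - mu)^2) / N"""
    n = len(values)
    if n == 0:
        raise ValueError("population variance needs at least one value")
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """s^2 = sum((X_i - x_bar)^2) / (n - 1)"""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


waits = [12, 18, 20, 22, 28]  # hypothetical whole-minute data

# Whole-number data, so round to one decimal place.
print(round(population_variance(waits), 1))  # 27.2
print(round(sample_variance(waits), 1))      # 34.0

# Cross-check against the standard library implementations.
assert math.isclose(population_variance(waits), statistics.pvariance(waits))
assert math.isclose(sample_variance(waits), statistics.variance(waits))
```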

Standard Deviation

  • Definition: Square root of the variance. It measures the typical distance of data values from the mean and, unlike the variance, is expressed in the same units as the data.

  • Population Standard Deviation Formula:

    • \( \sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} \)

  • Sample Standard Deviation Formula:

    • \( s = \sqrt{\frac{\sum (X_i - \bar{x})^2}{n - 1}} \)

  • Rounding: Same rule as variance.

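  • The sample formulas divide by n - 1 instead of n (Bessel's correction). Because deviations are measured from \(\bar{x}\) rather than the unknown \(\mu\), dividing by n would tend to underestimate the population variance. Dividing by n - 1 makes \(s^2\) an unbiased estimator of \(\sigma^2\).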
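A short Python sketch of both standard deviation formulas. Each one takes the square root of the matching variance, so the result is in minutes rather than squared minutes. The helper `sum_of_squares` and the data are hypothetical. The only difference between the two functions is the divisor, N or n - 1. `statistics.pstdev` and `statistics.stdev` are used only as a cross-check.

```python
import math
import statistics


def sum_of_squares(values, center):
    """Sum of squared deviations of each value from `center`."""
    return sum((x - center) ** 2 for x in values)


def population_std(values):
    """sigma = sqrt(sum((X_i - mu)^2) / N)"""
    n = len(values)
    if n == 0:
        raise ValueError("population SD needs at least one value")
    mu = sum(values) / n
    return math.sqrt(sum_of_squares(values, mu) / n)


def sample_std(values):
    """s = sqrt(sum((X_i - x_bar)^2) / (n - 1))"""
    n = len(values)
    if n < 2:
        raise ValueError("sample SD needs at least two values")
    x_bar = sum(values) / n
    return math.sqrt(sum_of_squares(values, x_bar) / (n - 1))


waits = [12, 18, 20, 22, 28]  # hypothetical wait times in minutes
print(round(population_std(waits), 1))  # 5.2
print(round(sample_std(waits), 1))      # 5.8

assert math.isclose(population_std(waits), statistics.pstdev(waits))
assert math.isclose(sample_std(waits), statistics.stdev(waits))
```

For the same data, the sample version is always at least as large as the population version, because it divides the same sum of squares by a smaller number.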

Coefficient of Variation (CV)

  • Definition: Ratio of standard deviation to mean expressed as a percentage.

  • Population Formula:

    • \( CV = \frac{\sigma}{\mu} \times 100\% \)

  • Sample Formula:

    • \( CV = \frac{s}{\bar{x}} \times 100\% \)

  • Purpose: Compares the relative spread of datasets, even when they are measured in different units or have very different means.
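A minimal Python sketch of the sample CV. It compares two hypothetical samples recorded in different units: wait times in minutes and heights in centimeters. The datasets and the `sample` flag are illustrative assumptions. The standard deviations themselves (about 5.8 minutes versus about 15.8 cm) cannot be compared directly, but the CVs can. The CV is most meaningful when all values are positive.

```python
import statistics


def coefficient_of_variation(values, sample=True):
    """CV = s / x_bar * 100% (sample) or sigma / mu * 100% (population)."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is 0")
    spread = statistics.stdev(values) if sample else statistics.pstdev(values)
    return spread / mean * 100


# Two hypothetical samples measured in different units (illustrative only).
wait_minutes = [12, 18, 20, 22, 28]
heights_cm = [150, 160, 170, 180, 190]

print(f"{coefficient_of_variation(wait_minutes):.1f}%")  # 29.2%
print(f"{coefficient_of_variation(heights_cm):.1f}%")    # 9.3%
```

The wait times vary more relative to their mean, even though their standard deviation is numerically smaller.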

Empirical Rule

  • Applies to data with an approximately bell-shaped (normal) distribution:

    • Approximately 68% of data within 1 standard deviation from the mean.

    • Approximately 95% of data within 2 standard deviations from the mean.

    • Approximately 99.7% of data within 3 standard deviations from the mean.
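The empirical rule turns a mean and a standard deviation into three intervals. The Python sketch below builds them. The mean of 20 minutes and the SD of 5 minutes are hypothetical values, and the function name is illustrative. The percentages are approximations that only apply when the data are roughly bell-shaped.

```python
# Approximate percentages from the empirical rule, keyed by k (number of SDs).
EMPIRICAL_RULE = {1: 68.0, 2: 95.0, 3: 99.7}


def empirical_intervals(mean, sd):
    """Return {k: (lower, upper, approx_percent)} for k = 1, 2, 3."""
    if sd < 0:
        raise ValueError("standard deviation cannot be negative")
    return {
        k: (mean - k * sd, mean + k * sd, pct)
        for k, pct in EMPIRICAL_RULE.items()
    }


# Hypothetical bell-shaped wait times: mean 20 minutes, SD 5 minutes.
for k, (low, high, pct) in empirical_intervals(20, 5).items():
    print(f"about {pct}% of wait times between {low} and {high} minutes")
```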

Chebyshev’s Theorem

  • Applies to any distribution, not just bell-shaped ones.

  • Gives a lower bound on the proportion of data within k standard deviations of the mean:

    • Formula: At least \( 1 - \frac{1}{k^2} \) of the data, for k > 1.

    • For k = 2: At least 75% of data within 2 standard deviations.

    • For k = 3: At least 88.9% of data within 3 standard deviations.
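The Python sketch below computes Chebyshev's bound and checks it against a dataset. The dataset is hypothetical and strongly right-skewed, and the function names are illustrative. The observed proportion is measured with the population SD (`statistics.pstdev`), and it never falls below the bound.

```python
import statistics


def chebyshev_min_proportion(k):
    """Lower bound 1 - 1/k^2 on the proportion within k SDs of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k ** 2


def proportion_within(values, k):
    """Observed proportion of values within k population SDs of the mean."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    inside = [x for x in values if abs(x - mu) <= k * sigma]
    return len(inside) / len(values)


print(f"{chebyshev_min_proportion(2):.1%}")  # 75.0%
print(f"{chebyshev_min_proportion(3):.1%}")  # 88.9%

# A hypothetical right-skewed dataset: the bound still holds.
skewed = [1, 1, 1, 2, 2, 3, 4, 7, 15, 30]
for k in (2, 3):
    assert proportion_within(skewed, k) >= chebyshev_min_proportion(k)
```

The bound is conservative. For bell-shaped data, the empirical rule's 95% within 2 SDs is much higher than Chebyshev's guaranteed 75%.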

Variance and Standard Deviation of Grouped Data

  • When given a frequency distribution without original data:

    • Use class midpoints as representative values.

  • Formula for Standard Deviation of Grouped Data:

    • \( s = \sqrt{\frac{n \sum f_i x_i^2 - (\sum f_i x_i)^2}{n(n - 1)}} \)

      • Where:

        • \(n = \sum f_i\) = sample size (total frequency)

        • \(f_i\) = frequency of class i

        • \(x_i\) = midpoint of class i

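  • The variance of grouped data is the square of this value, \( s^2 = \frac{n \sum f_i x_i^2 - (\sum f_i x_i)^2}{n(n - 1)} \). Both are estimates, because each value in a class is replaced by the class midpoint.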
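A minimal Python sketch of the grouped-data formulas. The class midpoints and frequencies are a hypothetical frequency distribution, and the function names are illustrative. The total frequency n is computed as the sum of the frequencies. With whole-number midpoints, the results are rounded to one decimal place.

```python
import math


def grouped_sample_variance(midpoints, frequencies):
    """s^2 = (n*sum(f*x^2) - (sum(f*x))^2) / (n*(n - 1)), with n = sum(f)."""
    if len(midpoints) != len(frequencies):
        raise ValueError("need one frequency per class midpoint")
    n = sum(frequencies)
    if n < 2:
        raise ValueError("need a total frequency of at least 2")
    sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
    sum_fx2 = sum(f * x * x for f, x in zip(frequencies, midpoints))
    return (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))


def grouped_sample_std(midpoints, frequencies):
    """s is the square root of the grouped sample variance."""
    return math.sqrt(grouped_sample_variance(midpoints, frequencies))


# Hypothetical frequency distribution (illustrative only).
midpoints = [5, 15, 25, 35]
frequencies = [2, 5, 8, 5]

print(round(grouped_sample_variance(midpoints, frequencies), 1))  # 90.5
print(round(grouped_sample_std(midpoints, frequencies), 1))       # 9.5
```

This shortcut form does not need the mean first. With very large floating-point values, however, the subtraction can lose precision. The equivalent definitional form \( \sum f_i (x_i - \bar{x})^2 / (n - 1) \) avoids that problem.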

Conclusion

  • Understanding measures of dispersion helps in accurately describing data distributions.
