Measures of Central Tendency and Variation

  • Understanding the Mean

    • The mean, also known as the average, is a common measure of central tendency.
    • To calculate the mean:
      1. Add all values together.
      2. Divide the total by the number of observations (the sample size, denoted ( n )).
    • Notation: the sample mean is written ( \bar{x} ), so ( \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} ).
    • All values contribute equally to the mean.
  • Example of Mean Calculation

    • For values: 11, 12, 13, 14, and 15:
    • Mean = ( \frac{11+12+13+14+15}{5} = \frac{65}{5} = 13 )
    • The mean acts as a balance point: here two values lie below it and two above it.
    • Changing 15 to 20 raises the mean to ( \frac{70}{5} = 14 ), which leaves three values below the mean and only one above it.
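The calculation above can be sketched in a few lines of Python. This is a minimal illustration using the document's own values; the helper name `mean` is our own choice, not a library function.

```python
def mean(values):
    """Arithmetic mean: sum of all values divided by the number of observations n."""
    return sum(values) / len(values)

original = [11, 12, 13, 14, 15]
shifted = [11, 12, 13, 14, 20]  # same data with 15 replaced by 20

for data in (original, shifted):
    m = mean(data)
    below = sum(1 for x in data if x < m)  # observations under the mean
    above = sum(1 for x in data if x > m)  # observations over the mean
    print(f"mean={m}, below={below}, above={above}")
```

The first line reports a mean of 13 with two values on each side; the second reports 14 with three below and one above, showing how one changed value shifts the balance point.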
  • Implications of Outliers

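    • The mean is sensitive to extreme values (outliers): a single very large or very small value pulls it toward that extreme.
    • Example: in {11, 12, 13, 14, 20}, the one high value raises the mean to 14, above three of the five observations, so the mean can misrepresent the center of the data.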
  • Using Excel to Calculate Mean

    • In Excel, the mean is computed with =AVERAGE(range).
    • Example: for the preparation-time data (39 minutes on the first day), =AVERAGE(range) returns a mean of 39.6 minutes.
  • Introducing the Median

    • The median is the middle value of a dataset sorted in order; it is resistant to extreme values.
    • Example: for {11, 12, 13, 14, 15}, the median is 13, and it stays 13 if 15 is changed to 20.
    • For a dataset with an even number of observations, the median is the average of the two middle values.
  • Median Calculation

    • The position of the median in the ordered data is ( \frac{n+1}{2} ).
    • When ( n ) is odd, this is a whole number; when ( n ) is even, it ends in .5, meaning the median is the average of the two middle values.
    • Excel function for the median: =MEDIAN(range).
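The (n+1)/2 position rule can be sketched directly in Python. The helper `median` below is our own illustration (positions are 1-based, as in the rule); the standard library's `statistics.median` is used only as a cross-check.

```python
import statistics

def median(values):
    """Median via the (n + 1) / 2 position rule on sorted data (1-based positions)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    position = (n + 1) / 2
    lower = ordered[int(position) - 1]  # value at the whole-number part of the position
    if position.is_integer():
        return lower                    # odd n: a single middle value
    upper = ordered[int(position)]      # even n: position ends in .5, take the next value too
    return (lower + upper) / 2

for data in ([11, 12, 13, 14, 15], [11, 12, 13, 14, 20], [11, 12, 13, 14]):
    print(data, median(data), statistics.median(data))
```

Replacing 15 with 20 leaves the median at 13, and the four-value list gives position 2.5, so the result is the average of 12 and 13.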
  • Understanding Mode

    • The mode is the most frequently occurring value in a dataset.
    • It can apply to both numerical and categorical data.
    • Example: In {39, 39, 44, 44, 9}, both 39 and 44 are modes (bimodal).
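Because a dataset can have more than one mode, a sketch should return every most-frequent value. The helper `modes` is our own; Python 3.8+ also ships `statistics.multimode`, which behaves similarly. The categorical list is a hypothetical example.

```python
from collections import Counter

def modes(values):
    """Return every value with the highest frequency; works for numbers or categories."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    # Counter keeps first-seen order, so ties come back in the order they appeared.
    return [value for value, count in counts.items() if count == top]

print(modes([39, 39, 44, 44, 9]))     # bimodal dataset from the text
print(modes(["red", "blue", "red"]))  # hypothetical categorical data
```

Note that if every value occurs once, this returns all of them; some texts instead say such data have no mode.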
  • Choosing the Right Measure

    • Choice between mean, median, or mode depends on data characteristics.
    • It is common to report both the mean and the median, because they respond differently to outliers; a large gap between them suggests skewed data or extreme values.
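A short comparison, assuming Python's standard `statistics` module, shows why reporting both is informative: on the document's two example datasets, the mean moves with the outlier while the median does not.

```python
import statistics

balanced = [11, 12, 13, 14, 15]
with_outlier = [11, 12, 13, 14, 20]  # 15 replaced by 20

results = {}
for label, data in (("balanced", balanced), ("with outlier", with_outlier)):
    avg = statistics.mean(data)
    mid = statistics.median(data)
    results[label] = (avg, mid)
    print(f"{label}: mean={avg}, median={mid}, gap={avg - mid}")
```

The gap is zero for the balanced data and grows once the outlier appears.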
  • Geometric Mean for Rate of Change

    • For percentages or rates of change, use the geometric mean:
    • It multiplies the growth factors and then takes the ( n )th root: ( \left[(1+r_1)(1+r_2)\cdots(1+r_n)\right]^{1/n} - 1 ).
    • For investment returns, the geometric mean is preferred over the arithmetic mean because it reflects the compounded growth actually achieved.
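The formula can be sketched in Python as below. The helper name `geometric_mean_rate` and the two rates (+100% then -50%) are hypothetical illustration values, not the source's investment example; rates are given as decimals and assumed to be no lower than -1 (a total loss).

```python
import math

def geometric_mean_rate(rates):
    """Mean rate of change: [(1 + r1)(1 + r2)...(1 + rn)]^(1/n) - 1, rates as decimals."""
    if any(r < -1 for r in rates):
        raise ValueError("a rate below -100% is not meaningful here")
    growth = math.prod(1 + r for r in rates)  # overall growth factor
    return growth ** (1 / len(rates)) - 1

# Hypothetical two-period investment: +100% in period 1, then -50% in period 2.
rates = [1.00, -0.50]
arithmetic = sum(rates) / len(rates)
geometric = geometric_mean_rate(rates)
print(f"arithmetic mean rate: {arithmetic:.0%}")
print(f"geometric mean rate:  {geometric:.0%}")
```

An investment that doubles and then halves ends where it started, so the true average growth per period is 0%. The geometric mean reports 0%, while the arithmetic mean misleadingly reports 25%.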
  • Measures of Variation

    • Relying only on the range (maximum minus minimum) isn't advisable: it uses just the two most extreme values, so a single outlier can change it sharply.
    • The variance and sample standard deviation use every observation and give a more complete picture of spread.
    • Variance calculation steps:
      1. Compute the mean ( \bar{x} ).
      2. Subtract the mean from each observation and square the result.
      3. Sum the squared differences.
      4. Divide the sum by ( n-1 ) (not ( n )) to obtain the sample variance: ( s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1} ).
    • Standard deviation: the square root of the variance, ( s = \sqrt{s^2} ), which expresses spread in the same units as the data.
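The sample variance and standard deviation can be sketched step by step in Python. The helper `sample_variance` is our own; it squares each deviation from the mean, sums them, and divides by n - 1. The standard library's `statistics.variance` and `statistics.stdev` serve as a cross-check.

```python
import math
import statistics

def sample_variance(values):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(values) / n                            # the mean
    squared_devs = [(x - x_bar) ** 2 for x in values]  # squared deviation of each observation
    return sum(squared_devs) / (n - 1)                 # sum, then divide by n - 1

data = [11, 12, 13, 14, 15]
sample_sd = math.sqrt(sample_variance(data))  # standard deviation, in the data's own units

print(sample_variance(data), statistics.variance(data))
print(sample_sd, statistics.stdev(data))
```

For this data the squared deviations are 4, 1, 0, 1, 4, which sum to 10, so the variance is 10 / 4 = 2.5 and the standard deviation is about 1.58.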
  • Sensitivity of Variance and Standard Deviation

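    • Variance is always non-negative; it equals zero only when every observation is identical.
    • Both measures are built from squared deviations from the mean, so they are sensitive to outliers: one extreme value can inflate them substantially.
    • The standard deviation is usually easier to interpret than the variance because it is in the original units rather than squared units.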
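To see this sensitivity, the sketch below (assuming Python's `statistics` module) reuses the document's two example datasets and reports the range, variance, and standard deviation of each.

```python
import statistics

balanced = [11, 12, 13, 14, 15]
with_outlier = [11, 12, 13, 14, 20]  # same data with 15 replaced by 20

summary = {}
for label, data in (("balanced", balanced), ("with outlier", with_outlier)):
    spread_range = max(data) - min(data)  # uses only the two extremes
    var = statistics.variance(data)       # sample variance (divides by n - 1)
    sd = statistics.stdev(data)           # sample standard deviation
    summary[label] = (spread_range, var, sd)
    print(f"{label}: range={spread_range}, variance={var}, sd={sd:.2f}")
```

Changing one value moves the variance from 2.5 to 12.5, a fivefold increase, and the standard deviation from about 1.58 to about 3.54.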
  • Coefficient of Variation

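    • Compares relative variability across datasets, even when they use different units or have very different means: ( CV = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100\% ).
    • Useful for assessing the relative risk of investments or the relative variability of datasets measured on different scales.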
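A minimal sketch of the coefficient of variation in Python, assuming the sample standard deviation and a positive mean. The helper name `coefficient_of_variation` is our own, and treating the values as minutes is a hypothetical labeling used to show that the CV does not depend on units.

```python
import statistics

def coefficient_of_variation(values):
    """CV = sample standard deviation / mean * 100; assumes a positive mean."""
    avg = statistics.mean(values)
    if avg <= 0:
        raise ValueError("this sketch assumes data with a positive mean")
    return statistics.stdev(values) / avg * 100

minutes = [11, 12, 13, 14, 15]       # hypothetical times in minutes
seconds = [x * 60 for x in minutes]  # the same times in seconds

print(f"{coefficient_of_variation(minutes):.2f}%")  # both lines print 12.16%
print(f"{coefficient_of_variation(seconds):.2f}%")
```

Rescaling the data changes the standard deviation and the mean by the same factor, so their ratio, and hence the CV, stays the same.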
  • Conclusion

    • All measures (mean, median, mode, variance, and standard deviation) serve unique roles based on data characteristics and analytical goals.
    • When extreme values exist, the median describes the center more faithfully than the mean; keep in mind that the variance and standard deviation are also sensitive to outliers, so interpret them with care.