Measures of Central Tendency and Variation
Understanding the Mean
- The mean, also known as the average, is a common measure of central tendency.
- To calculate the mean:
- Add all values together.
- Divide the total by the number of observations (sample size, denoted as ( n )).
- Notation:
- Sample mean is represented as ( \bar{x} ).
- All values contribute equally to the mean.
Example of Mean Calculation
- For values: 11, 12, 13, 14, and 15:
- Mean = ( \frac{11+12+13+14+15}{5} = \frac{65}{5} = 13 )
- Here the mean sits at the balance point of the data: two values lie below it and two above it.
- Changing 15 to 20 raises the mean to 14 ( \frac{70}{5} = 14 ) and unbalances it: three values now lie below the mean, one equals it, and only one lies above.
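The calculation above can be sketched in a few lines of Python. Python is an assumption here (the notes themselves use Excel), and the helper name `sample_mean` is hypothetical.

```python
def sample_mean(values):
    """Return the arithmetic mean: the sum of the values divided by the sample size n."""
    n = len(values)
    return sum(values) / n

original = [11, 12, 13, 14, 15]
modified = [11, 12, 13, 14, 20]  # same data with 15 replaced by 20

mean_original = sample_mean(original)  # 65 / 5
mean_modified = sample_mean(modified)  # 70 / 5

# Count how many values fall strictly below and strictly above the new mean.
below = sum(v < mean_modified for v in modified)
above = sum(v > mean_modified for v in modified)

print(mean_original, mean_modified, below, above)
```

The counts of values below and above the mean make the loss of balance visible without plotting the data.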
Implications of Outliers
- The mean is sensitive to extreme values (outliers), because every value contributes equally to the sum.
- Example: A single high outlier pulls the mean upward, so the mean may no longer represent the center of the data.
Using Excel to Calculate Mean
- Excel computes the mean with the AVERAGE function.
- Example: for a set of preparation times (39 minutes on the first day), ( =AVERAGE(range) ) returns a mean of 39.6 minutes, where range is the cell range holding the data.
Introducing the Median
- The median is the middle value of an ordered dataset; because it depends on position rather than magnitude, it is largely unaffected by extreme values.
- Example: For data set {11, 12, 13, 14, 15}, the median is 13.
- For datasets with an even number of values: average the two middle values to find the median.
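To illustrate the contrast with the mean, the sketch below uses Python's standard `statistics` module (an assumption; the notes use Excel). The outlier value 150 is hypothetical and chosen only for illustration.

```python
import statistics

data = [11, 12, 13, 14, 15]
with_outlier = [11, 12, 13, 14, 150]  # hypothetical extreme value replacing 15

# How far each measure moves when the outlier is introduced.
mean_shift = statistics.mean(with_outlier) - statistics.mean(data)
median_shift = statistics.median(with_outlier) - statistics.median(data)

# Even number of values: the median is the average of the two middle values (12 and 13).
even_median = statistics.median([11, 12, 13, 14])

print(mean_shift, median_shift, even_median)
```

The mean moves by 27 while the median does not move at all, which is why the median is often preferred for skewed data.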
Median Calculation
- To find the median's position in the ordered data: ( \frac{n+1}{2} ).
- For odd ( n ), this is a whole number; for even ( n ), it ends in .5, indicating that the median is the average of the two values on either side of that position.
- Excel function for median: ( =MEDIAN(range) ).
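The position rule can also be written out by hand. The following is a minimal sketch, assuming a non-empty list of numbers; the function name `median_by_position` is hypothetical.

```python
def median_by_position(values):
    """Locate the median using the (n + 1) / 2 position rule (positions are 1-based)."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2  # 3.0 for n = 5, 2.5 for n = 4

    if position.is_integer():
        # Odd n: the position points at a single middle value.
        return ordered[int(position) - 1]

    # Even n: the .5 position falls between two neighbouring values; average them.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2

odd_result = median_by_position([15, 11, 13, 12, 14])
even_result = median_by_position([11, 12, 13, 14])
print(odd_result, even_result)
```

Sorting first matters: the position rule applies to the ordered data, not to the values in the order they were recorded.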
Understanding Mode
- The mode is the most frequently occurring value in a dataset.
- It can apply to both numerical and categorical data.
- Example: In {39, 39, 44, 44, 9}, both 39 and 44 are modes (bimodal).
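As a short sketch, Python's `statistics.multimode` (available from Python 3.8; using Python here is an assumption) returns every value tied for the highest frequency, so it handles the bimodal example directly. The list `colors` is hypothetical categorical data added for illustration.

```python
from statistics import multimode

times = [39, 39, 44, 44, 9]
modes = multimode(times)  # all values that share the highest count

colors = ["red", "blue", "red", "green"]  # hypothetical categorical data
color_modes = multimode(colors)

print(modes, color_modes)
```

Because the mode only counts occurrences, the same call works for numbers and for category labels.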
Choosing the Right Measure
- Choice between mean, median, or mode depends on data characteristics.
- It is common to report both the mean and the median, since they differ in their sensitivity to outliers.
Geometric Mean for Rate of Change
- For percentage or rate changes, use geometric mean:
- It multiplies the growth factors ( 1 + r_i ), takes the ( n )th root of the product, and subtracts 1: ( \left[(1+r_1)(1+r_2) \cdots (1+r_n)\right]^{1/n} - 1 )
- Example: for investment returns over several periods, the geometric mean gives the constant per-period rate that produces the same overall change; the arithmetic mean overstates that rate whenever the returns vary.
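A minimal sketch of the geometric mean rate in Python (an assumed language; the function name `geometric_mean_rate` and the two returns below are hypothetical and not taken from the notes):

```python
import math

def geometric_mean_rate(rates):
    """Average rate per period: nth root of the product of (1 + r_i), minus 1."""
    growth_factors = [1 + r for r in rates]
    return math.prod(growth_factors) ** (1 / len(rates)) - 1

# Hypothetical investment: lose 50% in the first period, gain 100% in the second.
rates = [-0.50, 1.00]

arithmetic = sum(rates) / len(rates)    # suggests an average gain of 25% per period
geometric = geometric_mean_rate(rates)  # 0%: the investment ends exactly where it started

print(arithmetic, geometric)
```

An investment that halves and then doubles is back at its starting value, so an average rate of 0% is the honest summary; the arithmetic mean of 25% is misleading here.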
Measures of Variation
- Relying only on range (max - min) isn’t advisable due to its sensitivity to outliers.
- Use variance and sample standard deviation for a comprehensive overview of data spread.
- Variance Calculation:
- Steps:
- Compute the mean.
- Subtract the mean from each observation and square the result.
- Sum the squared differences.
- Divide the sum by ( n-1 ) rather than ( n ), because the data are a sample.
- Standard Deviation:
- The square root of variance, presenting data spread in the same units as the data itself.
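The steps above can be sketched as follows in Python (an assumed language). The helper `sample_variance` is hypothetical; the standard library's `statistics.variance` and `statistics.stdev` use the same ( n-1 ) divisor and serve as a cross-check.

```python
import math
import statistics

def sample_variance(values):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)

data = [11, 12, 13, 14, 15]

variance = sample_variance(data)  # squared deviations 4, 1, 0, 1, 4 sum to 10; 10 / 4
std_dev = math.sqrt(variance)     # back in the original units of the data

print(variance, std_dev)
print(statistics.variance(data), statistics.stdev(data))
```

The function needs at least two observations, since dividing by ( n-1 ) is undefined for a single value.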
Sensitivity of Variance and Standard Deviation
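- Variance is always non-negative; it is zero only when all values are identical.
- Standard deviation communicates dispersion more effectively because it is expressed in the original, non-squared units.
- Both are computed from squared deviations from the mean, so both are sensitive to outliers.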
Coefficient of Variation
- Enables comparison of relative variability across datasets via: ( CV = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100\% ).
- Useful for assessing relative risk in investment or variability in datasets.
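As a rough sketch in Python (an assumed language), the coefficient of variation can be computed from the sample standard deviation and the mean. The function name `coefficient_of_variation` and the rescaled dataset are hypothetical, and the mean is assumed to be non-zero.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

small_scale = [11, 12, 13, 14, 15]
large_scale = [1100, 1200, 1300, 1400, 1500]  # hypothetical: the same values times 100

cv_small = coefficient_of_variation(small_scale)
cv_large = coefficient_of_variation(large_scale)

print(round(cv_small, 2), round(cv_large, 2))
```

The two standard deviations differ by a factor of 100, yet the CVs agree, which is what makes the CV suitable for comparing datasets measured on different scales.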
Conclusion
- All measures (mean, median, mode, variance, and standard deviation) serve unique roles based on data characteristics and analytical goals.
- When extreme values exist, prefer the median as the measure of center, and interpret the variance and standard deviation with caution, since both are inflated by outliers.