Measures of Variation and Outlier Analysis Study Notes
Introduction to Measures of Variation
Measures of variation are values that describe the variability, or spread, of a data set. These measures aim to describe how the values within a data set vary from one another using a single numerical value.
Range: This is the most basic measure of variation. It is calculated as the difference between the greatest and least data values in a set.
- Formula: $\text{Range} = \text{greatest value} - \text{least value}$
- Example: Consider the data set . The data values range from to . Therefore, the range is .
Quartiles: Just as the median divides a data set into two halves, quartiles divide the data into fourths. Each fourth represents $25\%$ of the data.
- First Quartile ($Q_1$): The median of the lower half of the data.
- Second Quartile ($Q_2$): The median of the entire data set.
- Third Quartile ($Q_3$): The median of the upper half of the data.
Interquartile Range (IQR): The distance between the first and third quartiles of the data set. Subtract the first quartile from the third quartile to find the value.
- Formula: $IQR = Q_3 - Q_1$
- Significance: The IQR represents the middle half, or middle $50\%$, of the data. A lower IQR indicates that the middle half of the data is closer to the median.
- Calculation Example: In the data set provided above, and . The .
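The definitions of range, quartiles, and IQR can be sketched in a few lines of Python. This is a minimal illustration that assumes the median-of-halves convention described in these notes, where the middle value is left out of both halves when the number of values is odd. The function name `quartiles` and the `sample` data are hypothetical and do not come from the original examples.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Assumes at least two data values; with fewer, a half is empty
    and statistics.median raises StatisticsError.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # lower half of the sorted data
    upper = data[half + n % 2:]    # upper half; skips the middle value when n is odd
    return median(lower), median(data), median(upper)

# Hypothetical data set, invented for illustration only.
sample = [2, 4, 5, 7, 8, 10, 13]

data_range = max(sample) - min(sample)   # greatest value minus least value
q1, q2, q3 = quartiles(sample)
iqr = q3 - q1                            # spread of the middle half of the data

print(data_range, q1, q2, q3, iqr)       # prints: 11 4 7 10 6
```

Other quartile conventions exist (for example, the interpolation methods used by `statistics.quantiles`), so results from software may differ slightly from hand calculations.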
Mean Absolute Deviation (MAD)
The Mean Absolute Deviation (MAD) is a measure of variation that describes the average distance between each data value and the mean of the data set.
General Interpretation
- The MAD represents the average distance between each data value and the mean.
- A smaller MAD indicates that the data values are, on average, closer to the mean, reflecting lower variability.
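As a rough sketch of the calculation, the MAD can be computed in three steps: find the mean, find each value's absolute distance from the mean, then average those distances. The function name `mean_absolute_deviation` and the `sample` data below are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def mean_absolute_deviation(values):
    """Average distance between each data value and the mean."""
    mean = sum(values) / len(values)
    distances = [abs(x - mean) for x in values]   # absolute distance of each value from the mean
    return sum(distances) / len(distances)

# Hypothetical data set, invented for illustration only.
sample = [2, 4, 6, 8, 10]

# Mean is 6; distances are 4, 2, 0, 2, 4; their sum is 12; MAD is 12 / 5.
mad = mean_absolute_deviation(sample)
print(round(mad, 2))   # prints: 2.4
```

Comparing two data sets then amounts to computing the MAD of each and checking which is smaller.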
Calculation Examples
Sunny Days in U.S. Cities: * Data: * Mean Calculation: * Distances from Mean: * MAD Calculation:
Number of Flowers Sold: * Data: * Mean Calculation: * MAD Calculation: The sum of absolute differences is .
Baseball Team Comparison (Bears vs. Saints): * Bears Wins: . Mean = . MAD = . * Saints Wins: . Mean = . MAD = . * Comparison: The data values for the Saints are closer to their mean because the Saints have a lower MAD compared to the Bears.
Canned Goods Collection (Room 101 vs. Room 102): * Room 101: Data: . Mean = . MAD = . * Room 102: Data: . Mean = . MAD = . * Comparison: The data values for Room 101 are significantly closer to the mean than those of Room 102.
Calories per Serving: * Data: * Mean Calculation: * MAD Calculation:
Identifying Outliers
Outliers are data values that are significantly lower or higher than the rest of the data. They are identified using calculated thresholds known as the Lower Limit and Upper Limit.
Formulas for Outlier Limits
- Interquartile Range: $IQR = Q_3 - Q_1$
- Lower Limit: $Q_1 - 1.5 \times IQR$
- Upper Limit: $Q_3 + 1.5 \times IQR$
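A minimal sketch of the outlier check follows, assuming the standard $1.5 \times IQR$ rule and the median-of-halves quartiles used in these notes. The function names `outlier_limits` and `find_outliers` and the `sample` data are hypothetical and are not taken from the case studies.

```python
from statistics import median

def outlier_limits(values):
    """Return (lower_limit, upper_limit) using the 1.5 * IQR rule."""
    data = sorted(values)
    n = len(data)
    q1 = median(data[: n // 2])            # median of the lower half
    q3 = median(data[n // 2 + n % 2:])     # median of the upper half
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values):
    """Return the values that fall below the lower limit or above the upper limit."""
    lower, upper = outlier_limits(values)
    return [x for x in values if x < lower or x > upper]

# Hypothetical data set, invented for illustration only.
sample = [10, 12, 13, 14, 15, 16, 40]

# Q1 = 12, Q3 = 16, IQR = 4, so the limits are 6 and 22.
print(outlier_limits(sample))   # prints: (6.0, 22.0)
print(find_outliers(sample))    # prints: [40]
```

A value exactly equal to a limit is not flagged here, since the comparison is strict.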
Outlier Case Studies
Joakim's Piano Practice: * Data: . * , , . * . * Upper Limit: . * Conclusion: is an outlier because it exceeds .
Basketball Team Scores: * Data: . * , , . * . * Lower Limit: . * Conclusion: is an outlier because it is less than .
Abrianna's Cookie Boxes: * Data set: * , , . * . * Lower Limit: . Upper Limit: . * Conclusion: Both and are outliers.
Pet Store Customers: * Data set: * , , . * . * Lower Limit: . Upper Limit: . * Conclusion: There are no outliers in this data set.
Impact of Outliers on Mean and Median
Outliers affect the mean more significantly than the median. When outliers are present, the median is often the better measure to describe the center of the data.
Example: Tree Prices () * Outlier: . * With outlier: Mean = , Median = . * Without outlier: Mean = , Median = . * Observation: The median changed very little. The median best describes the center.
Example: Backpack Prices () * Outlier: . * With outlier: Mean = , Median = . * Without outlier: Mean = , Median = . * Observation: The median best describes the center.
Example: Football Points () * Outlier: . * With outlier: Mean = , Median = . * Without outlier: Mean = , Median = . * Observation: The median best describes the center.
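The pattern in these examples can be checked with a short sketch that computes the mean and median with and without an outlier. The `prices` list below is hypothetical (it is not the tree, backpack, or football data), with 90 playing the role of the outlier.

```python
from statistics import mean, median

# Hypothetical prices, invented for illustration; 90 is the outlier.
prices = [20, 22, 24, 25, 27, 90]
without_outlier = [p for p in prices if p != 90]

# How far each measure of center moves when the outlier is removed.
mean_shift = abs(mean(prices) - mean(without_outlier))
median_shift = abs(median(prices) - median(without_outlier))

print(round(mean(prices), 2), median(prices))                     # prints: 34.67 24.5
print(round(mean(without_outlier), 2), median(without_outlier))   # prints: 23.6 24
print(round(mean_shift, 2), median_shift)                         # prints: 11.07 0.5
```

In this sketch the mean moves by about 11 while the median moves by only 0.5, which is why the median is the safer description of the center when an outlier is present.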
Interpreting and Constructing Box Plots
Box plots provide a visual summary of a data set based on a five-number summary: Minimum, First Quartile ($Q_1$), Median ($Q_2$), Third Quartile ($Q_3$), and Maximum.
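The five values a box plot needs can be gathered with a small helper. This is a sketch under the same median-of-halves assumption as the quartile definitions earlier in these notes. The function name `five_number_summary` and the `sample` data are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return the five values a box plot is drawn from."""
    data = sorted(values)
    n = len(data)
    return {
        "minimum": data[0],
        "q1": median(data[: n // 2]),             # median of the lower half
        "median": median(data),
        "q3": median(data[n // 2 + n % 2:]),      # median of the upper half
        "maximum": data[-1],
    }

# Hypothetical data set, invented for illustration only.
sample = [3, 7, 8, 5, 12, 14, 21, 13, 18]

summary = five_number_summary(sample)
print(summary)
# prints: {'minimum': 3, 'q1': 6.0, 'median': 12, 'q3': 16.0, 'maximum': 21}

# Range and IQR follow directly from the summary.
print(summary["maximum"] - summary["minimum"], summary["q3"] - summary["q1"])   # prints: 18 10.0
```

Reading a box plot works the same way in reverse: the ends of the whiskers give the minimum and maximum, and the box edges give $Q_1$ and $Q_3$.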
Annual Snowfall (inches)
- Minimum:
- First Quartile ($Q_1$):
- Median ($Q_2$):
- Third Quartile ($Q_3$):
- Maximum:
- Range:
- IQR:
Average Gas Mileage (mpg)
- Data Range:
- Median:
- First Quartile ($Q_1$):
- Third Quartile ($Q_3$):
- IQR:
Practice: Apps and Animal Sleep Patterns
Apps Used: Data () * Range: * IQR: * Description: The whole data set varies by a range of , while the middle half varies by only .
Animal Sleep Time (h): Data () * Range: * IQR:
Cost of Tents: Data () * Range: * IQR: * Description: The data vary by a range of . The middle half of the data varies by .