Statistics: Range, IQR, Outliers, and Empirical Rule
Range
- The range of a data set is the difference between the maximum and minimum values.
- Example:
- Given data points: 1 and 12.
- Range calculation:
- Maximum = 12
- Minimum = 1
- Range = Maximum - Minimum = 12 - 1 = 11.
- The calculation can be "gut-checked" by adding the range to the minimum and confirming that the result equals the maximum: 1 + 11 = 12.
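The range calculation and its gut check can be sketched in a few lines of Python. The function name `data_range` is a hypothetical helper chosen for illustration; the data are the two points from the example above.

```python
def data_range(values):
    """Return (minimum, maximum, range) for a non-empty collection of numbers."""
    lo, hi = min(values), max(values)
    return lo, hi, hi - lo


data = [1, 12]  # the two data points from the example
lo, hi, rng = data_range(data)

# Gut check: minimum + range should land back on the maximum.
assert lo + rng == hi

print(f"min={lo}, max={hi}, range={rng}")  # min=1, max=12, range=11
```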
Interquartile Range (IQR)
- The calculation is not covered at this point; the concept is introduced visually using a normal curve.
- On a standardized normal curve, the mean sits at the center and is labeled 0.
- A range of scores extends on either side of the mean; the IQR covers the middle 50% of them, from the first quartile (Q1) to the third quartile (Q3).
Empirical Rule
- The Empirical Rule describes how data is distributed in a normal distribution:
- About 68% of the data falls within one standard deviation of the mean.
- About 95% of the data falls within two standard deviations of the mean.
- About 99.7% of the data falls within three standard deviations of the mean.
- Illustratively, on a curve with the mean at 0 and the axis marked in standard deviations:
- One standard deviation: from -1 to +1
- Two standard deviations: from -2 to +2
- Three standard deviations: from -3 to +3
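The three percentages can be checked numerically. The sketch below assumes a perfectly normal distribution and uses the standard identity that the share within k standard deviations of the mean equals erf(k / sqrt(2)); `share_within` is a hypothetical helper name.

```python
import math


def share_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```

The rule's 68 / 95 / 99.7 figures are rounded versions of these values.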
Analysis of Outliers
- Discussion of extreme data points:
- If a data point is very far from the mean (e.g., five standard deviations), it may be considered an outlier.
- Example: If the class average is 85 but one person scores a 15, that score may be categorized as an outlier.
- The underlying question is whether an outlier represents real information or is an anomaly caused by a rare occurrence.
- Outliers can skew the perception of learning or performance.
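How far a score sits from the mean, measured in standard deviations, can be sketched as below. The class mean of 85 and the score of 15 come from the example; the standard deviation of 10 is purely an assumption for illustration, since the notes do not give one, and `z_score` is a hypothetical helper name.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd


class_mean = 85   # from the example above
assumed_sd = 10   # hypothetical: the notes do not state a standard deviation
z = z_score(15, class_mean, assumed_sd)

print(z)  # -7.0, i.e. seven standard deviations below the mean
```

Under that assumed spread, the score of 15 lies well beyond the five-standard-deviation mark mentioned above.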
Calculating Outliers using IQR
- The IQR test helps to identify outliers:
- Calculate the first quartile (Q1) and third quartile (Q3), then the IQR as Q3 - Q1.
- Outliers are defined as data points that fall outside the range of:
- Lower Bound: Q1 - 1.5 * IQR
- Upper Bound: Q3 + 1.5 * IQR
- Anything below this lower bound or above this upper bound can be categorized as an outlier.
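The steps above can be sketched as follows. The scores are hypothetical, and quartiles are computed with Python's `statistics.quantiles` using its default "exclusive" method; other quartile conventions (including the one in the associated chapter) may give slightly different bounds. `iqr_outliers` is a hypothetical helper name.

```python
import statistics


def iqr_outliers(values):
    """Return (lower_bound, upper_bound, outliers) using the 1.5 * IQR rule."""
    # quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


scores = [15, 80, 82, 84, 85, 86, 88, 90, 91]  # hypothetical class scores
lower, upper, outliers = iqr_outliers(scores)

print(lower, upper, outliers)  # 69.0 101.0 [15]
```

Here Q1 = 81 and Q3 = 89, so the IQR is 8 and the bounds are 81 - 12 = 69 and 89 + 12 = 101; only the score of 15 falls outside them.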
Example of Outliers and their Impact
- If a significant outlier is included in the average calculation, it can pull the mean far from the typical value; an unusually large value inflates it.
- Example:
- Suppose one doctor brings in $1,000,000 worth of business while the usual figure is $250,000.
- Including this outlier raises the average to $500,000 (as happens, for instance, when it is averaged with two doctors at $250,000 each), obscuring typical performance.
- Reporting such an average without disclosing the outlier can raise ethical concerns and calls the integrity of the data into question.
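The effect on the mean can be reproduced with a small sketch. The group size is an assumption: the notes give only the $250,000 and $1,000,000 figures and the resulting $500,000 average, which is consistent with two doctors at the usual figure plus the one outlier.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


typical = [250_000, 250_000]          # hypothetical: two doctors at the usual figure
with_outlier = typical + [1_000_000]  # add the one high-earning doctor

print(mean(typical))       # 250000.0
print(mean(with_outlier))  # 500000.0
```

A single extreme value doubles the reported average even though two of the three doctors sit at $250,000.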
Importance of Understanding Outliers and Mean
- When discussing an average (mean), two critical questions should be asked:
- How much error (uncertainty) is associated with the calculated mean?
- Are there any outliers affecting the mean?
- The IQR test is one tool for detecting outliers, whose presence affects how reliable the calculated mean is.
Further Study
- Variance and Standard Deviation: These concepts will be explored more thoroughly in future discussions.
- Recommendation: Read the associated chapter to gain familiarity before calculation methods and deeper analyses are presented in future sessions.