Statistics: Range, IQR, Outliers, and Empirical Rule

Range

  • The range of a data set is the difference between the maximum and minimum values.
  • Example:
    • Given data points: 1 and 12.
    • Range calculation:
      • Maximum = 12
      • Minimum = 1
      • Range = Maximum - Minimum = 12 - 1 = 11.
  • The calculation can be "gut-checked" by adding the minimum to the range to see if it equals the maximum.
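
A minimal sketch of the range calculation and its gut check, using the two data points from the example. The variable names are illustrative only.

```python
# Data points from the example above.
data = [1, 12]

minimum = min(data)
maximum = max(data)

# Range = Maximum - Minimum
data_range = maximum - minimum
print(data_range)  # 11

# Gut check: adding the minimum back to the range should give the maximum.
gut_check_passes = (minimum + data_range == maximum)
print(gut_check_passes)  # True
```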

Interquartile Range (IQR)

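  • The IQR describes the spread of the middle 50% of the data: the distance between the first quartile (Q1) and the third quartile (Q3).
  • The calculation is not covered yet; for now the concept is introduced visually using a normal curve.
  • On a standard normal curve, the mean sits at the center, at 0.
  • Scores spread out on either side of the mean, measured in standard deviations.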

Empirical Rule

  • The Empirical Rule describes how data is distributed in a normal distribution:
    • About 68% of the data falls within one standard deviation of the mean.
    • About 95% of the data falls within two standard deviations of the mean.
    • About 99.7% of the data falls within three standard deviations of the mean.
  • Illustratively, on a standard normal curve (mean 0, standard deviation 1):
    • One standard deviation: from -1 to +1
    • Two standard deviations: from -2 to +2
    • Three standard deviations: from -3 to +3
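
The percentages in the Empirical Rule can be checked numerically. The sketch below assumes a standard normal curve (mean 0, standard deviation 1) and uses Python's standard-library NormalDist; the helper name share_within is hypothetical.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1 (an assumption of this sketch).
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```

The rule's 68 / 95 / 99.7 figures are rounded versions of these values.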

Analysis of Outliers

  • Discussion of extreme data points:
    • If a data point is very far from the mean (e.g., five standard deviations), it may be considered an outlier.
    • Example: If a class averages 85 but one student scores 15, that score may be categorized as an outlier.
    • The underlying question is whether an outlier carries real information or is an anomaly caused by a rare occurrence.
    • Outliers can skew the perception of learning or performance.
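
One way to express "how far from the mean" is to count standard deviations. The sketch below uses the class average of 85 and the score of 15 from the example; the standard deviation of 10 is a made-up value for illustration, since the notes do not give one.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev

class_mean = 85        # from the example
low_score = 15         # from the example
assumed_std_dev = 10   # hypothetical: not stated in the notes

z = z_score(low_score, class_mean, assumed_std_dev)
print(z)  # -7.0
```

Under that assumed spread, the score sits seven standard deviations below the mean, well past the three-standard-deviation band that covers about 99.7% of a normal distribution.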

Calculating Outliers using IQR

  • The IQR test helps to identify outliers:
    • Calculate the first quartile (Q1) and third quartile (Q3).
    • Compute IQR = Q3 - Q1.
    • Outliers are data points that fall outside these bounds:
      • Lower Bound: Q1 - 1.5 * IQR
      • Upper Bound: Q3 + 1.5 * IQR
    • Anything below the lower bound or above the upper bound can be categorized as an outlier.
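
The IQR test can be sketched in a few lines. The scores below are hypothetical, and the function names iqr_bounds and find_outliers are illustrative. Quartile conventions differ between textbooks and software; this sketch uses the "inclusive" method from Python's statistics module, so results on other data may differ slightly from a hand calculation.

```python
from statistics import quantiles

def iqr_bounds(values):
    """Return (lower_bound, upper_bound) for the 1.5 * IQR outlier test."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values):
    """Return the values that fall below the lower bound or above the upper bound."""
    lower, upper = iqr_bounds(values)
    return [v for v in values if v < lower or v > upper]

# Hypothetical class scores with one very low score.
scores = [15, 80, 82, 84, 85, 86, 88, 90, 95]

print(iqr_bounds(scores))     # (73.0, 97.0)
print(find_outliers(scores))  # [15]
```

Here Q1 = 82 and Q3 = 88, so IQR = 6 and the bounds are 82 - 9 = 73 and 88 + 9 = 97. Only the score of 15 falls outside them.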

Example of Outliers and their Impact

  • If a significant outlier is included in the calculation, it can artificially inflate the mean.
    • Example:
      • Suppose one doctor brings in $1,000,000 worth of business, while the usual average is $250,000.
      • Including this outlier raises the average to $500,000, which obscures typical performance.
  • Presenting an average inflated in this way as typical performance can raise ethical concerns and may indicate a problem with data integrity.
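
The arithmetic behind the doctor example can be sketched as follows. The notes do not say how many doctors are in the group; this sketch assumes two doctors at $250,000 each plus the one at $1,000,000, because that combination produces the $500,000 average mentioned above. The median is printed only as a point of comparison.

```python
from statistics import mean, median

# Hypothetical group: two typical doctors (assumed) plus the outlier from the example.
typical = [250_000, 250_000]
outlier = 1_000_000
with_outlier = typical + [outlier]

mean_without = mean(typical)
mean_with = mean(with_outlier)
median_with = median(with_outlier)

print(f"mean without outlier: ${mean_without:,.0f}")  # $250,000
print(f"mean with outlier:    ${mean_with:,.0f}")     # $500,000
print(f"median with outlier:  ${median_with:,.0f}")   # $250,000
```

A single extreme value doubles the mean in this small group, while the middle value does not move.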

Importance of Understanding Outliers and Mean

  • When discussing an average (mean), two critical questions should be asked:
    • What is the error associated with the mean calculation?
    • Are there any outliers affecting the mean?
  • The IQR test is one tool for detecting outliers; any outliers it finds call the reliability of the calculated mean into question.
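
For the first question, one common reading of "error" is the standard error of the mean: the sample standard deviation divided by the square root of the sample size. The notes do not say which measure is intended, so treat this as an assumption. The scores and the function name standard_error are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(values) / sqrt(len(values))

# Hypothetical scores, for illustration only.
scores = [80, 82, 84, 85, 86, 88, 90]

print(f"mean = {mean(scores):.2f}, standard error = {standard_error(scores):.2f}")
```

A larger standard error means the reported mean is a less precise summary of the data.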

Further Study

  • Variance and Standard Deviation: These concepts will be explored more thoroughly in future discussions.
  • Recommendation: Read the associated chapter to gain familiarity before calculation methods and deeper analyses are presented in future sessions.