Stats 10_300 Prof Dre

Range

  • Definition: The range is the distance spanned by the data.

  • Calculation: Range is calculated by subtracting the minimum value from the maximum value.

    • Formula: Range = Maximum value – Minimum value
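
The formula above can be sketched in a few lines of Python. The function name data_range and the sample values are illustrative assumptions, not data from the lecture.

```python
def data_range(values):
    """Return the maximum value minus the minimum value."""
    return max(values) - min(values)

# Hypothetical data, chosen only to illustrate the formula
values = [12, 7, 3, 18, 10]
print(data_range(values))  # 18 - 3 = 15
```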

Interquartile Range (IQR)

  • Definition: The IQR is the distance between the first (Q1) and third (Q3) quartile marks.

  • Purpose: IQR measures variability around the median; it is the range of the middle half (middle 50%) of the data.

Example: Quartiles and IQR

  • Quartiles divide a dataset into four equal parts.

    • Class A: 32 scores (8 scores per quartile)

    • Class B: 20 scores (5 scores per quartile)

  • Five-number summary:

    • Class A: Min: 40, Q1: 71, Q2: 74.5 (Median), Q3: 78.5, Max: 95

    • Class B: Min: 40, Q1: 61, Q2: 74.5 (Median), Q3: 89, Max: 95

  • Observations:

    • Q2 (median) divides the dataset into two equal halves.

    • Variability:

      • Class A: the lowest quarter of scores spans 31 points (40 to 71).

      • Class A: the third quarter of scores spans 4 points (74.5 to 78.5).

How to Find the IQR

  1. Order the Data: Arrange the dataset in ascending order.

  2. Find Q1 (First Quartile): Median of the lower half of the data.

  3. Find Q3 (Third Quartile): Median of the upper half of the data.

  4. Calculate IQR: IQR = Q3 - Q1
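
A minimal sketch of the four steps above, using only the Python standard library. The function names and the sample scores are illustrative assumptions. One convention is also assumed: when the number of data points is odd, the median is left out of both halves (some textbooks and software include it or interpolate, which can give slightly different quartiles).

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    data = sorted(values)               # Step 1: order the data
    n = len(data)
    half = n // 2
    lower = data[:half]                 # lower half of the data
    # Assumption: for odd n, skip the middle value in both halves
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return median(lower), median(data), median(upper)  # Steps 2 and 3

def iqr(values):
    """Step 4: IQR = Q3 - Q1."""
    q1, _, q3 = quartiles(values)
    return q3 - q1

# Hypothetical data, not from the lecture
scores = [7, 1, 5, 3, 9, 11, 13, 15]
print(quartiles(scores))  # (4.0, 8.0, 12.0)
print(iqr(scores))        # 8.0
```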

Example Calculation:

  • Given Five Number Summary: Q1 = 37.5

Identifying Outliers Using IQR

  • Definition: A point is an outlier if it's substantially above Q3 or below Q1.

  • Thresholds:

    • Greater than Q3 + 1.5 × IQR

    • Less than Q1 - 1.5 × IQR

Example: Outliers

  • Given: Q1 = 15, Q3 = 18, IQR = 18 - 15 = 3

    • Lower Bound: Q1 - 1.5 × IQR = 15 - 4.5 = 10.5

    • Upper Bound: Q3 + 1.5 × IQR = 18 + 4.5 = 22.5

Summary of Outlier Identification

  • Data point at 10 is an outlier (below 10.5).

  • Points at 24, 27, and 29 are outliers (above 22.5).
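
The outlier check above can be reproduced with a short sketch. The function names are assumptions made for illustration. Q1 = 15, Q3 = 18 and the points 10, 24, 27, and 29 come from the example; the value 16 is a hypothetical non-outlier added for contrast.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower bound, upper bound) for the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def is_outlier(x, q1, q3, k=1.5):
    """True if x is below the lower bound or above the upper bound."""
    lower, upper = outlier_fences(q1, q3, k)
    return x < lower or x > upper

lower, upper = outlier_fences(15, 18)
print(lower, upper)  # 10.5 22.5

# 16 is a hypothetical in-range value; the rest are from the example
flagged = [x for x in [10, 16, 24, 27, 29] if is_outlier(x, 15, 18)]
print(flagged)  # [10, 24, 27, 29]
```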

Constructing Boxplots

  • Concept: Boxplots provide a visual summary of a distribution using the five-number summary.

  • Components of Boxplots:

    • Box spans Q1 to Q3

    • Line at median (Q2)

    • "Whiskers" extend to the smallest and largest values that lie within 1.5 × IQR of the box (no lower than Q1 - 1.5 × IQR, no higher than Q3 + 1.5 × IQR)

    • Outliers marked with asterisks (*)
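
To make the whisker rule concrete, here is a rough sketch that finds where the whiskers end and which points would be drawn as outliers. The function name whisker_ends and the dataset are hypothetical. Q1 and Q3 are passed in as given values rather than computed, since quartile conventions vary.

```python
def whisker_ends(values, q1, q3, k=1.5):
    """Return (low whisker, high whisker, outliers) for a modified boxplot."""
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Whiskers stop at the most extreme data values inside the fences
    inside = [x for x in values if lower_fence <= x <= upper_fence]
    # Anything outside the fences is plotted individually as an outlier
    outliers = sorted(x for x in values if x < lower_fence or x > upper_fence)
    return min(inside), max(inside), outliers

# Hypothetical data, built so that Q1 = 15 and Q3 = 18
data = [10, 13, 14, 14, 15, 15, 15, 16, 16, 16,
        17, 17, 17, 18, 18, 18, 19, 24, 27, 29]
print(whisker_ends(data, q1=15, q3=18))  # (13, 19, [10, 24, 27, 29])
```

Note that the whiskers end at actual data values (13 and 19 here), not at the bounds 10.5 and 22.5 themselves.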

Example: Boxplots for Exam Scores

  • Class A: Min: 40, Q1: 71, Q2: 74.5, Q3: 78.5, Max: 95

  • Class B: Min: 40, Q1: 61, Q2: 74.5, Q3: 89, Max: 95

Boxplot Interpretation

  • Long box indicates greater variability (large IQR);

  • Short box indicates lower variability (small IQR);

  • Modified boxplots highlight outliers.
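
As a quick check on the interpretation above, the box length (IQR) and overall range can be computed from the two five-number summaries in the exam score example. The dictionary layout and variable names are assumptions for illustration.

```python
# Five-number summaries from the exam score example
summaries = {
    "Class A": {"min": 40, "q1": 71, "median": 74.5, "q3": 78.5, "max": 95},
    "Class B": {"min": 40, "q1": 61, "median": 74.5, "q3": 89, "max": 95},
}

# Box length is the IQR; the overall range is max minus min
box_lengths = {name: s["q3"] - s["q1"] for name, s in summaries.items()}
ranges = {name: s["max"] - s["min"] for name, s in summaries.items()}

for name in summaries:
    print(name, "IQR:", box_lengths[name], "Range:", ranges[name])
# Class A IQR: 7.5 Range: 55
# Class B IQR: 28 Range: 55
```

Both classes have the same range and median, but Class B's box is almost four times as long, so its middle half of scores is far more spread out.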

Key Insights

  • Boxplots do not convey:

    • Number of data points

    • Distribution pattern within quartiles

  • Comparison of distributions is best done with side-by-side boxplots.

Measures of Spread

Standard Deviation

  • Definition: Measures how spread out values are relative to the mean.

  • Applicability: Most informative for roughly symmetric, bell-shaped distributions; skewed data or outliers can inflate it.

Population Standard Deviation Formula:

  • σ = sqrt(Σ(Xi - μ)² / N)

  • Where:

    • Xi = each data point,

    • μ = mean,

    • N = population size.

Sample Standard Deviation Formula:

  • s = sqrt(Σ(Xi - X̄)² / (n - 1))

  • Where:

    • X̄ = sample mean

    • n = sample size.
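
The two formulas can be written out directly and compared against Python's statistics module (pstdev for a population, stdev for a sample). The function names population_sd and sample_sd and the dataset are hypothetical, chosen so the arithmetic is easy to follow.

```python
from math import sqrt
from statistics import pstdev, stdev

def population_sd(values):
    """sigma = sqrt(sum((Xi - mu)^2) / N)"""
    mu = sum(values) / len(values)
    return sqrt(sum((x - mu) ** 2 for x in values) / len(values))

def sample_sd(values):
    """s = sqrt(sum((Xi - Xbar)^2) / (n - 1))"""
    xbar = sum(values) / len(values)
    return sqrt(sum((x - xbar) ** 2 for x in values) / (len(values) - 1))

# Hypothetical data: mean is 5, squared deviations sum to 32
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_sd(data))        # 2.0
print(round(sample_sd(data), 3))  # 2.138

# The standard library gives the same results
print(abs(population_sd(data) - pstdev(data)) < 1e-12)  # True
print(abs(sample_sd(data) - stdev(data)) < 1e-12)       # True
```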

Variance

  • Definition: Variance is the square of the standard deviation.

    • Population Variance = σ²

    • Sample Variance = s²

Key Differences Between Population and Sample

  • The population formula divides by N (the total number of data points);

  • The sample formula divides by n - 1, which corrects for the tendency of a sample to underestimate the population's spread.
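
The effect of the divisor can be seen by computing the variance both ways on the same data. This is a sketch on a hypothetical dataset; the helper variance_with_divisor is an illustrative name, not a library function.

```python
from math import isclose, sqrt
from statistics import pvariance, variance

def variance_with_divisor(values, divisor):
    """Sum of squared deviations from the mean, divided by `divisor`."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / divisor

# Hypothetical data: squared deviations from the mean sum to 32
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

pop_var = variance_with_divisor(data, n)       # divide by N
samp_var = variance_with_divisor(data, n - 1)  # divide by n - 1

print(pop_var)             # 4.0
print(round(samp_var, 3))  # 4.571

# Dividing by n - 1 always gives the larger value,
# and the standard deviation is the square root of the variance
print(samp_var > pop_var)  # True
print(sqrt(pop_var))       # 2.0
```

The gap between the two results shrinks as n grows, since n - 1 and n become nearly equal.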

Comparing Distributions by Variability

  • Use multiple metrics (range, IQR, ADM) to compare variability across data sets:

    • Class Examples: Potassium content in cereals.

    • Overall range comparison: Adult cereals > Children’s cereals.

Important Conclusions

  • Using different variability measures may yield different interpretations of data spread.

  • The boxplot is a useful visualization of a distribution: it shows the median, the spread around it, and any outliers.