Stats 10_300 Prof Dre
Range
Definition: The range is the distance spanned by the data.
Calculation: Range is calculated by subtracting the minimum value from the maximum value.
Formula: Range = Maximum value – Minimum value
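The formula above can be sketched in a few lines of Python. The function name data_range and the scores are hypothetical, chosen only for illustration.

```python
def data_range(values):
    """Return the range: maximum value minus minimum value."""
    if not values:
        raise ValueError("range is undefined for an empty dataset")
    return max(values) - min(values)


# Hypothetical exam scores (not from the notes).
scores = [40, 62, 71, 74, 75, 78, 95]
print(data_range(scores))  # 95 - 40 = 55
```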
Interquartile Range (IQR)
Definition: The IQR is the distance between the first (Q1) and third (Q3) quartile marks.
Purpose: The IQR measures variability about the median; it is the range of the middle half of the data.
Example: Quartiles and IQR
Quartiles divide a dataset into four equal parts.
Class A: 32 scores (8 scores per quartile)
Class B: 20 scores (5 scores per quartile)
Five-number summary:
Class A: Min: 40, Q1: 71, Q2: 74.5 (Median), Q3: 78.5, Max: 95
Class B: Min: 40, Q1: 61, Q2: 74.5 (Median), Q3: 89, Max: 95
Observations:
Q2 (the median) divides the dataset into a lower half and an upper half.
Variability:
Class A: the lowest quarter of scores spans 31 points (40 to 71).
Class A: the third quarter of scores spans only 4 points (74.5 to 78.5).
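As a quick check on the two five-number summaries above, the sketch below computes the range and IQR for each class. The dictionary layout and the helper name spread_summary are assumptions made for illustration; the numbers are the ones listed in the notes.

```python
# Five-number summaries as given in the notes.
five_number = {
    "Class A": {"min": 40, "q1": 71, "median": 74.5, "q3": 78.5, "max": 95},
    "Class B": {"min": 40, "q1": 61, "median": 74.5, "q3": 89, "max": 95},
}


def spread_summary(summary):
    """Return the range and IQR implied by a five-number summary."""
    return {
        "range": summary["max"] - summary["min"],
        "iqr": summary["q3"] - summary["q1"],
    }


for name, summary in five_number.items():
    print(name, spread_summary(summary))
# Class A {'range': 55, 'iqr': 7.5}
# Class B {'range': 55, 'iqr': 28}
```

Both classes have the same range (55) and the same median, but the IQR shows that the middle half of Class B is far more spread out than the middle half of Class A.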
How to Find the IQR
Order the Data: Arrange the dataset in ascending order.
Find Q1 (First Quartile): Median of the lower half of the data.
Find Q3 (Third Quartile): Median of the upper half of the data.
Calculate IQR: IQR = Q3 - Q1
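The steps above can be sketched as follows. This is a minimal sketch, assuming the common convention that when the number of values is odd, the median itself is left out of both halves; other textbooks and software use slightly different quartile rules. The function names and the data are hypothetical.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    data = sorted(values)              # Step 1: order the data
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]                # lower half of the data
    # Assumption: for odd n, the middle value is excluded from both halves.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return median(lower), median(data), median(upper)


def iqr(values):
    """Return IQR = Q3 - Q1."""
    q1, _, q3 = quartiles(values)
    return q3 - q1


# Hypothetical dataset, deliberately unsorted.
data = [7, 1, 3, 9, 5, 11, 13, 15]
print(quartiles(data))  # (4.0, 8.0, 12.0)
print(iqr(data))        # 8.0
```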
Example Calculation:
Given Five Number Summary: Q1 = 37.5
Identifying Outliers Using IQR
Definition: A point is an outlier if it's substantially above Q3 or below Q1.
Thresholds:
Greater than Q3 + 1.5 × IQR
Less than Q1 - 1.5 × IQR
Example: Outliers
Given: Q1 = 15, Q3 = 18, IQR = 18 - 15 = 3
Lower Bound: Q1 - 1.5 × IQR = 15 - 4.5 = 10.5
Upper Bound: Q3 + 1.5 × IQR = 18 + 4.5 = 22.5
Summary of Outlier Identification
Data point at 10 is an outlier (below 10.5).
Points at 24, 27, and 29 are outliers (above 22.5).
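The outlier example above can be reproduced with a short sketch. The function names are hypothetical. The values 10, 24, 27, and 29 come from the notes; the other points in the list are made up to show values that fall inside the bounds.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower bound, upper bound) = (Q1 - k*IQR, Q3 + k*IQR)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, q1, q3, k=1.5):
    """Return the values that fall outside the IQR bounds."""
    lower, upper = iqr_fences(q1, q3, k)
    return [v for v in values if v < lower or v > upper]


lower, upper = iqr_fences(15, 18)
print(lower, upper)  # 10.5 22.5

# 14, 16, and 19 are hypothetical in-range points added for contrast.
points = [10, 14, 16, 19, 24, 27, 29]
print(find_outliers(points, 15, 18))  # [10, 24, 27, 29]
```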
Constructing Boxplots
Concept: Boxplots provide a visual summary of a distribution using the five-number summary.
Components of Boxplots:
Box spans Q1 to Q3
Line at median (Q2)
"Whiskers" extend to the smallest and largest values that lie within 1.5 × IQR of the box (no lower than Q1 - 1.5 × IQR and no higher than Q3 + 1.5 × IQR)
Outliers marked with asterisks (*)
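To make the components concrete, the sketch below computes the numbers a modified boxplot would draw, without plotting anything. It assumes the median-of-halves quartile rule (plotting software may use a different rule), and the function name and dataset are hypothetical.

```python
from statistics import median


def boxplot_parts(values, k=1.5):
    """Return the box edges, median, whisker ends, and outliers."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    q1 = median(data[:half])
    # Assumption: for odd n, the middle value is excluded from both halves.
    q3 = median(data[half:] if n % 2 == 0 else data[half + 1:])
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    # Whiskers stop at the most extreme data values inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median(data),
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical dataset with one low and one high outlier.
parts = boxplot_parts([2, 10, 11, 12, 13, 14, 15, 30])
print(parts)
```

Here the box runs from 10.5 to 14.5, the fences are 4.5 and 20.5, the whiskers end at 10 and 15, and the points 2 and 30 would be drawn as individual outlier marks.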
Example: Boxplots for Exam Scores
Class A: Min: 40, Q1: 71, Q2: 74.5, Q3: 78.5, Max: 95
Class B: Min: 40, Q1: 61, Q2: 74.5, Q3: 89, Max: 95
Boxplot Interpretation
A long box indicates greater variability (large IQR).
A short box indicates lower variability (small IQR).
Modified boxplots highlight outliers.
Key Insights
Boxplots do not convey:
Number of data points
Distribution pattern within quartiles
Comparison of distributions is best done with side-by-side boxplots.
Measures of Spread
Standard Deviation
Definition: Measures how spread out values are relative to the mean.
Applicability: Useful for symmetric data (bell-shaped distributions).
Population Standard Deviation Formula:
σ = sqrt(Σ(Xi - μ)² / N)
Where:
Xi = each data point,
μ = population mean,
N = population size.
Sample Standard Deviation Formula:
s = sqrt(Σ(Xi - X̄)² / (n - 1))
Where:
X̄ = sample mean
n = sample size.
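The two formulas can be written out directly, as in the sketch below. The function names and the data are hypothetical; the results are cross-checked against pstdev and stdev from Python's standard statistics module.

```python
import math
import statistics


def population_sd(values):
    """sigma = sqrt( sum (Xi - mu)^2 / N )"""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)


def sample_sd(values):
    """s = sqrt( sum (Xi - Xbar)^2 / (n - 1) )"""
    n = len(values)
    if n < 2:
        raise ValueError("sample standard deviation needs at least two values")
    xbar = sum(values) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1))


# Hypothetical data: mean is 5, squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_sd(data))  # sqrt(32 / 8) = 2.0
print(sample_sd(data))      # sqrt(32 / 7), about 2.14

# Cross-check against the standard library.
assert math.isclose(population_sd(data), statistics.pstdev(data))
assert math.isclose(sample_sd(data), statistics.stdev(data))
```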
Variance
Definition: Variance is the square of the standard deviation.
Population Variance = σ²
Sample Variance = s²
Key Differences Between Population and Sample
Population data: divide by N (the total number of data points).
Sample data: divide by n - 1, which corrects for the tendency of a sample to underestimate the population's variability.
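The effect of dividing by N versus n - 1 can be seen with the standard statistics module, where pvariance divides by N and variance divides by n - 1. The data below is hypothetical and is treated first as a whole population and then as a sample.

```python
import math
import statistics

# Hypothetical data: mean is 5, squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

pop_var = statistics.pvariance(data)   # 32 / 8, divides by N
samp_var = statistics.variance(data)   # 32 / 7, divides by n - 1

print(pop_var, samp_var)

# Variance is the square of the standard deviation.
assert math.isclose(statistics.pstdev(data) ** 2, pop_var)
assert math.isclose(statistics.stdev(data) ** 2, samp_var)

# The two variances differ only by the factor n / (n - 1).
assert math.isclose(samp_var, pop_var * n / (n - 1))
```

The sample variance is always slightly larger than the population variance computed from the same numbers, and the gap shrinks as n grows.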
Comparing Distributions by Variability
Use multiple metrics (range, IQR, ADM) to compare variability across datasets:
Class Examples: Potassium content in cereals.
Overall range comparison: Adult cereals > Children’s cereals.
Important Conclusions
Using different variability measures may yield different interpretations of data spread.
The boxplot is a useful visualization of a distribution's center (the median), its variability, and its outliers.