Study Notes on Boxplots and Outliers
Section 3.3 Boxplots and Outliers
Overview
Focuses on reading and constructing boxplots and on identifying outliers in a distribution of data.
Warm-Up Activity
Dotplot Analysis: The dotplot displays EPA estimates of highway gas mileage (in miles per gallon, mpg) for 21 randomly selected 2020 model-year midsize cars.
Five-number summary:
Minimum value = 25
First Quartile (Q1) = 28
Median = 30
Third Quartile (Q3) = 31.5
Maximum value = 48
Outlier Analysis:
48 is confirmed as an outlier.
Points 38 and 40 are recognized as potential outliers.
Boxplots and the Five-Number Summary
Definition of Five-number Summary: A summary that includes:
Minimum value
First Quartile (Q1)
Median
Third Quartile (Q3)
Maximum value
Boxplot Definition: A visual representation of the five-number summary, often illustrating the minimum, Q1, median, Q3, and maximum values.
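The five-number summary defined above can be computed with a few lines of Python. This is a minimal sketch, assuming the quartile method in which Q1 and Q3 are the medians of the lower and upper halves of the sorted data, leaving out the overall median when the count is odd. The function name `five_number_summary` and the sample list are hypothetical and not part of the original notes.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a list of at least two numbers.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    excluding the overall median when the number of values is odd.
    """
    values = sorted(data)
    n = len(values)
    lower = values[:n // 2]           # values before the median position
    upper = values[n // 2 + n % 2:]   # values after the median position
    return (values[0], median(lower), median(values), median(upper), values[-1])

# Hypothetical data, made up for illustration only.
sample = [1, 3, 4, 4, 6, 7, 9, 10, 15]
print(five_number_summary(sample))  # (1, 3.5, 6, 9.5, 15)
```

Other software may use a slightly different quartile rule, so results can differ a little from this sketch.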
Creating a Simple Boxplot
Find the Five-Number Summary for the distribution of data.
Draw and Label the Axis: Create a horizontal axis, labeling it with the name of the quantitative variable.
Scale the Axis: Determine the range based on the smallest and largest values:
Start at a number equal to or less than the smallest value.
Place tick marks at equal intervals until the largest value is included.
Draw the Box: Span from Q1 to Q3.
Mark the Median: Use a vertical line segment equal in height to the box.
Draw Whiskers: Lines extending from the ends of the box to the smallest and largest data values.
Example of Boxplot Creation
Dotplot of U.S. Women’s National Soccer Team Goals:
Number of goals in 24 games during the 2019 season:
Minimum = 1, Q1 = 2.5, Median = 3.5, Q3 = 5, Maximum = 13.
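To see how the drawing steps fit together, the sketch below turns the soccer five-number summary above into a one-line text boxplot. It is only an illustration: the function name `text_boxplot`, the axis starting point of 0, and the scale of 4 characters per goal are assumptions chosen for this example, and axis labels and tick marks are left out.

```python
def text_boxplot(minimum, q1, med, q3, maximum, start=0, per_unit=4):
    """Draw a simple one-line boxplot from a five-number summary."""
    def col(x):
        # Column index of value x on the text axis.
        return round((x - start) * per_unit)

    row = [" "] * (col(maximum) + 1)
    for c in range(col(minimum), col(q1)):          # left whisker
        row[c] = "-"
    for c in range(col(q1), col(q3) + 1):           # box from Q1 to Q3
        row[c] = "="
    for c in range(col(q3) + 1, col(maximum) + 1):  # right whisker
        row[c] = "-"
    row[col(q1)] = "["
    row[col(q3)] = "]"
    row[col(med)] = "|"                             # median line inside the box
    return "".join(row)

# Five-number summary from the soccer example above.
plot = text_boxplot(1, 2.5, 3.5, 5, 13)
print(plot)
```

The long right whisker in the printed plot reflects the gap between Q3 = 5 and the maximum of 13.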
Using a TI-84 to Create a Boxplot
Input Data: Enter values in one list (e.g., L1) and their frequencies in another list (e.g., L2).
Create the Boxplot via Stat Plot:
Access the Stat Plot menu by pressing 2nd, then Y=.
Select a plot (e.g., Plot1).
Set to On and choose Simple boxplot from Type.
Set Xlist to L1. Set Freq to L2, or to 1 if every data value was entered individually in L1.
Adjust the Window by pressing WINDOW. Ensure:
Xmin and Xmax are suitable for the data range.
Ymin typically set to -1 and Ymax to 2.
Display Boxplot: Press GRAPH to view.
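The value list and frequency list used in the input step are a compact way of storing repeated data values. As a rough sketch of that idea outside the calculator, the Python function below expands a value list and a frequency list into the full list of observations. The function name `expand_frequencies` and the two example lists are hypothetical.

```python
def expand_frequencies(values, freqs):
    """Repeat each value by its frequency, like pairing L1 with L2."""
    if len(values) != len(freqs):
        raise ValueError("values and freqs must have the same length")
    data = []
    for value, count in zip(values, freqs):
        data.extend([value] * count)
    return data

# Hypothetical lists, made up for illustration only.
l1 = [1, 2, 3, 5]   # distinct data values
l2 = [2, 3, 1, 1]   # how many times each value occurs
print(expand_frequencies(l1, l2))  # [1, 1, 2, 2, 2, 3, 5]
```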
Determining Outliers
Example Overview: The calculations below are based on another boxplot of soccer goals.
IQR (Interquartile Range): Measures the spread of the middle 50% of the data.
Calculating IQR: Use the formula $ IQR = Q3 - Q1 $:
If Q1 = 2 and Q3 = 3.5,
then IQR = 3.5 - 2 = 1.5.
1.5 * IQR Rule: To identify outliers:
Low outliers: Values < $ Q1 - 1.5 \times IQR $
High outliers: Values > $ Q3 + 1.5 \times IQR $
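Example Calculation: For Q1 = 2 and Q3 = 3.5,
IQR = 3.5 - 2 = 1.5
Low outliers: < $ Q1 - (1.5 \times 1.5) = 2 - 2.25 = -0.25 $
High outliers: > $ Q3 + (1.5 \times 1.5) = 3.5 + 2.25 = 5.75 $
Because a goal count cannot be negative, no low outliers are possible here; any game with more than 5.75 goals is a high outlier.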
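The arithmetic in this example can be checked with a short Python sketch. The function name `outlier_fences` is hypothetical; it simply applies the two cutoff formulas from the 1.5 * IQR rule to the quartiles given above.

```python
def outlier_fences(q1, q3):
    """Return the (low, high) cutoffs from the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Quartiles from the soccer example above.
low, high = outlier_fences(2, 3.5)
print(low, high)  # -0.25 5.75
```

Values below the first cutoff or above the second would be flagged as outliers.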
Identifying Outliers with the 1.5 * IQR Rule
Define an outlier as an observation that lies more than 1.5 times the IQR below Q1 or above Q3:
Low outliers: < $ Q1 - 1.5 \times IQR $
High outliers: > $ Q3 + 1.5 \times IQR $
Example Application of 1.5 * IQR Rule
For Warm-up Data with:
Q1 = 28 and Q3 = 31.5,
Calculate:
IQR = 31.5 - 28 = 3.5
$ 1.5 \times IQR = 1.5 \times 3.5 = 5.25 $
Low outliers: < 28 - 5.25 = 22.75
High outliers: > 31.5 + 5.25 = 36.75
Identified Points: Values 38, 40, and 48 are classified as outliers because each is greater than 36.75.
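As a check on this classification, the sketch below applies the rule to the quartiles used in this example. The function name `find_outliers` is hypothetical, and the list contains only values mentioned in these notes, not the full data set of 21 cars, which is not reproduced here.

```python
def find_outliers(values, q1, q3):
    """Return the values flagged by the 1.5 * IQR rule."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

# Only values named in these notes; NOT the full data set of 21 cars.
mentioned = [25, 30, 38, 40, 48]
print(find_outliers(mentioned, 28, 31.5))  # [38, 40, 48]
```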
Creating a Full Boxplot
Find the Five-Number Summary.
Identify Outliers using the 1.5 * IQR rule.
Draw and Label the Axis as previously described.
Scale the Axis appropriately.
Draw the Box from Q1 to Q3.
Mark the Median within the box.
Mark Outliers with a special symbol (e.g., asterisk *).
Draw Whiskers from the ends of the box to the smallest and largest data values that are not outliers.
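The steps above can be sketched in Python as a function that returns the pieces needed to draw a full boxplot: the box, the whisker ends, and the outliers. This is an illustration under stated assumptions: the function name `full_boxplot_parts` and the sample data are hypothetical, and quartiles are taken as the medians of the lower and upper halves of the sorted data (excluding the overall median when the count is odd).

```python
from statistics import median

def full_boxplot_parts(data):
    """Return box, whisker ends, and outliers for a full boxplot."""
    values = sorted(data)
    n = len(values)
    q1 = median(values[:n // 2])            # median of the lower half
    q3 = median(values[n // 2 + n % 2:])    # median of the upper half
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers stop at the most extreme values that are not outliers.
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {
        "box": (q1, median(values), q3),
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }

# Hypothetical data, made up for illustration only.
sample = [1, 9, 10, 11, 12, 13, 14, 15, 30]
parts = full_boxplot_parts(sample)
print(parts)
```

For this made-up sample the cutoffs are 2 and 22, so 1 and 30 are marked separately and the whiskers end at 9 and 15 rather than at the minimum and maximum.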
Group Activity on Boxplots and Outliers
Dotplot Analysis: For high temperature readings in Phoenix, Arizona, through July:
(a) Provide the five-number summary.
(b) Use the 1.5 * IQR rule to find any outliers.
(c) Construct a boxplot.
Investigation on Overthinking in Golf Putting
Study Overview:
40 experienced golfers practiced putting and were then divided into two groups: one group described their putting technique in detail (creating a risk of overthinking), and the other performed an unrelated task.
Results Visualization:
Boxplots summarize, for each group, the distribution of the number of putts needed to make three consecutive successful putts.
Discussion Questions:
(a) Describe the shape of each distribution (symmetry, skewness, outliers).
(b) Compare shapes, centers, and variability of both distributions.
(c) Evaluate if findings support the hypothesis regarding overthinking affecting performance.