Study Notes on Boxplots and Outliers

Section 3.3 Boxplots and Outliers

Overview

  • Focuses on reading and constructing boxplots and on identifying outliers in a distribution of data.

Warm-Up Activity

  • Dotplot Analysis: The dotplot displays EPA estimates of highway gas mileage (in miles per gallon, mpg) for 21 randomly selected model year 2020 midsize cars.

  • Five-number summary:

    • Minimum value = 25

    • First Quartile (Q1) = 28

    • Median = 30

    • Third Quartile (Q3) = 31.5

    • Maximum value = 48

  • Outlier Analysis:

    • 48 is confirmed as an outlier.

    • Points 38 and 40 are recognized as potential outliers.

Boxplots and the Five-Number Summary

  • Definition of Five-number Summary: A summary that includes:

    1. Minimum value

    2. First Quartile (Q1)

    3. Median

    4. Third Quartile (Q3)

    5. Maximum value

  • Boxplot Definition: A visual representation of the five-number summary, showing the minimum, Q1, median, Q3, and maximum values along a number line.
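The definition above can be sketched in Python. This is a minimal sketch, not a definitive method: it assumes the common textbook convention that Q1 and Q3 are the medians of the lower and upper halves of the ordered data, with the overall median left out of both halves when the number of values is odd. Calculators and software may use slightly different quartile rules. The function name and the sample data are hypothetical.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a list of at least two numbers.

    Assumption: quartiles are medians of the lower and upper halves,
    excluding the overall median when the count is odd.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]             # values left of the median position
    upper = values[half + n % 2:]     # values right of the median position
    return (values[0], median(lower), median(values), median(upper), values[-1])

# Hypothetical data, for illustration only
sample = [3, 7, 8, 5, 12, 14, 21, 13, 18]
print(five_number_summary(sample))  # (3, 6.0, 12, 16.0, 21)
```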

Creating a Simple Boxplot

  1. Find the Five-Number Summary for the distribution of data.

  2. Draw and Label the Axis: Create a horizontal axis, labeling it with the name of the quantitative variable.

  3. Scale the Axis: Determine the range based on the smallest and largest values:

    • Start at a number equal to or less than the smallest value.

    • Place tick marks at equal intervals until the largest value is included.

  4. Draw the Box: Span from Q1 to Q3.

  5. Mark the Median: Draw a vertical line segment inside the box at the median, equal in height to the box.

  6. Draw Whiskers: Lines extending from the ends of the box to the smallest and largest data values.

Example of Boxplot Creation

  • Dotplot of U.S. Women’s National Soccer Team Goals:

    • Data: Number of goals scored in each of 24 games during the 2019 season.

    • Five-number summary: Minimum = 1, Q1 = 2.5, Median = 3.5, Q3 = 5, Maximum = 13.
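To see how the five-number summary maps onto the parts of a simple boxplot, here is a rough text-only sketch using the soccer summary above. It is an illustration rather than a plotting tool: the function name, the symbols chosen ("-" for whiskers, "[" and "]" for the box ends, "|" for the median), and the scale of 0.5 goals per character are all assumptions made for this example.

```python
def text_boxplot(minimum, q1, median, q3, maximum, step=0.5):
    """Draw a one-line simple boxplot; each character covers `step` units."""
    def col(value):
        # Column index of a value, measured from the minimum
        return round((value - minimum) / step)

    width = col(maximum) + 1
    row = ["-"] * width                  # start with whiskers everywhere
    for i in range(col(q1), col(q3) + 1):
        row[i] = "="                     # fill the box from Q1 to Q3
    row[col(q1)] = "["                   # left end of the box (Q1)
    row[col(q3)] = "]"                   # right end of the box (Q3)
    row[col(median)] = "|"               # median line inside the box
    return "".join(row)

# Five-number summary of the soccer goals data from the notes
print(text_boxplot(1, 2.5, 3.5, 5, 13))
# ---[=|==]----------------
```

The long right whisker reflects the large gap between Q3 = 5 and the maximum of 13.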

Using a TI-84 to Create a Boxplot

  1. Input Data: Enter values in one list (e.g., L1) and their frequencies in another list (e.g., L2).

  2. Create the Boxplot via Stat Plot:

    • Press 2nd, then Y= to open the STAT PLOT menu.

    • Select a plot (e.g., Plot1).

    • Set to On and choose Simple boxplot from Type.

    • Set Xlist to L1. Set Freq to L2, or to 1 if every value was entered individually.

  3. Adjust the Window: Press WINDOW and ensure that:

    • Xmin and Xmax are suitable for the data range.

    • Ymin is typically set to -1 and Ymax to 2.

  4. Display Boxplot: Press GRAPH to view.
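Entering values in one list and frequencies in another is shorthand for repeating each value that many times. The short Python sketch below makes that equivalence concrete. It is only an illustration of the idea, not a description of how the calculator works internally, and the function name and the two lists are hypothetical.

```python
def expand_frequencies(values, freqs):
    """Repeat each value by its frequency, mirroring Xlist = L1 with Freq = L2."""
    if len(values) != len(freqs):
        raise ValueError("values and freqs must have the same length")
    data = []
    for value, count in zip(values, freqs):
        data.extend([value] * count)
    return data

# Hypothetical lists standing in for L1 (values) and L2 (frequencies)
l1 = [1, 2, 3, 5]
l2 = [2, 4, 3, 1]
print(expand_frequencies(l1, l2))  # [1, 1, 2, 2, 2, 2, 3, 3, 3, 5]
```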

Determining Outliers

  • Example Overview: This example uses another boxplot of soccer goals.

    • The IQR (interquartile range) measures the spread of the middle 50% of the data.

  • Calculating IQR: Use the formula $ IQR = Q3 - Q1 $ to determine the interquartile range:

    • If Q1 = 2 and Q3 = 3.5, then IQR = 3.5 - 2 = 1.5.

  • 1.5 × IQR Rule: To identify outliers:

    • Low outliers: Values < Q1 - 1.5 × IQR

    • High outliers: Values > Q3 + 1.5 × IQR

  • Example Calculation: For Q1 = 2 and Q3 = 3.5,

    • IQR = 3.5 - 2 = 1.5

    • Low outliers: < Q1 - (1.5 × 1.5) = 2 - 2.25 = -0.25

    • High outliers: > Q3 + (1.5 × 1.5) = 3.5 + 2.25 = 5.75
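The cutoff arithmetic above can be checked with a few lines of Python. This is a small sketch under the 1.5 × IQR rule as stated in these notes; the function name is hypothetical.

```python
def outlier_cutoffs(q1, q3):
    """Return the (low, high) cutoffs from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Quartiles from the soccer example in the notes
low, high = outlier_cutoffs(2, 3.5)
print(low, high)  # -0.25 5.75
```

Any value below the low cutoff or above the high cutoff would be flagged as an outlier.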

Identifying Outliers with the 1.5 * IQR Rule

  • Define an outlier as an observation that lies more than 1.5 × IQR below Q1 or above Q3:

    • Low outliers: < Q1 - 1.5 × IQR

    • High outliers: > Q3 + 1.5 × IQR

Example Application of 1.5 * IQR Rule

  • For the warm-up data, Q1 = 28 and Q3 = 31.5.

  • Calculate:

    • IQR = 31.5 - 28 = 3.5

    • 1.5 × IQR = 1.5 × 3.5 = 5.25

    • Low outliers: < 28 - 5.25 = 22.75

    • High outliers: > 31.5 + 5.25 = 36.75

  • Identified Points: The values 38, 40, and 48 are above 36.75, so they are classified as outliers. The minimum, 25, is above 22.75, so there are no low outliers.
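As a quick check on this classification, the sketch below applies the rule to the values named in these notes: the minimum (25) and the three largest values (38, 40, 48). The full list of 21 mileages is not reproduced here, so only those four are tested. The function name is hypothetical.

```python
def flag_outliers(values, q1, q3):
    """Return the values that fall outside the 1.5 x IQR cutoffs."""
    iqr = q3 - q1
    low_cutoff = q1 - 1.5 * iqr
    high_cutoff = q3 + 1.5 * iqr
    return [v for v in values if v < low_cutoff or v > high_cutoff]

# Only the values mentioned in the notes, not the full data set
candidates = [25, 38, 40, 48]
print(flag_outliers(candidates, 28, 31.5))  # [38, 40, 48]
```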

Creating a Full Boxplot

  1. Find the Five-Number Summary.

  2. Identify Outliers using the 1.5 × IQR rule.

  3. Draw and Label the Axis as previously described.

  4. Scale the Axis appropriately.

  5. Draw the Box from Q1 to Q3.

  6. Mark the Median within the box.

  7. Mark Outliers with a special symbol (e.g., asterisk *).

  8. Draw Whiskers from the ends of the box to the smallest and largest data values that are not outliers.
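The pieces needed for a full boxplot can be gathered in one pass, as in the sketch below. It is a minimal illustration under two assumptions: quartiles are computed as medians of the lower and upper halves of the ordered data (excluding the overall median when the count is odd), and outliers are defined by the 1.5 × IQR rule. The function name, dictionary keys, and data are hypothetical.

```python
from statistics import median

def full_boxplot_stats(data):
    """Return box, whisker, and outlier values for a full boxplot."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    q1 = median(values[:half])              # median of the lower half
    q3 = median(values[half + n % 2:])      # median of the upper half
    iqr = q3 - q1
    low_cutoff = q1 - 1.5 * iqr
    high_cutoff = q3 + 1.5 * iqr
    inside = [v for v in values if low_cutoff <= v <= high_cutoff]
    outliers = [v for v in values if v < low_cutoff or v > high_cutoff]
    return {
        "q1": q1,
        "median": median(values),
        "q3": q3,
        "whisker_low": inside[0],     # smallest value that is not an outlier
        "whisker_high": inside[-1],   # largest value that is not an outlier
        "outliers": outliers,         # plotted individually, e.g. with *
    }

# Hypothetical data, for illustration only
stats = full_boxplot_stats([2, 4, 4, 5, 6, 6, 7, 8, 9, 20])
print(stats["whisker_low"], stats["whisker_high"], stats["outliers"])  # 2 9 [20]
```

In this example the right whisker stops at 9 rather than at the maximum, and 20 is marked separately as an outlier.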

Group Activity on Boxplots and Outliers

  1. Dotplot Analysis: For high temperature readings in Phoenix, Arizona, through July:

    • (a) Provide the five-number summary.

    • (b) Use the 1.5 × IQR rule to find any outliers.

    • (c) Construct a boxplot.

Investigation on Overthinking in Golf Putting

  1. Study Overview:

    • 40 experienced golfers practiced putting and were then divided into two groups: one that described their technique in detail (risking overthinking) and one that performed an unrelated task.

  2. Results Visualization:

    • Boxplots summarize, for each group, the distribution of the number of putts needed to make three consecutive successful putts.

  3. Discussion Questions:

    • (a) Describe the shape of each distribution (symmetry, skewness, outliers).

    • (b) Compare shapes, centers, and variability of both distributions.

    • (c) Evaluate whether the findings support the hypothesis that overthinking affects performance.