Section 2.4: Boxplots and Quantitative/Categorical Relationships (Student Version)

Overview

  • Boxplots are graphs that summarize the distribution of one quantitative variable, or compare a quantitative variable across the groups of a categorical variable.

Boxplots for One Quantitative Variable

  • A boxplot provides a graphical display of the five-number summary for a single quantitative variable.

  • The five-number summary consists of the minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum.

  • Boxplots help identify outliers, which are values that lie unusually far above or below the rest of the data.
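
The five-number summary can be sketched in code as follows. This is a minimal illustration, not a definitive method: it assumes the "median of each half" rule for quartiles (textbooks and software use several slightly different conventions), and both the function name five_number_summary and the data set are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Assumes at least two values. Other quartile conventions exist and
    can give slightly different Q1 and Q3.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]              # values below the median position
    upper = data[half + (n % 2):]    # values above the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Hypothetical data set, for illustration only
scores = [2, 4, 5, 7, 8, 10, 12, 15, 21]
print(five_number_summary(scores))   # (2, 4.5, 8, 13.5, 21)
```

In a boxplot of this data, the box would run from 4.5 to 13.5 with a line at the median, 8.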

Outliers in Boxplots

  • Definition of an Outlier:

    • A data value is considered an outlier if it is:

      • Smaller than: Q1 - 1.5(IQR), or

      • Larger than: Q3 + 1.5(IQR)

  • To find the IQR (Interquartile Range):

    • IQR = Q3 - Q1
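
The outlier rule above can be sketched as two small functions. The names outlier_fences and find_outliers, and the data, are hypothetical and chosen only for illustration. Q1 and Q3 are passed in rather than computed, because quartile conventions vary.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) cutoffs from the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Return the values that fall outside the two cutoffs."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]

# Hypothetical data with Q1 = 4.5 and Q3 = 13.5, so IQR = 9
data = [2, 4, 5, 7, 8, 10, 12, 15, 40]
print(outlier_fences(4.5, 13.5))        # (-9.0, 27.0)
print(find_outliers(data, 4.5, 13.5))   # [40]
```

Only 40 lies beyond a cutoff, so it is the only value a boxplot of this data would mark as an outlier.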

Example Applications

Example 1: Population of U.S. States

  • Data: Population in millions of the 50 U.S. States.

  • Five-number summary: (0.506, 1.660, 4.170, 6.676, 35.842)

  • Calculating IQR and Outliers:

    • Read Q1 and Q3 from the five-number summary.

    • Calculate the IQR: Q3 - Q1.

    • Determine the lower and upper bounds for outliers: Q1 - 1.5(IQR) and Q3 + 1.5(IQR).
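
One way to carry out these steps with the five-number summary given above is sketched below. The variable names are illustrative. The summary alone shows whether the minimum or maximum is an outlier; it does not show how many other states lie beyond a bound.

```python
# Five-number summary from Example 1 (population in millions)
minimum, q1, med, q3, maximum = 0.506, 1.660, 4.170, 6.676, 35.842

iqr = q3 - q1                 # 6.676 - 1.660 = 5.016
lower_bound = q1 - 1.5 * iqr  # 1.660 - 7.524 = -5.864
upper_bound = q3 + 1.5 * iqr  # 6.676 + 7.524 = 14.200

print(round(iqr, 3), round(lower_bound, 3), round(upper_bound, 3))

# The minimum is above the lower bound, so there are no low outliers.
has_low_outlier = minimum < lower_bound    # False
# The maximum is above the upper bound, so at least one state is a high outlier.
has_high_outlier = maximum > upper_bound   # True
print(has_low_outlier, has_high_outlier)
```

Because the lower bound is negative and a population cannot be negative, low outliers are impossible here; any state with more than about 14.2 million people is a high outlier.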

Example 2: Hot Dog Eating Contest

  • Boxplot Analysis:

    • Estimate the median number of hot dogs eaten.

    • Estimate the IQR and the range.

    • Identify potential outliers from the boxplot.

Example 3: Gross State Product (GSP)

  • Data: GSP per capita, in dollars, for the 50 U.S. States.

  • Analyze for outliers, skewness, and the relationship between mean and median.

Quantitative and Categorical Relationships

  • Investigate how a quantitative variable is distributed across categorical groups.

    • Example Questions:

      • Do TV watching habits differ by sex?

      • Is survival time related to genetic variation?

      • Do temperatures differ between cities?

Utilizing Side-by-Side Graphs

  • Side-by-side boxplots show the relationship between a quantitative variable and a categorical variable by drawing one boxplot per category on a shared scale.

  • Each group's boxplot conveys its center (median) and variability (IQR and range), so groups can be compared directly.
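
The numbers behind a side-by-side comparison can be sketched as one summary per group. This is a minimal illustration under stated assumptions: the city names and temperatures are made up, the function name summarize is hypothetical, and quartiles use the median-of-halves rule (other conventions exist).

```python
from statistics import median

def summarize(values):
    """Return the median, IQR, and range for one group (needs 2+ values)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = median(data[:half])               # median of the lower half
    q3 = median(data[half + (n % 2):])     # median of the upper half
    return {"median": median(data), "iqr": q3 - q1, "range": data[-1] - data[0]}

# Hypothetical daily high temperatures for two made-up cities
temps = {
    "City A": [61, 64, 66, 68, 70, 73, 75],
    "City B": [40, 52, 58, 66, 74, 83, 95],
}

for city, values in temps.items():
    print(city, summarize(values))
# City A {'median': 68, 'iqr': 9, 'range': 14}
# City B {'median': 66, 'iqr': 31, 'range': 55}
```

Here the two medians are close, but City B has a much larger IQR and range, so its box and whiskers would be far longer in a side-by-side plot.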

Examples of Side-by-Side Graphs

Example 4: Smoking Rates by Region

  • Comparison of Regions:

    • Identify regions with the highest smoking rates, greatest variability, and any outliers.

Example 5: Tea vs. Coffee Study

  • Study Objective: Measure immune responses after consumption of tea or coffee.

  • Investigate potential relationships between the categorical (tea or coffee) and quantitative (immune response) variables.

  • Descriptive statistics should include the mean for each group and the difference in means; then discuss what, if anything, can be concluded about a causal relationship.
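
The difference in means can be sketched as follows. The measurements below are hypothetical placeholders, NOT the study's data, and the variable names are illustrative.

```python
from statistics import mean

# Hypothetical immune-response measurements (made up for illustration)
tea = [5, 11, 13, 18, 23]
coffee = [0, 3, 11, 15, 16]

mean_tea = mean(tea)           # 70 / 5 = 14
mean_coffee = mean(coffee)     # 45 / 5 = 9
difference = mean_tea - mean_coffee

print(mean_tea, mean_coffee, difference)
```

A positive difference means the tea group's average response is higher in this made-up sample. Whether that supports a causal conclusion depends on how the study was designed, not on the arithmetic.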