Section 2.4: Boxplots and Quantitative/Categorical Relationships
Overview
Boxplots summarize the distribution of one quantitative variable and can also be used to compare a quantitative variable across the groups of a categorical variable.
Boxplots for One Quantitative Variable
A boxplot provides a graphical display of the five-number summary for a single quantitative variable.
The five-number summary consists of the minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum.
Boxplots help identify outliers, which are values that lie unusually far above or below the rest of the data.
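The five-number summary described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `five_number_summary` and the sample values are hypothetical, and the quartiles use the standard library's "inclusive" convention, which may differ slightly from the rule used in a particular textbook or calculator.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # Assumption: the "inclusive" quartile convention. Other conventions
    # can give slightly different Q1 and Q3 for the same data.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (data[0], q1, median, q3, data[-1])

# Hypothetical data, chosen only for illustration.
sample = [7, 1, 9, 3, 5, 2, 8, 4, 6]
summary = five_number_summary(sample)
print(summary)
```

The box in a boxplot spans Q1 to Q3, with a line inside the box at the median.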
Outliers in Boxplots
Definitions of Outliers:
A data value is considered an outlier if it is:
Smaller than Q1 - 1.5(IQR), or
Larger than Q3 + 1.5(IQR)
To find IQR (Interquartile Range):
IQR = Q3 - Q1
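The outlier rule above translates directly into code. The sketch below assumes Q1 and Q3 are already known; the names `outlier_bounds` and `is_outlier` and the quartile values are hypothetical, used only to illustrate the rule.

```python
def outlier_bounds(q1, q3):
    """Return the (lower, upper) cutoffs from the 1.5(IQR) rule."""
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    return lower, upper

def is_outlier(value, q1, q3):
    """True if value lies strictly below the lower or above the upper cutoff."""
    lower, upper = outlier_bounds(q1, q3)
    return value < lower or value > upper

# Hypothetical quartiles: Q1 = 10 and Q3 = 20, so IQR = 10.
bounds = outlier_bounds(10, 20)
print(bounds)
print(is_outlier(40, 10, 20))
```

A value exactly equal to a cutoff is not flagged, because the rule says "smaller than" and "larger than".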
Example Applications
Example 1: Population of U.S. States
Data: Population in millions of the 50 U.S. States.
Five-number summary: (0.506, 1.660, 4.170, 6.676, 35.842)
Calculating IQR and Outliers:
Find Q1 and Q3.
Calculate IQR.
Determine lower and upper bounds for outliers.
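The three steps above can be worked through with the five-number summary given for this example. The variable names are illustrative only; the numbers come from the summary stated above.

```python
# Five-number summary from Example 1 (populations in millions).
minimum, q1, median, q3, maximum = 0.506, 1.660, 4.170, 6.676, 35.842

# Step 2: interquartile range.
iqr = q3 - q1

# Step 3: lower and upper bounds for outliers.
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

print(f"IQR = {iqr:.3f}")
print(f"Lower bound = {lower_bound:.3f}")
print(f"Upper bound = {upper_bound:.3f}")

# Compare the extremes of the data with the bounds.
has_low_outlier = minimum < lower_bound
has_high_outlier = maximum > upper_bound
print(has_low_outlier, has_high_outlier)
```

Here IQR = 6.676 - 1.660 = 5.016, so the bounds are 1.660 - 7.524 = -5.864 and 6.676 + 7.524 = 14.200. The maximum (35.842) is above the upper bound, so there is at least one high outlier; the minimum (0.506) is above the lower bound, so there are no low outliers. The five-number summary alone does not say how many states exceed 14.200.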
Example 2: Hot Dog Eating Contest
Boxplot Analysis:
Estimate median hot dogs eaten.
Estimate IQR and range.
Identify potential outliers from the boxplot.
Example 3: Gross State Product (GSP)
Data Representation: GSP per capita for 50 U.S. States in dollars.
Analyze for outliers, skewness, and the relationship between mean and median.
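The link between skewness and the mean and median can be seen in a small sketch. The values below are hypothetical and are not the GSP data; they were chosen to have one large value on the right.

```python
import statistics

# Hypothetical right-skewed values (NOT the GSP data).
values = [1, 2, 2, 3, 3, 3, 4, 20]

mean = statistics.mean(values)
median = statistics.median(values)

print(f"mean = {mean}, median = {median}")
# The single large value pulls the mean up; the median barely moves.
print(mean > median)
```

In a right-skewed distribution the mean is typically larger than the median, and in a left-skewed distribution it is typically smaller, because the mean is pulled toward the long tail and toward any outliers.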
Quantitative and Categorical Relationships
Investigate how a quantitative variable is distributed across categorical groups.
Example Questions:
Do TV watching habits differ by sex?
Is survival time related to genetic variation?
Do temperatures differ between cities?
Utilizing Side-by-Side Graphs
Side-by-side boxplots show the relationship between a quantitative variable and a categorical variable, with one boxplot per category.
Each group's boxplot shows its center (median) and spread (IQR and range), so the groups can be compared directly.
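The numbers behind a side-by-side boxplot can be sketched by summarizing each group separately. The group labels and values below are hypothetical, and the quartiles again assume the standard library's "inclusive" convention.

```python
import statistics

# Hypothetical data: one list of quantitative values per category.
groups = {
    "Group A": [1, 2, 3, 4, 5],
    "Group B": [2, 4, 6, 8, 10],
}

group_stats = {}
for name, values in groups.items():
    # Assumption: "inclusive" quartile convention.
    q1, median, q3 = statistics.quantiles(sorted(values), n=4, method="inclusive")
    group_stats[name] = {"median": median, "iqr": q3 - q1}

for name, stats in group_stats.items():
    print(name, stats)
```

Comparing the medians shows which group tends to have larger values, and comparing the IQRs shows which group is more variable.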
Examples of Side-by-Side Graphs
Example 4: Smoking Rates by Region
Comparison of Regions:
Identify regions with the highest smoking rates, greatest variability, and any outliers.
Example 5: Tea vs. Coffee Study
Study Objective: Measure immune responses after consumption of tea or coffee.
Investigate potential relationships between the categorical (tea or coffee) and quantitative (immune response) variables.
Descriptive statistics should include the mean for each group and the difference in means; conclude by considering whether the study supports a causal relationship.
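A difference in means between two groups can be computed as in the sketch below. The values are hypothetical placeholders, not the study's measurements, and the variable names are illustrative only.

```python
import statistics

# Hypothetical immune-response values (NOT the study's data).
tea = [5, 11, 13, 18, 20]
coffee = [0, 3, 11, 15, 16]

mean_tea = statistics.mean(tea)
mean_coffee = statistics.mean(coffee)

# Difference in means, stated in a fixed order: tea minus coffee.
difference = mean_tea - mean_coffee

print(f"mean (tea) = {mean_tea}")
print(f"mean (coffee) = {mean_coffee}")
print(f"difference = {difference:.1f}")
```

The difference in means only describes the sample. Whether it supports a causal conclusion depends on how the study was run, for example whether participants were randomly assigned to drink tea or coffee.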