Chapter 6: Comparing Two Means (Independent Samples)
Course: STAT 1 3: Introduction to Statistical Methods for Life and Health Sciences
Instructor: J.H. Sparks
Date: Winter 2025
Two-Sample Problems: Quantitative
This chapter focuses on comparing two means, an important aspect of statistical inference.
Review of previous chapters: confidence intervals, hypothesis tests, inferences about means and proportions.
Key Property: Independence or Dependence of Samples
Chapter 6 assumes samples are independent.
Chapter 7 will explore dependent samples.
Goals of Chapter 6:
Compare populations and assess normality (Section 6.1+)
Compare means via simulation (Section 6.2) and theoretical approaches (Section 6.3)
Section 6.1: Comparing Two Groups: Quantitative Response
Importance of checking assumptions about population distributions.
Use descriptive statistics and distributions from sample data.
Descriptive Measures:
Five-number summary to describe center and spread.
Boxplot construction to assess data symmetry.
Normal probability plot to evaluate distribution normality.
Re-Examining Variability & Measures of Relative Standing
Definition of the p-th Percentile:
A value such that p% of observations fall below it.
Review of Chapter 2 concepts:
Measures of Central Tendency: mean, median (50th percentile), mode.
Measures of Variability: range, variance, standard deviation.
Introduction of Quartiles:
25th percentile: Q1 (Lower quartile)
50th percentile: Median (M)
75th percentile: Q3 (Upper quartile)
Finding the Quartiles
For an odd number of observations, the median is the middle value in an ordered set.
For an even number of observations, the median is the average of the two middle values.
Methodology for locating quartiles (Q1 is the median of the lower half of the ordered data, Q3 the median of the upper half):
For odd count: omit the median when locating Q1 and Q3.
For even count: include all observations when locating Q1 and Q3.
Note: Definitions of quartiles may vary by statistician or software.
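The quartile rule above can be sketched in a few lines of Python (the course itself uses R; Python appears here only for illustration). The function name `quartiles` and both data sets are hypothetical, and other software may return slightly different values because quartile definitions vary.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, M, Q3) using the split-halves rule described in these notes."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]  # observations below the median position
    # Odd n: skip the middle value itself. Even n: use every observation.
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return median(lower), median(xs), median(upper)

odd_sample = [1, 3, 5, 7, 9, 11, 13]        # hypothetical data, n = 7
even_sample = [1, 3, 5, 7, 9, 11, 13, 15]   # hypothetical data, n = 8

print(quartiles(odd_sample))   # (3, 7, 11)
print(quartiles(even_sample))  # (4.0, 8.0, 12.0)
```

The sketch needs at least two observations, since each half must be non-empty.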
IQR & the Five-Number Summary
Interquartile Range (IQR): Difference between Q3 and Q1, IQR = Q3 - Q1
The preferred measure of variation when the median is used as the measure of center.
The five-number summary includes:
Minimum
Q1
Median (M)
Q3
Maximum
The five-number summary shows how the data are spread across the four quarters; the IQR is the spread of the middle 50% of the data.
Detecting Outliers
Use the five-number summary and IQR to identify outliers.
Inner Fences:
Lower Inner Fence: Q1 - 1.5 * IQR
Upper Inner Fence: Q3 + 1.5 * IQR
Observations outside these fences are potential outliers.
Outer Fences:
Lower Outer Fence: Q1 - 3 * IQR
Upper Outer Fence: Q3 + 3 * IQR
Adjacent values are the most extreme observations that still lie within the inner fences.
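As a rough illustration of the fence rules, the Python sketch below computes both sets of fences from Q1 and Q3, then flags potential outliers and finds the adjacent values. The helper names `fences` and `classify` and the data are hypothetical. Q1 = 11 and Q3 = 17 were obtained with the quartile rule from these notes (n = 9, median omitted).

```python
def fences(q1, q3):
    """Inner and outer fences computed from the quartiles."""
    iqr = q3 - q1
    return {
        "inner": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "outer": (q1 - 3.0 * iqr, q3 + 3.0 * iqr),
    }

def classify(data, q1, q3):
    """Return (potential outliers, (lower adjacent, upper adjacent))."""
    low, high = fences(q1, q3)["inner"]
    inside = [x for x in data if low <= x <= high]
    outliers = [x for x in data if x < low or x > high]
    # Adjacent values: most extreme observations still inside the inner fences.
    return outliers, (min(inside), max(inside))

data = [1, 10, 12, 13, 14, 15, 16, 18, 40]  # hypothetical data
q1, q3 = 11.0, 17.0                          # quartiles of the data above

print(fences(q1, q3))          # {'inner': (2.0, 26.0), 'outer': (-7.0, 35.0)}
print(classify(data, q1, q3))  # ([1, 40], (10, 18))
```

In this example the value 40 also lies beyond the upper outer fence, while 1 lies only beyond the lower inner fence.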
Boxplots
Boxplots visualize quantitative data.
Constructed by drawing a box from Q1 to Q3, with the median shown as a dividing line.
Whiskers extend to adjacent values and dots indicate potential outliers.
Developed by Professor John Tukey (also introduced the stem-and-leaf plot).
Comparisons and Caveats
Boxplots allow for quick comparisons of five-number summaries across groups.
Limitations:
May lose detailed shape of distribution.
Cannot identify multimodal distributions or clusters.
Recommended to combine with dotplots or histograms for small data sets.
Examining Normality
Importance of checking normality for valid statistical techniques.
Procedures for checking:
Histogram should be bell-shaped for large samples.
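Compute the intervals within 1, 2, and 3 standard deviations of the mean and compare the percentage of data in each with the roughly 68%, 95%, and 99.7% expected under a normal distribution.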
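One way to carry out the interval check is sketched below in Python. It assumes the intervals are the mean plus or minus 1, 2, and 3 standard deviations, compared against the 68%, 95%, and 99.7% benchmarks of the normal model. The function name `share_within` is hypothetical, and the sample is simulated rather than taken from the course.

```python
import random
from statistics import mean, stdev

def share_within(data, k):
    """Fraction of observations within k standard deviations of the mean."""
    m, s = mean(data), stdev(data)
    return sum(m - k * s <= x <= m + k * s for x in data) / len(data)

rng = random.Random(42)  # fixed seed so the run is repeatable
sample = [rng.gauss(100, 15) for _ in range(2000)]  # simulated, not course data

for k, benchmark in [(1, 0.68), (2, 0.95), (3, 0.997)]:
    share = share_within(sample, k)
    print(f"within {k} SD: {share:.3f} (normal model: about {benchmark})")
```

Shares far from the benchmarks, especially in a large sample, are a sign that the normal model may not fit.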
Normal Probability Plot
A normal probability plot (Q-Q plot) evaluates normality by plotting the ordered observed values against the values expected under a normal distribution.
A roughly linear plot suggests the data are approximately normal; systematic curvature suggests otherwise.
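The points of a normal probability plot can be computed without any plotting library, as in the Python sketch below. The plotting positions (i - 0.5)/n are one common convention and are an assumption here; software packages use slightly different ones. The helper names and both data sets are hypothetical. The correlation between the two axes serves as a rough numerical stand-in for "how straight is the plot".

```python
from math import sqrt
from statistics import NormalDist, mean

def qq_points(data):
    """Pair each ordered observation with its expected standard normal score."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()
    # Assumed plotting positions: (i - 0.5) / n for i = 1..n.
    scores = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(scores, xs))

def straightness(points):
    """Pearson correlation of the plotted points (1.0 means perfectly linear)."""
    zs, xs = zip(*points)
    mz, mx = mean(zs), mean(xs)
    num = sum((z - mz) * (x - mx) for z, x in points)
    den = sqrt(sum((z - mz) ** 2 for z in zs) * sum((x - mx) ** 2 for x in xs))
    return num / den

symmetric = [4.1, 4.8, 5.0, 5.3, 5.5, 5.9, 6.2, 6.4, 7.0, 7.6]  # hypothetical
skewed = [1, 1, 1, 2, 2, 3, 5, 9, 20, 60]                        # hypothetical

print(round(straightness(qq_points(symmetric)), 3))
print(round(straightness(qq_points(skewed)), 3))
```

The symmetric sample gives a correlation close to 1, while the right-skewed sample gives a noticeably lower value, matching the curved pattern such data show on the plot.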
Section 6.2: Comparing Two Groups: Quantitative Response
This section investigates differences between two samples.
Hypotheses:
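Null Hypothesis (H0): No association between the treatment group and the response (the population means are equal, μ1 = μ2).
Alternative Hypothesis (H1): There is an association between the treatment group and the response (the population means differ, μ1 ≠ μ2).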
Example: Alcohol's Effect on Reaction Times
Investigated the effect of alcohol on driving reaction times with two groups (alcohol vs placebo).
Aim: To determine if alcohol affects average reaction times.
A 95% confidence interval will be constructed for the difference in mean reaction times.
Constructing a Test Statistic
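The method shuffles the group labels to simulate random reassignment of treatments under the null hypothesis, recomputing the difference in sample means after each shuffle.
Aim: Determine whether the observed difference in means is larger than chance alone would plausibly produce.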
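A minimal Python sketch of the shuffling idea follows. It assumes the test statistic is the difference in sample means and that the p-value is two-sided; the function name `shuffle_test` and the reaction times are hypothetical, not the study's data.

```python
import random
from statistics import mean

def shuffle_test(group_a, group_b, reps=2000, seed=0):
    """Approximate two-sided p-value for a difference in means by shuffling."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # random reassignment of treatment labels
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / reps

alcohol = [0.72, 0.68, 0.75, 0.80, 0.77, 0.70]  # hypothetical times (seconds)
placebo = [0.61, 0.66, 0.58, 0.64, 0.69, 0.60]  # hypothetical times (seconds)

observed, p_value = shuffle_test(alcohol, placebo)
print(f"observed difference: {observed:.3f}, approximate p-value: {p_value:.4f}")
```

A small p-value means that few shuffles produced a difference as large as the observed one, which counts as evidence against the null hypothesis.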
Confidence Interval
The simulated sampling distribution provides the mean and standard deviation of the difference in sample means, which are used to build the interval.
Section 6.3: Comparing Two Means: Theoretical Approach
Transition from simulation to theoretical method for mean comparisons.
The point estimate of the difference between population means, μ1 − μ2, is the difference between the sample means, x̄1 − x̄2.
Validity Conditions for the Two-Independent Sample t-Procedure
Conditions for conducting statistical inference include:
Large samples (n1 ≥ 30, n2 ≥ 30) for normal approximation.
Approximately normal or symmetric population distributions for smaller samples.
The t-procedure is reasonably robust with 20 or more observations, provided the distributions are not strongly skewed.
Equal vs. Unequal Variances
Discussion of procedures for small samples and equal/unequal variances.
The procedure depends on whether the population variances can be assumed equal:
Equal variances: a pooled estimate of the common variance is used.
Unequal variances: the Satterthwaite approximation to the degrees of freedom is used.
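The two variants can be compared with the Python sketch below, which computes the t statistic and degrees of freedom for each. It uses the standard textbook formulas for the pooled variance and the Satterthwaite degrees of freedom, which the notes do not state explicitly. The function names and data are hypothetical. P-values are left out because the standard library has no t distribution; in the course they would come from R.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(x, y):
    """t statistic and df assuming equal population variances."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / se, n1 + n2 - 2

def satterthwaite_t(x, y):
    """t statistic and approximate df without assuming equal variances."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x) / n1, variance(y) / n2
    se = sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return (mean(x) - mean(y)) / se, df

x = [1, 2, 3, 4, 5]    # hypothetical sample 1
y = [2, 4, 6, 8, 10]   # hypothetical sample 2, more variable

print(pooled_t(x, y))         # t about -1.897 with df = 8
print(satterthwaite_t(x, y))  # same t here, df about 5.88
```

With equal sample sizes the two t statistics coincide and only the degrees of freedom differ; with unequal sample sizes the statistics differ as well.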
Confidence Interval & Relation to Hypothesis Testing
Establish the relationship between confidence intervals and hypothesis tests.
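The relationship can be written out as a short derivation. The notation below is assumed rather than taken from the notes: SE stands for whichever standard error applies (pooled or unpooled), and t* is the critical value for the chosen confidence level and degrees of freedom.

```latex
\[
(\bar{x}_1 - \bar{x}_2) \;\pm\; t^{*}\,\mathrm{SE},
\qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{\mathrm{SE}}
\]
\[
|t| > t^{*}
\iff
|\bar{x}_1 - \bar{x}_2| > t^{*}\,\mathrm{SE}
\iff
0 \notin \bigl[(\bar{x}_1 - \bar{x}_2) - t^{*}\,\mathrm{SE},\; (\bar{x}_1 - \bar{x}_2) + t^{*}\,\mathrm{SE}\bigr]
\]
```

In words: a two-sided test of H0: μ1 = μ2 at significance level α rejects exactly when the (1 − α) confidence interval for μ1 − μ2 does not contain 0.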
R Output and Final Notes
Use of R for hypothesis tests and confidence intervals.
Non-parametric alternatives, such as the Wilcoxon rank-sum test and the Kolmogorov-Smirnov test, can also be implemented in R.