Chapter 6: Comparing Two Means (Independent Samples)

Course: STAT 1 3: Introduction to Statistical Methods for Life and Health Sciences
Instructor: J.H. Sparks
Date: Winter 2025


Two-Sample Problems: Quantitative

  • This chapter focuses on comparing two means, an important aspect of statistical inference.

  • Builds on earlier chapters: confidence intervals, hypothesis tests, and inference for means and proportions.

  • Key Property: Independence or Dependence of Samples

    • Chapter 6 assumes samples are independent.

    • Chapter 7 will explore dependent samples.

  • Goals of Chapter 6:

    • Compare groups descriptively and assess normality (Section 6.1)

    • Compare means via simulation (Section 6.2) and theoretical approaches (Section 6.3)


Section 6.1: Comparing Two Groups: Quantitative Response

  • Check assumptions about the population distributions before applying inference methods.

  • Assess those assumptions with descriptive statistics and graphs of the sample data.

  • Descriptive Measures:

    • Five-number summary to describe center and spread.

    • Boxplots to assess symmetry and identify potential outliers.

    • Normal probability plots to assess whether the data are approximately normal.


Re-Examining Variability & Measures of Relative Standing

  • Definition of the p-th Percentile:

    • A value such that p% of observations fall below it.

  • Review of Chapter 2 concepts:

    • Measures of Central Tendency: mean, median (50th percentile), mode.

    • Measures of Variability: range, variance, standard deviation.

    • Quartiles:

      • 25th percentile: Q1 (Lower quartile)

      • 50th percentile: Median (M)

      • 75th percentile: Q3 (Upper quartile)


Finding the Quartiles

  • For an odd number of observations, the median is the middle value in an ordered set.

  • For an even number of observations, the median is the average of the two middle values.

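  • Methodology for locating quartiles: Q1 is the median of the lower half of the ordered data, and Q3 is the median of the upper half.

    1. Odd number of observations: leave the median itself out of both halves.

    2. Even number of observations: split the ordered data into two equal halves, so every observation is used.

  • Note: Quartile definitions vary across textbooks and software (R's quantile() function, for example, offers nine through its type argument), so computed quartiles may differ slightly.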


IQR & the Five-Number Summary

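  • Interquartile Range (IQR): the difference between the upper and lower quartiles, IQR = Q3 - Q1; it spans the middle 50% of the data.

  • The IQR is the preferred measure of variability when the median is used as the measure of center.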

  • The five-number summary includes:

    • Minimum

    • Q1

    • Median (M)

    • Q3

    • Maximum

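  • The five numbers divide the data into four quarters, each containing about 25% of the observations, which shows how the spread varies across the distribution; the IQR measures the spread of the middle two quarters.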
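The quartile rule and five-number summary described above can be sketched in Python as follows. This is a minimal sketch assuming the median-of-halves definition of Q1 and Q3 (other software may give slightly different quartiles); the function names and data are hypothetical, for illustration only.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, M, Q3) using the median-of-halves rule (hypothetical helper)."""
    if len(data) < 2:
        raise ValueError("need at least two observations")
    x = sorted(data)
    n = len(x)
    half = n // 2
    lower = x[:half]
    # Odd n: skip the middle value; even n: the halves split the data exactly.
    upper = x[half + 1:] if n % 2 == 1 else x[half:]
    return median(lower), median(x), median(upper)

def five_number_summary(data):
    """Return (min, Q1, M, Q3, max) together with IQR = Q3 - Q1."""
    q1, m, q3 = quartiles(data)
    return (min(data), q1, m, q3, max(data)), q3 - q1

# Hypothetical data (n = 9, odd, so the median 7 is left out of both halves)
summary, iqr = five_number_summary([2, 4, 4, 5, 7, 8, 9, 11, 15])
print(summary, iqr)  # (2, 4.0, 7, 10.0, 15) 6.0
```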


Detecting Outliers

  • Use the five-number summary and IQR to identify outliers.

  • Inner Fences:

    • Lower Inner Fence: Q1 - 1.5 * IQR

    • Upper Inner Fence: Q3 + 1.5 * IQR

  • Observations outside these fences are potential outliers.

  • Outer Fences (observations beyond these are extreme outliers):

    • Lower Outer Fence: Q1 - 3 * IQR

    • Upper Outer Fence: Q3 + 3 * IQR

  • Adjacent values are the most extreme observations that still lie within the inner fences.
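The fence rules can be applied mechanically once Q1 and Q3 are known. The sketch below takes the quartiles as inputs (assumed computed beforehand with the chapter's quartile rule) and returns the fences, the potential outliers, those beyond the outer fences (often labeled extreme outliers), and the adjacent values. The function name and data are hypothetical.

```python
def classify_by_fences(data, q1, q3):
    """Apply inner/outer fence rules given precomputed quartiles (hypothetical helper)."""
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)
    # Potential outliers: observations outside the inner fences
    outliers = [v for v in data if v < inner[0] or v > inner[1]]
    # Extreme outliers: those also beyond the outer fences
    extreme = [v for v in outliers if v < outer[0] or v > outer[1]]
    # Adjacent values: most extreme observations still inside the inner fences
    inside = [v for v in data if inner[0] <= v <= inner[1]]
    adjacent = (min(inside), max(inside))
    return inner, outer, outliers, extreme, adjacent

# Hypothetical data; with the median-of-halves rule, Q1 = 4 and Q3 = 7
data = [1, 3, 4, 4, 5, 6, 6, 7, 14, 35]
print(classify_by_fences(data, q1=4, q3=7))
```

Here 14 lies between the inner and outer fences, while 35 lies beyond the outer fence; the upper whisker of a boxplot would stop at the adjacent value 7.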


Boxplots

  • A boxplot is a graph of the five-number summary of a quantitative variable.

  • The box extends from Q1 to Q3, and a line inside the box marks the median.

  • Whiskers extend from the box to the adjacent values; observations beyond them are plotted as individual dots (potential outliers).

  • Developed by Professor John Tukey, who also introduced the stem-and-leaf plot.


Comparisons and Caveats

  • Side-by-side boxplots allow quick comparison of five-number summaries across groups.

  • Limitations:

    • They hide the detailed shape of the distribution.

    • They cannot reveal multiple peaks (multimodality) or clusters.

  • For small data sets, combine boxplots with dotplots or histograms.
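To see why boxplots can hide shape, the sketch below builds two hypothetical data sets with identical five-number summaries: one evenly spread, one bunched into two clusters near 25 and 75. Their boxplots would look the same, yet a dotplot would show the clusters at once. It assumes the median-of-halves quartile rule; the helper name and data are invented for illustration.

```python
from statistics import median

def five_numbers(data):
    """(min, Q1, M, Q3, max) using the median-of-halves quartile rule (hypothetical helper)."""
    x = sorted(data)
    n, half = len(x), len(x) // 2
    lower = x[:half]
    upper = x[half + (n % 2):]  # skip the middle value when n is odd
    return (x[0], median(lower), median(x), median(upper), x[-1])

a = [10, 20, 30, 40, 50, 60, 70, 80, 90]  # evenly spread
b = [10, 24, 26, 26, 50, 74, 74, 76, 90]  # two clusters
print(five_numbers(a))  # (10, 25.0, 50, 75.0, 90)
print(five_numbers(b))  # (10, 25.0, 50, 75.0, 90)
```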


Examining Normality

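  • Many procedures, including the t-procedures of Section 6.3, assume approximately normal populations, especially for small samples, so normality should be checked first.

  • Procedures for checking:

    1. For large samples, the histogram should be roughly bell-shaped.

    2. Compute the intervals x̄ ± s, x̄ ± 2s, and x̄ ± 3s, and compare the percentage of observations in each with the 68%, 95%, and 99.7% expected for a normal distribution (the empirical rule).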
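The interval check in step 2 can be sketched as below: it reports the percentage of observations within 1, 2, and 3 sample standard deviations of the mean, for comparison with the empirical-rule benchmarks of 68%, 95%, and 99.7%. The function name is hypothetical, and with small samples the percentages will be coarse.

```python
from statistics import mean, stdev

def empirical_rule_check(data):
    """Percent of observations within k sample SDs of the mean, k = 1, 2, 3."""
    xbar, s = mean(data), stdev(data)
    result = {}
    for k in (1, 2, 3):
        inside = sum(1 for v in data if xbar - k * s <= v <= xbar + k * s)
        result[k] = 100 * inside / len(data)
    return result

# Compare each percentage with roughly 68, 95 and 99.7
benchmarks = {1: 68, 2: 95, 3: 99.7}
observed = empirical_rule_check([1, 2, 3, 4, 5])  # hypothetical data
for k in (1, 2, 3):
    print(f"within {k} SD: {observed[k]:.1f}% (normal: about {benchmarks[k]}%)")
```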


Normal Probability Plot

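  • A normal probability plot (Q-Q plot) plots the ordered observations against the values expected if the data came from a normal distribution (normal scores).

  • If the points fall roughly on a straight line, the data are consistent with a normal distribution; systematic curvature suggests skewness, and points bending away at the ends suggest heavy tails or outliers.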


Section 6.2: Comparing Two Means: Simulation Approach

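  • This section uses simulation to assess whether an observed difference between two group means could plausibly have arisen by chance alone.

  • Hypotheses:

    • Null Hypothesis (H0): There is no association between the explanatory variable (group) and the response; equivalently, μ1 = μ2, so μ1 - μ2 = 0.

    • Alternative Hypothesis (H1): There is an association; equivalently, μ1 ≠ μ2 (or μ1 < μ2 or μ1 > μ2 for a one-sided test).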


Example: Alcohol's Effect on Reaction Times

  • A study investigated the effect of alcohol on reaction times in a driving task, comparing two groups (alcohol vs. placebo).

  • Aim: determine whether alcohol affects mean reaction time.

  • A 95% confidence interval will also be constructed for the difference in mean reaction times.


Constructing a Test Statistic

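  • Statistic: the observed difference in sample means, x̄1 - x̄2.

  • Under H0 the group labels are arbitrary, so shuffle (randomly reassign) the labels many times and recompute the difference in means each time.

  • The p-value is the proportion of shuffled differences at least as extreme as the observed one; a small p-value is evidence that the population means differ.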
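The shuffling procedure above can be sketched as a two-sided permutation test. This is a minimal sketch under stated assumptions: the function name, the default of 10,000 shuffles, and the small floating-point tolerance are choices made here, and the example data are hypothetical (not the alcohol study's data).

```python
import random
from statistics import mean

def permutation_test(group1, group2, reps=10_000, seed=None):
    """Two-sided shuffle test for a difference in means (hypothetical helper)."""
    rng = random.Random(seed)
    observed = mean(group1) - mean(group2)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    count = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # random reassignment of group labels
        diff = mean(pooled[:n1]) - mean(pooled[n1:])
        # Count shuffles at least as extreme as observed (tolerance guards float noise)
        if abs(diff) >= abs(observed) - 1e-12:
            count += 1
    return observed, count / reps

# Hypothetical reaction times (seconds)
obs, p = permutation_test([0.61, 0.72, 0.58, 0.80, 0.69],
                          [0.52, 0.49, 0.60, 0.55, 0.47], reps=5000, seed=1)
print(obs, p)
```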


Confidence Interval

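  • The standard deviation (SD) of the simulated differences in means estimates how much x̄1 - x̄2 varies by chance; an approximate 95% confidence interval for μ1 - μ2 is (x̄1 - x̄2) ± 2 × SD.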
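A rough "observed difference ± 2 SD" interval can be sketched as follows. The assumption here is that the SD is taken from the shuffled (null) distribution of differences; a bootstrap SD is a common alternative. The function name and data are hypothetical.

```python
import random
from statistics import mean, stdev

def two_sd_interval(group1, group2, reps=10_000, seed=None):
    """Approximate 95% CI: observed difference +/- 2 SD of shuffled differences."""
    rng = random.Random(seed)
    observed = mean(group1) - mean(group2)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    diffs = []
    for _ in range(reps):
        rng.shuffle(pooled)  # simulate random reassignment of labels
        diffs.append(mean(pooled[:n1]) - mean(pooled[n1:]))
    sd = stdev(diffs)  # SD of the simulated differences in means
    return observed - 2 * sd, observed + 2 * sd

print(two_sd_interval([1, 2, 3, 4, 5], [2, 4, 6, 8, 10], reps=2000, seed=2))  # hypothetical data
```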


Section 6.3: Comparing Two Means: Theoretical Approach

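  • Replaces the simulated null distribution with a theoretical one (the t-distribution) for comparing means.

  • Point estimate of μ1 - μ2: the difference in sample means, x̄1 - x̄2.

  • Standard error: SE = sqrt(s1²/n1 + s2²/n2); the test statistic for H0: μ1 - μ2 = 0 is t = (x̄1 - x̄2) / SE.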
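The point estimate, unpooled standard error, and t statistic can be computed from raw data as in this minimal sketch (hypothetical function name and data); turning t into a p-value needs a t-distribution table or software such as R.

```python
from math import sqrt
from statistics import mean, variance

def two_sample_t(group1, group2):
    """Point estimate, unpooled SE, and t statistic for H0: mu1 - mu2 = 0."""
    n1, n2 = len(group1), len(group2)
    diff = mean(group1) - mean(group2)  # point estimate of mu1 - mu2
    se = sqrt(variance(group1) / n1 + variance(group2) / n2)
    return diff, se, diff / se

diff, se, t = two_sample_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])  # hypothetical data
print(round(t, 3))  # -1.897
```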


Validity Conditions for the Two-Independent Sample t-Procedure

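  • The two-sample t-procedure is considered valid if any of the following holds:

    1. Both samples are large (n1 ≥ 30 and n2 ≥ 30), so the normal approximation applies.

    2. For smaller samples, both population distributions are approximately normal, or at least symmetric.

    3. Each group has at least 20 observations and neither distribution is strongly skewed (the procedure is robust in this case).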


Equal vs. Unequal Variances

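  • For small samples, the procedure depends on whether the two population variances can be assumed equal:

    • Equal variances: pool the sample variances, sp² = ((n1 - 1)s1² + (n2 - 1)s2²) / (n1 + n2 - 2), giving SE = sp × sqrt(1/n1 + 1/n2) with df = n1 + n2 - 2.

    • Unequal variances: use SE = sqrt(s1²/n1 + s2²/n2) with degrees of freedom from the Satterthwaite approximation.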
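The two standard-error and degrees-of-freedom choices can be compared from summary statistics. This sketch uses the pooled formula and the Satterthwaite formula df = (v1 + v2)² / (v1²/(n1 - 1) + v2²/(n2 - 1)), where v1 = s1²/n1 and v2 = s2²/n2; the function names and numbers are hypothetical.

```python
from math import sqrt

def pooled_se_df(s1, n1, s2, n2):
    """Equal-variance (pooled) SE with df = n1 + n2 - 2."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return sqrt(sp2 * (1 / n1 + 1 / n2)), n1 + n2 - 2

def welch_se_df(s1, n1, s2, n2):
    """Unequal-variance SE with Satterthwaite approximate df."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return sqrt(v1 + v2), df

# Hypothetical summaries with very different spreads and sample sizes
print(pooled_se_df(1, 5, 5, 20))
print(welch_se_df(1, 5, 5, 20))
```

With equal sample sizes and equal sample SDs the two approaches coincide; the Satterthwaite df always lies between the smaller of n1 - 1 and n2 - 1 and n1 + n2 - 2.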


Confidence Interval & Relation to Hypothesis Testing

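  • A 95% confidence interval for μ1 - μ2 and a two-sided test of H0: μ1 = μ2 at the 5% level agree (when both use the same SE and df): the test rejects H0 exactly when the interval excludes 0.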
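The correspondence between intervals and tests can be checked directly. In this sketch the critical value t* is supplied by the user (an assumption: for 8 df, t* is about 2.306, e.g. from R's qt(0.975, 8)); the function name and numbers are hypothetical.

```python
def ci_and_test(diff, se, t_star):
    """CI diff +/- t*·SE and the matching two-sided test decision for H0: mu1 - mu2 = 0."""
    lower, upper = diff - t_star * se, diff + t_star * se
    reject = abs(diff / se) > t_star  # |t| beyond the critical value
    return (lower, upper), reject

(lower, upper), reject = ci_and_test(-3, 1.5811, 2.306)  # hypothetical values
print((round(lower, 3), round(upper, 3)), reject)
```

Because |t| > t* is algebraically the same as |diff| > t*·SE, the test rejects exactly when 0 falls outside the interval.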


R Output and Final Notes

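  • R's t.test() function carries out the two-sample t-test and confidence interval; by default it uses the unequal-variances (Welch/Satterthwaite) procedure, and var.equal = TRUE gives the pooled version.

  • Nonparametric alternatives when the validity conditions are not met: the Wilcoxon rank-sum test (wilcox.test() in R) and the Kolmogorov-Smirnov test (ks.test() in R), which compares entire distributions rather than means.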
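To show what the rank-sum test works with, this sketch ranks the combined data (tied values receive the average of their ranks), sums the ranks of the first group, and subtracts n1(n1 + 1)/2. As far as we understand, this corresponds to the W statistic reported by R's wilcox.test(); the sketch computes the statistic only, not a p-value, and the function name is hypothetical.

```python
def rank_sum_w(x, y):
    """Wilcoxon rank-sum statistic W = R1 - n1(n1 + 1)/2, with average ranks for ties."""
    combined = sorted(list(x) + list(y))
    ranks = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        ranks[combined[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    r1 = sum(ranks[v] for v in x)  # rank sum for the first group
    n1 = len(x)
    return r1 - n1 * (n1 + 1) / 2

print(rank_sum_w([1, 2], [2, 3]))  # 0.5
```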