Course: STAT 1 3: Introduction to Statistical Methods for Life and Health Sciences
Instructor: J.H. Sparks
Date: Winter 2025
This chapter focuses on comparing two means, an important aspect of statistical inference.
Review of previous chapters: confidence intervals, hypothesis tests, inferences about means and proportions.
Key Property: Independence or Dependence of Samples
Chapter 6 assumes samples are independent.
Chapter 7 will explore dependent samples.
Goals of Chapter 6:
Compare populations and assess normality (Section 6.1+)
Compare means via simulation (Section 6.2) and theoretical approaches (Section 6.3)
Importance of checking assumptions about population distributions.
Use descriptive statistics and distributions from sample data.
Descriptive Measures:
Five-number summary to describe the center and spread of the data.
Boxplot construction to assess data symmetry.
Normal probability plot to evaluate distribution normality.
Definition of the p-th Percentile:
A value such that p% of observations fall below it.
Review of Chapter 2 concepts:
Measures of Central Tendency: mean, median (50th percentile), mode.
Measures of Variability: range, variance, standard deviation.
Introduction of Quartiles:
25th percentile: Q1 (Lower quartile)
50th percentile: Median (M)
75th percentile: Q3 (Upper quartile)
For an odd number of observations, the median is the middle value in an ordered set.
For an even number of observations, the median is the average of the two middle values.
Methodology for locating quartiles:
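For an odd count: omit the median, then take Q1 as the median of the lower half and Q3 as the median of the upper half.
For an even count: split all observations into a lower half and an upper half, then take Q1 and Q3 as the medians of those halves.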
Note: Definitions of quartiles may vary by statistician or software.
Interquartile Range (IQR): Difference between Q3 and Q1, IQR = Q3 - Q1
The IQR is the preferred measure of variation when the median is used as the measure of center.
The five-number summary includes:
Minimum
Q1
Median (M)
Q3
Maximum
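The quartile rule and five-number summary above can be sketched in a few lines. This is a minimal illustration in Python rather than the R used in the course; the function name `five_number_summary` and the data are hypothetical, and other software may return slightly different quartiles because definitions vary.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the quartile rule in these notes.

    Assumes at least two observations.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    # Odd n: the middle value is the median and is left out of both halves.
    # Even n: every observation belongs to one of the two halves.
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 == 1 else xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


# Hypothetical measurements, made up for illustration.
values = [2, 4, 5, 7, 8, 10, 12, 15, 30]
minimum, q1, med, q3, maximum = five_number_summary(values)
iqr = q3 - q1  # IQR = Q3 - Q1
print("five-number summary:", minimum, q1, med, q3, maximum)
print("IQR:", iqr)
```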
The five-number summary divides the data into four quarters; the IQR measures the spread of the middle 50% of the data.
Use the five-number summary and IQR to identify outliers.
Inner Fences:
Lower Inner Fence: Q1 - 1.5 * IQR
Upper Inner Fence: Q3 + 1.5 * IQR
Observations outside these fences are potential outliers.
Outer Fences:
Lower Outer Fence: Q1 - 3 * IQR
Upper Outer Fence: Q3 + 3 * IQR
Adjacent values are the most extreme observations that still lie within the inner fences.
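The fence formulas above translate directly into code. The sketch below takes Q1 and Q3 as given inputs; the function names and the data are hypothetical. Labeling points beyond the outer fences separately follows a common convention and is an assumption here, since the notes only define the fences.

```python
def tukey_fences(q1, q3):
    """Return ((lower inner, upper inner), (lower outer, upper outer))."""
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)
    return inner, outer


def classify_outliers(data, q1, q3):
    """Flag potential outliers and find the adjacent values."""
    inner, outer = tukey_fences(q1, q3)
    inside = [x for x in data if inner[0] <= x <= inner[1]]
    return {
        # Outside the inner fences: potential outliers.
        "potential_outliers": [x for x in data if x < inner[0] or x > inner[1]],
        # Outside the outer fences (assumed convention: more extreme outliers).
        "beyond_outer_fences": [x for x in data if x < outer[0] or x > outer[1]],
        # Most extreme observations still within the inner fences.
        "adjacent_values": (min(inside), max(inside)),
    }


# Hypothetical data with Q1 = 4.5 and Q3 = 13.5 taken as given.
values = [2, 4, 5, 7, 8, 10, 12, 15, 30]
print(tukey_fences(4.5, 13.5))
print(classify_outliers(values, 4.5, 13.5))
```

In a boxplot, the whiskers would end at the adjacent values and each potential outlier would be drawn as its own dot.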
Boxplots visualize quantitative data.
Created by drawing a box from Q1 to Q3, with the median shown as a dividing line.
Whiskers extend to adjacent values and dots indicate potential outliers.
Developed by Professor John Tukey, who also introduced the stem-and-leaf plot.
Boxplots allow for quick comparisons of five-number summaries across groups.
Limitations:
May hide the detailed shape of the distribution.
Cannot identify multimodal distributions or clusters.
For small data sets, combine boxplots with dotplots or histograms.
Many statistical techniques are valid only when the data are approximately normal, so normality should be checked first.
Procedures for checking:
Histogram should be bell-shaped for large samples.
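Compute intervals of 1, 2, and 3 standard deviations around the mean and compare the percentage of data in each with the roughly 68%, 95%, and 99.7% expected under a normal distribution.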
A normal probability plot (Q-Q plot) evaluates normality by comparing observed and expected values.
A roughly linear plot indicates an approximately normal distribution; systematic departures from a line suggest otherwise.
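Two of the checks above can be sketched numerically: the share of data within k standard deviations of the mean, and the pairs of expected and observed values that a normal probability plot displays. The function names and data are hypothetical, and the plotting positions (i - 0.5)/n are one common choice assumed here; software packages differ on this detail.

```python
from statistics import NormalDist, mean, stdev


def share_within(data, k):
    """Proportion of observations within k standard deviations of the mean."""
    m, s = mean(data), stdev(data)
    inside = [x for x in data if m - k * s <= x <= m + k * s]
    return len(inside) / len(data)


def normal_scores(n):
    """Expected standard normal values for n ordered observations.

    Uses plotting positions (i - 0.5) / n, an assumed convention.
    """
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def qq_pairs(data):
    """(expected, observed) pairs; a roughly straight line suggests normality."""
    return list(zip(normal_scores(len(data)), sorted(data)))


# Hypothetical sample, made up for illustration.
sample = [4.1, 4.8, 5.0, 5.2, 5.5, 5.9, 6.0, 6.3, 6.8, 7.4]
for k, expected in [(1, 0.68), (2, 0.95), (3, 0.997)]:
    print(f"within {k} SD: {share_within(sample, k):.2f} (normal: about {expected})")
for expected_z, observed in qq_pairs(sample):
    print(f"{expected_z:6.2f}  {observed}")
```

With a sample this small the percentages can only be rough, so the plot of the pairs is usually the more informative check.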
This section investigates differences between two samples.
Hypotheses:
Null Hypothesis (H0): No association between the treatment and the response; the two population means are equal.
Alternative Hypothesis (H1): There is an association between the treatment and the response; the two population means differ.
Example: a study investigated the effect of alcohol on driving reaction times using two groups (alcohol vs. placebo).
Aim: To determine if alcohol affects average reaction times.
A 95% confidence interval is constructed for the difference in mean reaction times.
The method shuffles the data to simulate random reassignment of treatments under the null hypothesis.
Aim: Determine if the means differ significantly.
The resulting sampling distribution gives the mean and standard deviation of the difference in means.
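The shuffling idea described above can be sketched as a permutation test. The function name, the number of repetitions, the seed, and the reaction times below are all hypothetical; this is an illustration in Python, not the course's own simulation tool.

```python
import random


def permutation_test(group_a, group_b, reps=5000, seed=0):
    """Two-sided shuffle test for a difference in means.

    Returns (observed difference, estimated p-value).
    """
    rng = random.Random(seed)
    n_a, n_b = len(group_a), len(group_b)
    observed = sum(group_a) / n_a - sum(group_b) / n_b
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(reps):
        # Shuffling mimics re-randomizing which subjects get which treatment.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        # Count shuffles at least as extreme as what was observed
        # (small tolerance guards against floating-point rounding).
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / reps


# Hypothetical reaction times in milliseconds, made up for illustration.
alcohol = [620, 580, 710, 650, 690, 700, 640, 675]
placebo = [540, 560, 600, 520, 590, 570, 610, 555]
difference, p_value = permutation_test(alcohol, placebo)
print("observed difference in means:", difference)
print("estimated p-value:", p_value)
```

A small p-value means that a difference as large as the observed one rarely arises from random reassignment alone.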
This section moves from simulation to theory-based methods for comparing means.
The point estimate for the difference between the population means (μ1 − μ2) is the difference between the sample means (x̄1 − x̄2).
Conditions for conducting statistical inference include:
Large samples (n1 ≥ 30, n2 ≥ 30) for normal approximation.
Approximately normal or symmetric distributions for smaller samples.
The procedures are robust with 20 or more observations if the distributions are not skewed.
For small samples, the procedure depends on whether the population variances can be assumed equal:
A pooled variance estimate is used when the variances are equal.
The Satterthwaite approximation to the degrees of freedom is used when the variances are unequal.
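The two variance cases above lead to different standard errors and degrees of freedom. The sketch below computes only the t statistic and degrees of freedom for each case; the function names and data are hypothetical, and p-values are omitted because the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """t statistic and df assuming equal population variances."""
    n1, n2 = len(x), len(y)
    # Pooled variance: average of the sample variances, weighted by df.
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / se, n1 + n2 - 2


def satterthwaite_t(x, y):
    """t statistic and approximate df without assuming equal variances."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x) / n1, variance(y) / n2
    se = sqrt(v1 + v2)
    # Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return (mean(x) - mean(y)) / se, df


# Hypothetical small samples, made up for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
print("pooled:", pooled_t(x, y))
print("Satterthwaite:", satterthwaite_t(x, y))
```

With equal sample sizes the two t statistics coincide and only the degrees of freedom differ; with unequal sizes and unequal variances the statistics themselves can differ noticeably.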
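Relationship between confidence intervals and hypothesis tests: a two-sided test at level α rejects H0 exactly when the matching (1 − α) confidence interval excludes the null value.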
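The link between intervals and tests can be checked numerically. The sketch below uses the large-sample normal approximation in place of a t procedure, a simplification assumed here because the Python standard library has no t distribution; the function name and the simulated data are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance


def z_interval_and_test(x, y, level=0.95):
    """Large-sample CI for mu1 - mu2 and two-sided p-value for H0: mu1 = mu2."""
    std_normal = NormalDist()
    diff = mean(x) - mean(y)
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    z_star = std_normal.inv_cdf(1 - (1 - level) / 2)
    interval = (diff - z_star * se, diff + z_star * se)
    p_value = 2 * (1 - std_normal.cdf(abs(diff / se)))
    return interval, p_value


# Simulated data (not real measurements), 40 observations per group.
rng = random.Random(1)
group1 = [rng.gauss(650, 60) for _ in range(40)]
group2 = [rng.gauss(570, 60) for _ in range(40)]
(low, high), p = z_interval_and_test(group1, group2)
print("95% CI for the difference:", (low, high))
print("two-sided p-value:", p)
# The two conclusions agree: reject at the 5% level iff 0 is outside the CI.
print("CI excludes 0:", not (low <= 0 <= high), "| p < 0.05:", p < 0.05)
```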
Use of R for hypothesis tests and confidence intervals.
Non-parametric alternatives include the Wilcoxon rank-sum test and the Kolmogorov-Smirnov test.
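To show what these two non-parametric tests measure, the sketch below computes only their test statistics: the sum of the ranks of one sample in the combined data, and the largest gap between the two empirical distribution functions. The function names and data are hypothetical, and p-values are left out; in practice a statistics package would supply them.

```python
from bisect import bisect_right


def rank_sum(x, y):
    """Wilcoxon rank-sum statistic: sum of the ranks of x in the combined sample.

    Tied values share the average of the ranks they occupy.
    """
    combined = sorted(list(x) + list(y))
    ranks = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        # Values at 1-based positions i+1 .. j share their average rank.
        ranks[combined[i]] = (i + 1 + j) / 2
        i = j
    return sum(ranks[v] for v in x)


def ks_statistic(x, y):
    """Largest vertical gap between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)

    def ecdf(sorted_values, t):
        # Proportion of observations less than or equal to t.
        return bisect_right(sorted_values, t) / len(sorted_values)

    return max(abs(ecdf(xs, t) - ecdf(ys, t)) for t in xs + ys)


# Hypothetical reaction times in milliseconds, made up for illustration.
alcohol = [620, 580, 710, 650, 690, 700, 640, 675]
placebo = [540, 560, 600, 520, 590, 570, 610, 555]
print("rank sum for alcohol group:", rank_sum(alcohol, placebo))
print("KS statistic:", ks_statistic(alcohol, placebo))
```

Both statistics depend only on the ordering of the data, which is why these tests do not require normality.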