In-Depth Notes on Hypothesis Test for Paired Data
Paired Samples and Blocks
Definition of Paired Data:
- Observations are collected in pairs, so each observation in one group is linked to a specific observation in the other group.
- Common in studies comparing subjects before and after a treatment, leading to a type of blocking.
Types of Pairing:
- Experimental Pairing: Arises in experiments, where pairing is a type of blocking.
- Observational Pairing: Arises in observational studies, where pairing is a form of matching.
Importance of Pairing:
- If the data are known to be paired, the analysis should take the pairing into account by evaluating the pairwise differences.
- No formal test exists to determine pairing; it relies on understanding data collection context.
Analyzing Paired Data
- Example:
- Measuring resting pulse and pulse after exercise on the same individuals.
- Key Metric:
- We focus on the difference in population means: ( \mu_d = \mu_1 - \mu_2 )
- Sample of Differences:
- For each pair: Sample 1 value - Sample 2 value.
- Generates a single set of differences:
- For pair 1: ( y_{d1} = y_{11} - y_{21} )
- Continue for all pairs up to ( n ) pairs.
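As a minimal sketch of the step above, the snippet below builds the single set of differences from two paired lists. The pulse values are hypothetical, made up purely for illustration, and the function name `pairwise_differences` is an assumption rather than anything defined in these notes.

```python
def pairwise_differences(sample1, sample2):
    """Return sample1[i] - sample2[i] for each pair i."""
    if len(sample1) != len(sample2):
        raise ValueError("paired samples must have the same length")
    return [a - b for a, b in zip(sample1, sample2)]

# Hypothetical resting and post-exercise pulse rates for the same 5 people.
resting = [68, 72, 75, 80, 64]
after_exercise = [95, 101, 98, 110, 90]

differences = pairwise_differences(resting, after_exercise)
print(differences)  # [-27, -29, -23, -30, -26]
```

The order of subtraction (Sample 1 minus Sample 2) fixes the sign of every difference, so it has to match the direction of the alternative hypothesis used later.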
Paired t-Test Overview
Single Set of Data:
- Use a one-sample t-test on pairwise differences.
- Sample size ( n ) is the number of pairs.
Key Notation:
- ( \bar{d} ): Mean of pairwise differences.
- ( s_d ): Standard deviation of pairwise differences.
- ( n ): Number of pairs.
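The three quantities in the notation can be computed with Python's standard `statistics` module. This is a sketch only: the differences are hypothetical values, and `summarize_differences` is an illustrative name.

```python
import statistics

def summarize_differences(differences):
    """Return (d_bar, s_d, n) for a list of pairwise differences."""
    n = len(differences)                  # number of pairs
    d_bar = statistics.mean(differences)  # mean of the differences
    s_d = statistics.stdev(differences)   # sample SD (divides by n - 1)
    return d_bar, s_d, n

differences = [-27, -29, -23, -30, -26]  # hypothetical differences
d_bar, s_d, n = summarize_differences(differences)
print(d_bar, round(s_d, 3), n)
```

Note that `statistics.stdev` is the sample standard deviation; `statistics.pstdev` (which divides by n) would not be the right choice for ( s_d ).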
Assumptions for Paired t-Test
- Paired Data Assumption:
- The data must be paired.
- Independence Assumption:
- The differences must be independent of one another (the two measurements within a pair are not independent, by design).
- Randomization Condition:
- The pairs should come from a random sample or a randomized experiment; this supports the independence assumption.
- 10% Condition:
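- When sampling without replacement, the sample should be no more than 10% of the population; often not checked explicitly when the sample is clearly small relative to the population.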
- Normal Population Assumption:
- The population of differences follows a Normal model; checked with a histogram or normal probability plot of the differences.
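One way to check the Normal assumption without plotting software is to compute the coordinates of a normal probability plot directly. The sketch below pairs each sorted difference with a theoretical standard-Normal quantile; a roughly straight-line pattern in these points is consistent with a Normal model. The plotting position `(i - 0.5) / n` is one common convention and is an assumption here, as are the hypothetical data and the name `normal_plot_points`.

```python
from statistics import NormalDist

def normal_plot_points(differences):
    """Pair each sorted difference with its theoretical standard-Normal quantile."""
    ordered = sorted(differences)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # (i - 0.5) / n is an assumed plotting position; other conventions exist.
    quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))

differences = [-27, -29, -23, -30, -26]  # hypothetical differences
for z, d in normal_plot_points(differences):
    print(f"{z:6.2f}  {d}")
```

With very few pairs such a check has little power, so it can only flag gross departures such as strong skewness or outliers.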
Hypothesis Testing Steps
Hypothesis:
- Null: ( H_0: \mu_d = 0 )
- Alternative: ( H_a: \mu_d < 0 ) (or ( \mu_d > 0 ) or ( \mu_d \neq 0 ), depending on the claim being tested)
Test Statistic:
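- Given by ( t_0 = \frac{\bar{d} - d_0}{s_d / \sqrt{n}} ), where ( d_0 ) is the hypothesized value of ( \mu_d ) (0 in the hypotheses above)
- Under ( H_0 ), ( t_0 ) follows a t-distribution with ( df = n - 1 )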
P-value:
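- The probability, assuming ( H_0 ) is true, of a test statistic at least as extreme as ( t_0 ) in the direction of ( H_a ), computed from the t-distribution with ( n - 1 ) degrees of freedom.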
Decision:
- If ( p \leq \alpha ) → Reject ( H_0 )
- If ( p > \alpha ) → Do not reject ( H_0 )
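The steps above (test statistic, P-value, decision) can be sketched in standard-library Python for the lower-tailed alternative ( \mu_d < d_0 ). The t-distribution CDF is approximated here by Simpson's rule, which is a stand-in chosen to keep the example dependency-free; in practice a library routine such as `scipy.stats.t.cdf` would normally be used. The function names and the summary statistics in the usage lines are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=20000):
    """Approximate P(T <= t) using Simpson's rule on [0, |t|] plus symmetry."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area

def paired_t_test_lower(d_bar, s_d, n, d0=0.0, alpha=0.05):
    """Lower-tailed paired t-test of H0: mu_d = d0 against Ha: mu_d < d0."""
    t0 = (d_bar - d0) / (s_d / math.sqrt(n))
    p_value = t_cdf(t0, n - 1)  # lower-tail area
    return t0, p_value, p_value <= alpha

# Hypothetical summary statistics: mean difference, SD of differences, pairs.
t0, p_value, reject = paired_t_test_lower(d_bar=-1.2, s_d=2.0, n=16)
print(f"t0 = {t0:.3f}, p = {p_value:.4f}, reject H0: {reject}")
```

For an upper-tailed alternative the P-value would be `1 - t_cdf(t0, n - 1)`, and for a two-sided alternative it would be twice the smaller tail area.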
Confidence Interval for ( \mu_d )
- Assumptions:
- Same as paired t-test assumptions.
- Confidence Interval Formula:
- ( CI = \text{point estimate} \pm ME )
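- ( ME = CV \times SE ): margin of error = critical value × standard error
- CI for ( \mu_d ): ( \bar{d} \pm t^* \frac{s_d}{\sqrt{n}} )
- where ( t^* ) is the critical t-value with ( df = n - 1 ) for the chosen confidence level.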
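The interval formula translates directly into a small helper. In this sketch the critical value `t_star` is passed in by the caller (for example, read from a t-table), since the standard library has no t-quantile function; the name `paired_ci` and the numbers in the usage lines are hypothetical.

```python
import math

def paired_ci(d_bar, s_d, n, t_star):
    """Confidence interval for mu_d: d_bar +/- t_star * s_d / sqrt(n)."""
    se = s_d / math.sqrt(n)  # standard error of the mean difference
    me = t_star * se         # margin of error
    return d_bar - me, d_bar + me

# Hypothetical summary statistics; 2.131 is the t-table value assumed for
# 95% confidence with df = 15.
lower, upper = paired_ci(d_bar=-1.2, s_d=2.0, n=16, t_star=2.131)
print(f"95% CI for mu_d: ({lower:.3f}, {upper:.3f})")
```

An interval that lies entirely below 0 agrees with rejecting ( H_0: \mu_d = 0 ) in favour of a two-sided alternative at the matching significance level.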
Practical Example
Study on lactic acid levels before and after racquetball:
- Sample size: ( n = 8 ) pairs (each subject measured before and after exercise)
- Claim to test: Mean lactate level before exercise < mean lactate level after exercise.
- Calculated difference: ( d = y_{\text{before}} - y_{\text{after}} )
- Sample results: ( \bar{d} = -13.63 ), ( s_d = 8.28 )
Hypothesis:
- Null: ( H_0: \mu_d = 0 )
- Alternative: ( H_a: \mu_d < 0 )
Test Statistic Computation:
- ( t_0 = \frac{-13.63 - 0}{\frac{8.28}{\sqrt{8}}} \approx -4.656 )
Conclusion:
- With ( df = 7 ), the P-value is < 0.005, so we reject ( H_0 ); the data indicate that the mean lactate level before exercise is lower than the mean level after exercise.
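The arithmetic in this example can be checked with a few lines of Python. The summary statistics come from the notes; the cutoff -3.499 is not stated there and is assumed from a standard t-table (lower-tail area 0.005 with ( df = 7 )).

```python
import math

# Summary statistics from the racquetball example.
d_bar, s_d, n = -13.63, 8.28, 8
d0 = 0.0  # hypothesized mean difference under H0

se = s_d / math.sqrt(n)  # standard error of the mean difference
t0 = (d_bar - d0) / se   # test statistic

# Assumed t-table value: P(T <= -3.499) = 0.005 when df = 7.
t_crit_005 = -3.499

print(f"SE = {se:.3f}, t0 = {t0:.3f}")      # SE = 2.927, t0 = -4.656
print("p-value < 0.005:", t0 < t_crit_005)  # p-value < 0.005: True
```

Because ( t_0 ) falls below the 0.005 cutoff, the lower-tail area beyond it must be smaller than 0.005, which is how a table-based bound on the P-value is obtained.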
Confidence Interval Estimation
- 90% Confidence Interval for ( \mu_d ):
- Calculation: ( \bar{d} \pm t^* \frac{s_d}{\sqrt{n}} )
- Result: ( (-19.177, -8.083) )
- Interpretation: We are 90% confident that the mean lactate level before exercise is between 8.083 and 19.177 units lower than after exercise.
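The interval can be reproduced step by step from the summary statistics. The critical value ( t^* = 1.895 ) is not stated in the notes; it is assumed here from a standard t-table for 90% confidence with ( df = 7 ), and it is consistent with the reported endpoints.

```latex
\begin{align*}
SE &= \frac{s_d}{\sqrt{n}} = \frac{8.28}{\sqrt{8}} \approx 2.927 \\
ME &= t^* \times SE \approx 1.895 \times 2.927 \approx 5.547 \\
\bar{d} \pm ME &= -13.63 \pm 5.547 = (-19.177,\ -8.083)
\end{align*}
```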