In-Depth Notes on Hypothesis Test for Paired Data

Definition of Paired Data:
- Observations are collected in pairs, or each observation in one group is naturally related to an observation in the other group.
- Common in studies that compare the same subjects before and after a treatment; this kind of pairing is a form of blocking.
Types of Pairing:
1. Experimental Pairing: Arises in experiments, where pairing is a type of blocking.
2. Observational Pairing: Arises in observational studies, where subjects are paired on relevant characteristics (a form of matching).
Importance of Pairing:
- If the data are paired, the analysis must account for the pairing by working with the pairwise differences rather than treating the two groups as independent.
- No formal test exists to determine pairing; it relies on understanding data collection context.

Example:
- Measuring resting pulse and pulse after exercise on the same individuals.
Key Metric:
- We focus on the mean of the pairwise differences, which equals the difference in population means: ( \mu_d = \mu_1 - \mu_2 )
Sample of Differences:
- For each pair: Sample 1 value - Sample 2 value.
- Generates a single set of differences:
- For pair 1: ( y_{d1} = y_{11} - y_{21} )
- Continue for all pairs: ( y_{di} = y_{1i} - y_{2i} ) for ( i = 1, \dots, n ), where ( n ) is the number of pairs.

Single Set of Data:
- Use a one-sample t-test on pairwise differences.
- Sample size ( n ) is the number of pairs.
Key Notation:
- ( \bar{d} ): Mean of pairwise differences.
- ( s_d ): Standard deviation of pairwise differences.
- ( n ): Number of pairs.
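
The reduction from two paired samples to one set of differences can be sketched in Python as follows. The pulse readings are hypothetical values made up for illustration, and `paired_summary` is an assumed helper name, not something from the notes.

```python
import statistics

# Hypothetical pulse readings (beats per minute) for five people; illustrative only.
resting = [68, 72, 75, 80, 64]
after_exercise = [92, 95, 99, 110, 85]

def paired_summary(sample1, sample2):
    """Return (differences, d_bar, s_d, n) for two equally long paired samples."""
    if len(sample1) != len(sample2):
        raise ValueError("paired samples must have the same length")
    # Sample 1 value minus Sample 2 value, pair by pair.
    diffs = [a - b for a, b in zip(sample1, sample2)]
    n = len(diffs)                    # number of pairs, not number of observations
    d_bar = statistics.mean(diffs)    # mean of the pairwise differences
    s_d = statistics.stdev(diffs)     # sample standard deviation (n - 1 denominator)
    return diffs, d_bar, s_d, n

diffs, d_bar, s_d, n = paired_summary(resting, after_exercise)
print(diffs, d_bar, round(s_d, 3), n)
```

Everything that follows (test statistic, confidence interval) uses only these three summaries of the differences.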

Paired Data Assumption:
- The data must be paired.
Independence Assumption:
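- The differences must be independent of one another (the two values within a pair are related, but different pairs should not be).
Randomization Condition:
- The pairs should come from a random sample or a randomized experiment, which supports the independence of the differences.
10% Condition:
- When sampling without replacement, the sample should be no more than 10% of the population; with paired data this is rarely a concern and is often not checked explicitly.
Normal Population Assumption:
- The population of differences follows a Normal model, at least approximately; check with a histogram or normal probability plot of the differences.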
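
As a rough illustration of the normal probability plot check, the sketch below computes the plot's coordinates with the Python standard library only. The differences are hypothetical, `normal_plot_points` is an assumed helper name, and the plotting positions ( (i - 0.5) / n ) are one common convention among several.

```python
from statistics import NormalDist

def normal_plot_points(differences):
    """Pair each sorted difference with its theoretical standard Normal quantile."""
    ordered = sorted(differences)
    n = len(ordered)
    # Plotting positions (i - 0.5) / n are one common convention, assumed here.
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))

# Hypothetical differences, for illustration only.
diffs = [-24, -23, -24, -30, -21, -19, -27, -25]
points = normal_plot_points(diffs)
for z, d in points:
    print(f"{z:6.2f}  {d}")
```

If the points fall close to a straight line when plotted, the Normal model for the differences is plausible; strong curvature or outliers suggest caution, especially with few pairs.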

Hypothesis:
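- Null: ( H_0: \mu_d = 0 )
- Alternative: ( H_a: \mu_d < 0 ) (or ( \mu_d > 0 ), or ( \mu_d \neq 0 ), depending on the research question)
Test Statistic:
- Given by ( t_0 = \frac{\bar{d} - d_0}{s_d / \sqrt{n}} ), where ( d_0 ) is the hypothesized mean difference (0 under the null above).
- Under ( H_0 ), ( t_0 ) follows a t-distribution with ( df = n - 1 )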
P-value:
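- The probability, assuming ( H_0 ) is true, of a test statistic at least as extreme as ( t_0 ) in the direction of ( H_a ); computed from the t-distribution with ( n - 1 ) degrees of freedom.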
Decision:
- If ( p \leq \alpha ) → Reject ( H_0 )
- If ( p > \alpha ) → Do not reject ( H_0 )
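
The test procedure above can be sketched in Python for the lower-tailed case. This is a minimal standard-library sketch, not a reference implementation: the t-distribution CDF is approximated with Simpson's rule, the function names are assumptions, and the summary values in the usage lines are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t), using Simpson's rule on [0, |t|] plus symmetry about zero."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area

def paired_t_test(d_bar, s_d, n, d0=0.0, alpha=0.05):
    """Lower-tailed test of H0: mu_d = d0 against Ha: mu_d < d0."""
    t0 = (d_bar - d0) / (s_d / math.sqrt(n))
    p_value = t_cdf(t0, n - 1)  # left-tail area for a lower-tailed alternative
    return t0, p_value, p_value <= alpha

# Hypothetical summary statistics, for illustration only.
t0, p_value, reject = paired_t_test(d_bar=-2.0, s_d=3.0, n=9)
print(round(t0, 3), round(p_value, 4), reject)
```

For an upper-tailed alternative the P-value would be `1 - t_cdf(t0, n - 1)`, and for a two-sided alternative twice the smaller tail area.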

Assumptions:
- A confidence interval for ( \mu_d ) relies on the same assumptions as the paired t-test.
Confidence Interval Formula:
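- ( CI = \text{point estimate} \pm ME ), where ( ME ) is the margin of error.
- ( ME = CV \times SE ), the critical value times the standard error.
- CI for ( \mu_d ): ( \bar{d} \pm t^* \frac{s_d}{\sqrt{n}} )
- where ( t^* ) is the critical t-value for the chosen confidence level with ( df = n - 1 ).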

Study on lactic acid levels before and after racquetball:
- Sample size: 8 (measured before and after exercise)
- Claim to test: mean lactate level before exercise is lower than mean lactate level after exercise.
- Calculated difference: ( d = y_{before} - y_{after} )
- Found mean ( \bar{d} = -13.63, s_d = 8.28 )
Hypothesis:
- Null: ( H_0: \mu_d = 0 )
- Alternative: ( H_a: \mu_d < 0 )
Test Statistic Computation:
- ( t_0 = \frac{-13.63 - 0}{\frac{8.28}{\sqrt{8}}} \approx -4.656 )
Conclusion:
- With ( df = 7 ), the P-value is less than 0.005, so we reject ( H_0 ): the data provide evidence that the mean lactate level is lower before exercise than after.
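
The arithmetic behind this conclusion can be checked with a short Python sketch. The summary statistics are the ones reported in the notes; the critical value 3.499 (upper 0.005 point of the t-distribution with 7 degrees of freedom) is an assumed value taken from a standard t-table, not from the notes.

```python
import math

d_bar, s_d, n = -13.63, 8.28, 8     # summary statistics from the notes
se = s_d / math.sqrt(n)             # standard error of the mean difference
t0 = (d_bar - 0) / se               # test statistic under H0: mu_d = 0
df = n - 1

# Assumed table value: upper 0.005 critical point of t with 7 df is about 3.499.
t_crit_005 = 3.499
# Lower-tailed test: t0 beyond -3.499 means the left-tail area is below 0.005.
p_below_005 = t0 < -t_crit_005

print(round(se, 3), round(t0, 3), df, p_below_005)
```

Because the test statistic lies further into the left tail than the tabled critical value, the table alone is enough to bound the P-value below 0.005 without computing it exactly.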

90% Confidence Interval for ( \mu_d ):
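- Calculation: ( \bar{d} \pm t^* \frac{s_d}{\sqrt{n}} = -13.63 \pm 1.895 \times \frac{8.28}{\sqrt{8}} ), with ( t^* \approx 1.895 ) for ( df = 7 ).
- Result: ( (-19.177, -8.083) )
- Interpretation: We are 90% confident that the mean lactate level before exercise is between 8.083 and 19.177 lower than after exercise.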
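
The interval can be reproduced with the following Python sketch. `paired_ci` is an assumed helper name, and ( t^* = 1.895 ) is an assumed value from a standard t-table (90% confidence, 7 degrees of freedom); the summary statistics come from the notes.

```python
import math

def paired_ci(d_bar, s_d, n, t_star):
    """Confidence interval for mu_d: d_bar +/- t_star * s_d / sqrt(n)."""
    se = s_d / math.sqrt(n)   # standard error of the mean difference
    me = t_star * se          # margin of error = critical value * standard error
    return d_bar - me, d_bar + me

# t_star = 1.895 is an assumed table value for 90% confidence with df = 7.
lower, upper = paired_ci(d_bar=-13.63, s_d=8.28, n=8, t_star=1.895)
print(round(lower, 3), round(upper, 3))
```

The whole interval lies below zero, which agrees with the one-sided test's rejection of ( H_0 ).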