In-Depth Notes on Hypothesis Test for Paired Data

Paired Samples and Blocks

  • Definition of Paired Data:

    • Observations are collected in pairs, or each observation in one group is naturally related to one observation in the other group.
    • Common in studies that compare the same subjects before and after a treatment, which is a type of blocking.
  • Types of Pairing:

    1. Experimental Pairing: Arises in designed experiments, where pairing is a type of blocking.
    2. Observational Pairing: Arises in observational studies, where pairing is a form of matching.
  • Importance of Pairing:

    • If the data are known to be paired, the analysis should account for the pairing by working with the pairwise differences.
    • No formal test determines whether data are paired; the decision relies on understanding how the data were collected.

Analyzing Paired Data

  • Example:
    • Measuring resting pulse and pulse after exercise on the same individuals.
  • Key Metric:
    • We focus on the mean of the pairwise differences, which equals the difference in population means: ( \mu_d = \mu_1 - \mu_2 )
  • Sample of Differences:
    • For each pair: Sample 1 value - Sample 2 value.
    • Generates a single set of differences:
    • For pair 1: ( y_{d1} = y_{11} - y_{21} )
    • Continue for all ( n ) pairs: ( y_{di} = y_{1i} - y_{2i} ) for ( i = 1, \dots, n ).
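
The reduction from two samples to a single set of differences can be sketched in a few lines of Python. The pulse readings below are invented for illustration and are not from the notes; the variable names are likewise assumptions.

```python
from statistics import mean, stdev

# Hypothetical pulse readings (beats per minute) for the same five people.
# These numbers are made up purely to illustrate the bookkeeping.
resting = [68, 72, 75, 64, 80]
after_exercise = [95, 101, 99, 90, 112]

# One difference per pair: sample 1 value minus sample 2 value.
differences = [y1 - y2 for y1, y2 in zip(resting, after_exercise)]

n = len(differences)        # number of pairs, not number of measurements
d_bar = mean(differences)   # mean of the pairwise differences
s_d = stdev(differences)    # sample standard deviation (n - 1 denominator)

print(differences)
print(n, d_bar, round(s_d, 3))
```

From this point on, only the list of differences is analyzed; the two original columns are no longer needed.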

Paired t-Test Overview

  • Single Set of Data:

    • Use a one-sample t-test on pairwise differences.
    • Sample size ( n ) is the number of pairs.
  • Key Notation:

    • ( \bar{d} ): Mean of pairwise differences.
    • ( s_d ): Standard deviation of pairwise differences.
    • ( n ): Number of pairs.

Assumptions for Paired t-Test

  1. Paired Data Assumption:
    • The data must be paired.
  2. Independence Assumption:
    • The differences must be independent of each other (the two values within a pair are related by design).
  3. Randomization Condition:
    • The pairs come from a random sample or from a randomized experiment.
  4. 10% Condition:
    • When sampling without replacement, the sample should be no more than 10% of the population; often not checked explicitly for paired data.
  5. Normal Population Assumption:
    • The population of differences follows a Normal model; check with a histogram or normal probability plot of the differences.

Hypothesis Testing Steps

  1. Hypothesis:

    • Null: ( H_0: \mu_d = 0 )
    • Alternative: ( H_a: \mu_d < 0 )
  2. Test Statistic:

    • Given by ( t_0 = \frac{\bar{d} - d_0}{s_d / \sqrt{n}} ), where ( d_0 ) is the hypothesized mean difference under ( H_0 ) (here 0).
    • Under ( H_0 ), follows a t-distribution with ( df = n - 1 )
  3. P-value:

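    • The probability, assuming ( H_0 ) is true, of a t-value at least as extreme as ( t_0 ) in the direction of ( H_a ); found from the t-distribution with ( n - 1 ) degrees of freedom.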
  4. Decision:

    • If ( p \leq \alpha ) → Reject ( H_0 )
    • If ( p > \alpha ) → Do not reject ( H_0 )
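
The four steps above can be sketched as a small Python function. This is a minimal illustration under stated assumptions, not a reference implementation: the function names are hypothetical, the lower-tail P-value is obtained by numerically integrating the t density with Simpson's rule (a statistics library would normally supply this), and the sample differences are invented.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on the density between 0 and |x|."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def paired_t_test_lower(differences, d0=0.0, alpha=0.05):
    """Lower-tailed paired t-test of H0: mu_d = d0 against Ha: mu_d < d0."""
    n = len(differences)
    se = stdev(differences) / math.sqrt(n)   # standard error of d-bar
    t0 = (mean(differences) - d0) / se       # step 2: test statistic
    df = n - 1
    p_value = t_cdf(t0, df)                  # step 3: lower-tail P-value
    reject = p_value <= alpha                # step 4: decision rule
    return t0, df, p_value, reject


# Hypothetical differences (sample 1 minus sample 2), invented for illustration.
example_differences = [-27, -29, -24, -26, -32]
t0, df, p_value, reject = paired_t_test_lower(example_differences)
print(round(t0, 3), df, p_value, reject)
```

For an upper-tailed alternative the P-value would be `1 - t_cdf(t0, df)`, and for a two-sided alternative it would be twice the smaller tail area.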

Confidence Interval for ( \mu_d )

  • Assumptions:
    • Same as paired t-test assumptions.
  • Confidence Interval Formula:
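    • ( CI = \text{point estimate} \pm ME )
    • ( ME = CV \times SE ) (margin of error = critical value × standard error)
    • CI for ( \mu_d ): ( \bar{d} \pm t^* \frac{s_d}{\sqrt{n}} )
    • where ( t^* ) is the critical t-value with ( n - 1 ) degrees of freedom for the chosen confidence level.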

Practical Example

  • Study on lactic acid levels before and after racquetball:

    • Sample size: ( n = 8 ) pairs (each subject measured before and after exercise)
    • Claim to test: Mean lactate level before < mean lactate level after.
    • Calculated difference: ( d = y_{\text{before}} - y_{\text{after}} )
    • Summary statistics: ( \bar{d} = -13.63 ), ( s_d = 8.28 )
  • Hypothesis:

    • Null: ( H_0: \mu_d = 0 )
    • Alternative: ( H_a: \mu_d < 0 )
  • Test Statistic Computation:

    • ( t_0 = \frac{-13.63 - 0}{\frac{8.28}{\sqrt{8}}} \approx -4.656 )
  • Conclusion:

    • P-value < 0.005 (with ( df = 7 )), hence reject ( H_0 ): the data give evidence that the mean lactate level before exercise is lower than the mean level after exercise.
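
The arithmetic in this example can be checked with a short Python sketch. It uses only the summary statistics quoted above; the critical value 3.499 is the standard t-table entry for a one-sided tail area of 0.005 with 7 degrees of freedom, and the variable names are assumptions made for illustration.

```python
import math

# Summary statistics from the racquetball example.
n = 8
d_bar = -13.63
s_d = 8.28
d0 = 0.0  # hypothesized mean difference under H0

se = s_d / math.sqrt(n)   # standard error of the mean difference
t0 = (d_bar - d0) / se    # test statistic with df = n - 1 = 7

# One-sided t-table critical value for tail area 0.005, df = 7.
t_crit_005 = 3.499

# For a lower-tailed test, t0 below -t_crit_005 means the P-value is under 0.005.
p_below_005 = t0 < -t_crit_005

print(round(t0, 3), p_below_005)
```

Comparing the statistic with a table critical value only bounds the P-value; it does not give its exact value.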

Confidence Interval Estimation

  • 90% Confidence Interval for ( \mu_d ):
    • Calculation: ( \bar{d} \pm t^* \frac{s_d}{\sqrt{n}} = -13.63 \pm 1.895 \times \frac{8.28}{\sqrt{8}} ), with ( t^* = 1.895 ) for ( df = 7 )
    • Result: ( (-19.177, -8.083) )
    • Interpretation: We are 90% confident that the mean lactate level before exercise is between 8.083 and 19.177 lower than the mean level after exercise.
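
The interval above can be reproduced with the following sketch. It assumes the table value ( t^* = 1.895 ) for 90% confidence with 7 degrees of freedom; the variable names are illustrative.

```python
import math

# Summary statistics from the racquetball example.
n, d_bar, s_d = 8, -13.63, 8.28

# Two-sided 90% critical value for df = n - 1 = 7, taken from a t-table.
t_star = 1.895

margin = t_star * s_d / math.sqrt(n)   # ME = CV x SE
lower, upper = d_bar - margin, d_bar + margin

print(round(lower, 3), round(upper, 3))
```

Both endpoints are negative, which agrees with the test's conclusion that the mean difference (before minus after) is below zero.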