Paired (Dependent) Samples: Paired t-test and McNemar's Test - Key Concepts and Calculations

Paired (dependent) samples: Training vs. no training
  • Study design

    • Within each pair, one member was randomly assigned to receive training and the other was not. After the training, both members take a test.

    • Data are numerical scores (quantitative). Interpretation uses a matched-pairs (dependent) setup.

    • Significance level: $\alpha = 0.05$ (given).

    • Alternative hypothesis is one-sided: training helps (i.e., the training group scores higher than the no-training group).

    • Therefore, the test focuses on the within-pair difference in scores.

    • Define the difference for each pair:

      • $d_i = \text{training}_i - \text{no training}_i$, for $i = 1, \dots, n$.

    • Here, the discussion centers on 12 pairs ($n = 12$).

  • Key conceptual shift

    • The “story” moves from two groups to one group of differences by constructing the difference column. This allows us to ignore the original two-group structure and analyze a single set of 12 differences.

    • This connects to earlier modules:

    • Module 5: confidence intervals (the one-sample perspective).

    • Module 6/7: significance testing (test statistic, p-values).
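The reduction from two columns to one can be sketched in a few lines of Python. The score lists are hypothetical placeholders, not the class data (only the first two pairs, 95 vs. 90 and 89 vs. 85, echo the example differences quoted in these notes), and the names `training`, `no_training`, and `differences` are illustrative.

```python
# Hypothetical paired scores: position i in both lists belongs to the same pair.
training = [95, 89, 76, 92, 91, 53]
no_training = [90, 85, 73, 90, 90, 53]

# Pairing only makes sense if both lists have one entry per pair.
assert len(training) == len(no_training)

# One difference per pair: d_i = training_i - no_training_i.
differences = [t - c for t, c in zip(training, no_training)]

print(differences)  # [5, 4, 3, 2, 1, 0]
```

From here on, only `differences` is analyzed; the two original columns are no longer needed.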

  • Hypotheses and setup

    • Null hypothesis:

    • $H_0: \mu_d = 0$.

    • Alternative hypothesis (one-sided):

    • If training helps, then the difference should be positive on average:

    • $H_1: \mu_d > 0$.

    • Population distribution assumptions (for the test statistic):

    • Numerical data (scores).

    • Dependent samples (paired differences).

    • Randomization present.

    • The population of differences should be normal or close to normal; this assumption matters most when the sample is small, as it is here.

  • Computation details (difference-based analysis)

    • Compute the difference column:

    • $d_1, d_2, \dots, d_{12}$, where $d_i = \text{training}_i - \text{no training}_i$.

    • Then compute the sample mean of the differences:

    • $\bar{d} = \frac{1}{n} \sum_{i=1}^{n} d_i$.

    • Compute the sample standard deviation of the differences:

    • $s_d = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (d_i - \bar{d})^2}$.

    • The comparison uses the paired (one-sample) t-statistic:

    • $t = \frac{\bar{d} - \mu_{d0}}{s_d / \sqrt{n}}$, where $\mu_{d0} = 0$.

    • Degrees of freedom:

    • $df = n - 1 = 11$.

    • Interpretation of the t-statistic remains the same as in Module 6: a large positive $t$ is evidence for $H_1$; a small or negative $t$ gives no evidence against $H_0$.
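The computation steps above can be sketched with Python's standard library. The function name `paired_t` and the twelve differences are assumptions made for illustration; they are not the values computed in class.

```python
import math
import statistics


def paired_t(differences, mu_d0=0.0):
    """Return (t, df) for a one-sample t-test on paired differences."""
    n = len(differences)
    d_bar = statistics.mean(differences)   # sample mean of the differences
    s_d = statistics.stdev(differences)    # sample SD (divides by n - 1)
    t = (d_bar - mu_d0) / (s_d / math.sqrt(n))
    return t, n - 1


# Hypothetical differences (training minus no training), NOT the class data.
d = [5, 4, -3, 2, -6, 1, 0, -2, 3, -4, 2, -1]
t_stat, df = paired_t(d)
print(f"t = {t_stat:.3f}, df = {df}")
```

With these made-up values the mean difference is close to zero relative to its standard error, so `t_stat` is small, which is the same qualitative situation the notes describe.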

  • Example values from the transcript (illustrative)

    • Differences were computed (e.g., 95−90 = 5, 89−85 = 4, etc.).

    • The 12 difference values yield a mean difference $\bar{d}$ and standard deviation $s_d$ (values computed in class).

    • The t-statistic is reported as small in this example, leading to a large p-value.

    • Reported one-sided p-value: $p = 0.43$ (one-sided, since $H_1: \mu_d > 0$).

    • Decision:

    • Since $p = 0.43 > \alpha = 0.05$, we do not reject the null hypothesis.

    • Meaning of not rejecting $H_0$:

    • The data are consistent with training having no effect on student performance (i.e., the training and no-training members score similarly on average). This is not proof that the effect is zero.

    • Note on reporting:

    • In exams, you should write the explicit conclusion, such as: "Fail to reject $H_0$ at $\alpha = 0.05$; there is insufficient evidence that training improves scores under this design."

  • Practical notes on the workflow

    • If software is available: use it to obtain the p-value for the t-statistic with $df = 11$.

    • If not: use the t-table critical value $t_{\alpha, df}$ and compare it with the observed $t$.

    • The key idea is to work with the differences; you can think of this as transforming a two-group problem into a one-group problem.

    • Remember the distinction between one-sided and two-sided tests and reflect that in the p-value interpretation.
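As a rough illustration of the "software" route, the one-sided p-value can be approximated with the standard library alone by numerically integrating the t density. This is a sketch under stated assumptions, not the method used in class: the helper names `t_pdf` and `upper_tail_p` are made up here, and the constant 1.796 is the usual t-table entry for a one-sided $\alpha = 0.05$ with $df = 11$.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail_p(t, df, steps=2000):
    """One-sided p-value P(T >= t), using Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3               # P(0 <= T <= |t|)
    upper = 0.5 - area                 # P(T >= |t|), by symmetry
    return upper if t >= 0 else 1 - upper


T_CRIT = 1.796  # t-table value for one-sided alpha = 0.05, df = 11

# At the critical value the one-sided p-value is (approximately) alpha.
print(round(upper_tail_p(T_CRIT, 11), 3))
```

Rejecting when `upper_tail_p(t_obs, 11) < 0.05` is the same rule as rejecting when `t_obs > T_CRIT`. If SciPy is installed, `scipy.stats.t.sf(t_obs, 11)` gives the same upper-tail probability directly.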

  • Summary takeaways

    • For matched-pairs data with a numerical outcome, a paired t-test on the difference scores is appropriate.

    • Hypotheses focus on the mean difference; null is zero difference; one-sided alternative is often used when the question specifies a direction (e.g., training improves).

    • Critical steps: compute differences, compute $\bar{d}$ and $s_d$, compute $t$, determine $df$, obtain p-value, compare to $\alpha$, draw conclusion.

McNemar's test for matched categorical data (sibling puzzle experiment)
  • Context and data structure

    • Objective: determine whether there is a difference in the probability of solving a puzzle in less than one minute between older and younger siblings when data are paired.

    • Design: 70 paired siblings; each pair has two binary outcomes: time < 1 minute vs time ≥ 1 minute for older vs younger.

    • Data are categorical and paired (dependent samples).

    • Seven steps framework: categorical data, dependent samples, significance test, $\alpha$ given.

  • The 2x2 table used to summarize outcomes

    • Table layout (rows = older, columns = younger):

    • $n_{11}$: older < 1 min and younger < 1 min = 25

    • $n_{12}$: older < 1 min and younger ≥ 1 min = 18

    • $n_{21}$: older ≥ 1 min and younger < 1 min = 10

    • $n_{22}$: older ≥ 1 min and younger ≥ 1 min = 22

    • The two numbers that carry information about the difference between groups are the off-diagonal discordant counts:

    • $n_{12} = 18$, $n_{21} = 10$.

    • The diagonal counts ($n_{11}$, $n_{22}$) do not contribute to the test statistic for McNemar’s test.
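One way to hold the table in code and pull out the discordant cells is sketched below. The dictionary layout and the labels `"fast"` and `"slow"` are illustrative choices, not part of the original example. Note that the four cells as listed add up to 75 rather than the 70 pairs stated in the design; only the discordant cells enter the statistic, so the sketch uses the cells exactly as listed.

```python
# Cell counts as listed in the notes; keys are (older outcome, younger outcome).
table = {
    ("fast", "fast"): 25,  # n11: both siblings under 1 minute
    ("fast", "slow"): 18,  # n12: only the older sibling under 1 minute
    ("slow", "fast"): 10,  # n21: only the younger sibling under 1 minute
    ("slow", "slow"): 22,  # n22: neither sibling under 1 minute
}

# Discordant pairs: the siblings disagree. These drive McNemar's test.
n12 = table[("fast", "slow")]
n21 = table[("slow", "fast")]
discordant = n12 + n21

# Concordant pairs: the siblings agree. These are ignored by the statistic.
concordant = table[("fast", "fast")] + table[("slow", "slow")]

print(n12, n21, discordant, concordant)  # 18 10 28 47
```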

  • Hypotheses

    • Null hypothesis: the two discordant outcomes are equally likely; i.e., $P(\text{older} < 1,\ \text{younger} \ge 1) = P(\text{older} \ge 1,\ \text{younger} < 1)$. In practical terms: under $H_0$, a discordant pair is equally likely to fall in cell $n_{12}$ or cell $n_{21}$, so the two counts should be similar.

    • Alternative hypothesis: two-sided, because either direction of difference is of interest (older may be more likely or younger may be more likely to solve quickly).

    • Therefore, a two-tailed test is used, with $\alpha$ given (here $\alpha = 0.01$).

  • Test statistic (large-sample McNemar’s test)

    • Focus on the off-diagonal counts: $n_{12}$ and $n_{21}$.

    • Large-sample z statistic (no continuity correction):

    • $z = \frac{|n_{12} - n_{21}|}{\sqrt{n_{12} + n_{21}}}$.

    • With continuity correction (optional):

    • $z = \frac{|n_{12} - n_{21}| - 1}{\sqrt{n_{12} + n_{21}}}$.

    • Using the given numbers: $n_{12} = 18$, $n_{21} = 10$

    • Difference: $|18 - 10| = 8$

    • Sum: $18 + 10 = 28$

    • Without correction: $z = \frac{8}{\sqrt{28}} \approx 1.51$ (as stated in the transcript)

    • With correction (if applied): $z = \frac{8 - 1}{\sqrt{28}} = \frac{7}{\sqrt{28}} \approx 1.32$.

    • Distribution and large-sample rule of thumb:

    • The large-sample condition is that $n_{12} + n_{21} > 20$. Here $28 > 20$, so the normal approximation is acceptable.
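The two versions of the statistic and the rule-of-thumb check can be reproduced as follows. The function name `mcnemar_z` is hypothetical, and flooring the corrected numerator at zero is an added safeguard for near-equal counts rather than something stated in the notes.

```python
import math


def mcnemar_z(n12, n21, continuity=False):
    """Large-sample McNemar z statistic from the two discordant counts."""
    diff = abs(n12 - n21)
    if continuity:
        diff = max(diff - 1, 0)  # assumption: never let the numerator go negative
    return diff / math.sqrt(n12 + n21)


z_plain = mcnemar_z(18, 10)
z_corrected = mcnemar_z(18, 10, continuity=True)
large_sample_ok = (18 + 10) > 20  # rule of thumb from the notes

print(round(z_plain, 2), round(z_corrected, 2), large_sample_ok)  # 1.51 1.32 True
```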

  • P-value and decision rule

    • For a two-sided test, double the one-sided tail probability associated with $z = 1.51$:

    • $p\text{-value} \approx 2 \cdot P(Z \ge 1.51) = 2 \cdot (1 - \Phi(1.51)) \approx 0.13$.

    • Significance level given in the example: $\alpha = 0.01$.

    • Since $p\text{-value} \approx 0.13 > \alpha$, we fail to reject $H_0$.

    • Conclusion: At the 0.01 level, there is insufficient evidence of a difference in the probability of solving the puzzle in less than one minute between older and younger siblings.
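The table lookup and the decision can be checked with `statistics.NormalDist` from the standard library, which provides the standard normal CDF $\Phi$. The variable names are illustrative.

```python
from statistics import NormalDist

z = 1.51      # uncorrected McNemar z statistic from the example
alpha = 0.01  # significance level given in the example

# Two-sided p-value: double the upper-tail probability beyond z.
p_two_sided = 2 * (1 - NormalDist().cdf(z))
reject_h0 = p_two_sided < alpha

print(round(p_two_sided, 2), reject_h0)  # 0.13 False
```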

  • Key interpretation and nuances

    • McNemar’s test uses only the two discordant counts ($n_{12}$ and $n_{21}$); diagonal counts do not affect the test statistic.

    • The test is specifically designed for paired dichotomous data; it assesses whether the two outcomes are equally likely across the two conditions (older vs younger in this setup).

    • The large-sample z statistic is an approximation; for small samples, the exact McNemar test is available and may be preferred if $n_{12} + n_{21}$ is not large.

    • If the alternative had been one-sided (e.g., older more likely to be fast than younger), you would use the one-sided p-value and not double it.
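For reference, a minimal sketch of the exact test mentioned above: under $H_0$ each discordant pair is equally likely to fall in either off-diagonal cell, so $n_{12}$ can be treated as a Binomial($n_{12} + n_{21}$, 0.5) count. The function name `exact_mcnemar_p` is hypothetical, and doubling the tail (capped at 1) is one common convention for the two-sided p-value, assumed here.

```python
from math import comb


def exact_mcnemar_p(n12, n21):
    """Two-sided exact McNemar p-value from a Binomial(n12 + n21, 0.5) model."""
    n = n12 + n21
    k = max(n12, n21)
    # P(X >= k) for X ~ Binomial(n, 0.5): the tail at least as extreme as observed.
    upper_tail = sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


p_exact = exact_mcnemar_p(18, 10)
print(p_exact > 0.01)  # True: the exact test also fails to reject at alpha = 0.01
```

For the sibling data the exact p-value is somewhat larger than the normal-approximation value, and it leads to the same decision.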

  • Practical notes on the workflow

    • You need a 2x2 table with paired observations and a focus on the off-diagonal cells $n_{12}$ and $n_{21}$.

    • Compute $z$ using the formula above and consult a z-table to obtain the one-sided p-value, then double it if the test is two-sided.

    • If $n_{12} + n_{21}$ is not large enough, consider the exact McNemar test instead of the normal approximation.

  • Connections to broader principles

    • This test illustrates how to handle paired categorical data, a common situation when the same subjects are measured under two conditions.

    • It contrasts with the paired t-test by dealing with binary outcomes rather than continuous scores.

    • The concept of using only the discordant pairs to measure a difference is analogous to focusing on information that actually reflects a change between the paired conditions.

  • Final takeaway across both examples

    • When data are paired or matched, it is often advantageous to analyze differences directly (paired t-test for numerical outcomes; McNemar’s test for binary outcomes).

    • The null hypotheses typically express no difference (e.g., zero mean difference, or equal probability of discordant outcomes).

    • The choice of one-sided vs two-sided tests depends on the research question; the transcript emphasizes explicit direction when appropriate (one-sided in the training example).

    • Always verify sample size conditions ($df$ for the t-test, $n_{12} + n_{21}$ for McNemar) to decide whether to rely on asymptotic approximations or exact tests.

  • Common tools and references

    • Common test statistics discussed: t-statistic for paired observations, and z-statistic for McNemar’s test.

    • Tables discussed: z-table, t-table, and chi-square table.

    • In practice, software can provide exact p-values for McNemar and p-values for the paired t-test; if not, use the corresponding tables and the described formulas.
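The chi-square table is relevant because squaring the uncorrected z statistic gives the chi-square form of McNemar's statistic with one degree of freedom. The short worked calculation below uses the sibling data and is offered as a cross-check, not as a step taken in the transcript; the critical value 6.635 is the standard chi-square table entry for $\alpha = 0.01$ and $df = 1$.

```latex
\[
\chi^2 = z^2 = \frac{(n_{12} - n_{21})^2}{n_{12} + n_{21}}
       = \frac{(18 - 10)^2}{18 + 10} = \frac{64}{28} \approx 2.29,
\qquad df = 1.
\]
\[
\chi^2_{0.01,\,1} = 6.635 = (2.576)^2, \qquad 2.29 < 6.635
\;\Rightarrow\; \text{fail to reject } H_0.
\]
```

Because the chi-square critical value is the square of the two-sided z critical value, the two versions of the test always give the same decision.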

For paired (dependent) samples, statistical tests analyze differences directly:

Paired t-test (for numerical data)

  • Study Design: Compares two conditions on the same subjects or on matched pairs (e.g., training vs. no training). Data are numerical scores.

  • Key Concept: Transforms the two-group problem into a one-sample problem by computing differences: $d_i = \text{training}_i - \text{no training}_i$.

  • Hypotheses: Null hypothesis: $H_0: \mu_d = 0$ (no mean difference). Alternative hypothesis: $H_1: \mu_d > 0$ (e.g., training helps) or $H_1: \mu_d \neq 0$.

  • Test Statistic: $t = \frac{\bar{d} - \mu_{d0}}{s_d / \sqrt{n}}$ with $df = n - 1$.

  • Decision: Compare the p-value to the significance level $\alpha$. If $p > \alpha$, fail to reject $H_0$ (e.g., $p = 0.43 > \alpha = 0.05$: no evidence that training improves scores).

McNemar's test (for matched categorical data)

  • Study Design: Compares paired binary outcomes (e.g., probability of solving a puzzle for older vs. younger siblings). Data are categorical.

  • Data Structure: Summarized in a 2x2 table. Only the off-diagonal discordant counts ($n_{12}$ and $n_{21}$) are used, representing pairs whose outcomes differ between conditions.

  • Hypotheses: Null hypothesis: no difference in discordant probabilities (a discordant pair is equally likely to fall in cell $n_{12}$ or cell $n_{21}$). Alternative hypothesis: two-sided, indicating a difference.

  • Test Statistic: Large-sample z-statistic: $z = \frac{|n_{12} - n_{21}|}{\sqrt{n_{12} + n_{21}}}$. The large-sample condition is $n_{12} + n_{21} > 20$.

  • Decision: For a two-sided test, double the one-sided p-value. If $p > \alpha$, fail to reject $H_0$ (e.g., $z \approx 1.51$, $p \approx 0.13 > \alpha = 0.01$: no evidence of a difference in puzzle-solving probability).

General Takeaways

  • Analyzing differences directly is advantageous for paired data to assess changes or effects within subjects.

  • Null hypotheses typically state no difference (zero mean difference or equal discordant probabilities).

  • The choice between one-sided and two-sided tests depends on the research question's directionality.

  • Always verify sample size conditions (degrees of freedom for t-test, sum of discordant counts for McNemar's) to ensure the validity of the chosen approximation or test.