PSYCH 100A - Week 6 Day 2: Paired-Samples T-Test Notes

Attendance

  • Enter the number 2 in Quiz 5/7 in the "Attendance" Module.

Module 9

  • Focus on the Paired-Samples T-Test.

Outline

  1. Within-subjects designs
  2. Significance testing steps
  3. Study questions
  4. jamovi and RStudio analyses
  5. Standardized mean difference effect size
  6. Quick review

Within-Group Research Designs

  • A within-group research design yields two or more scores from the same participant.
  • A typical within-group design collects multiple repeated measurements from the same individuals (e.g., a pretest followed by a posttest).
  • The two scores can also originate from dyadic pairs of people.

Randomized Trial Application

  • Two scores from the same group of people are obtained before and after an intervention.

Experimental Application

  • The same group of people is subjected to two different experimental conditions.

Developmental Application

  • The same group of people is followed over time to examine change or development.

Dyadic Application

  • Pairs of individuals form naturally-occurring dyads (e.g., romantic partners, siblings) with linked scores.

Paired-Samples T-Test

  • The paired-samples (dependent) t-test is suited for within-group designs involving two measurements.
  • The research question and hypotheses concern the difference between two means derived from the same individuals.

Analysis Decision Tree

  • Research question: Compare groups, time points, conditions, etc.
  • Between-group:
    • Independent-samples t-test (2 means)
    • One-factor ANOVA (> 2 means)
  • Within-subjects:
    • Paired-samples t-test (2 means)
    • Repeated measures ANOVA (> 2 means)
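
The decision tree above can be written as a small lookup. This is only a sketch in Python: the function name `choose_test` and the labels "between" and "within" are made up for illustration and are not part of jamovi or R.

```python
def choose_test(design: str, n_means: int) -> str:
    """Return the analysis suggested by the decision tree in these notes.

    design  -- "between" (different people per group) or "within"
               (same people, or linked pairs, measured more than once)
    n_means -- how many means are being compared
    """
    if n_means < 2:
        raise ValueError("need at least two means to compare")
    if design == "between":
        return "independent-samples t-test" if n_means == 2 else "one-factor ANOVA"
    if design == "within":
        return "paired-samples t-test" if n_means == 2 else "repeated measures ANOVA"
    raise ValueError(f"unknown design: {design!r}")


# Body satisfaction at ages 10 and 18: same girls, two means.
print(choose_test("within", 2))
```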

Quick Review: Sampling Error

  • The true mean μ is an unknown parameter obtainable only by analyzing the entire population’s data.
  • In hypothesis testing, the null hypothesis specifies the value of the true population parameter.
  • Samples from the null population will have means differing from the true mean; each hypothetical sample has a different amount of sampling error.

Quick Review: Sampling Distribution

  • The distribution of estimates from many hypothetical samples represents a sampling distribution.
  • During hypothesis testing, the null hypothesis supplies the value of μ at the center of the distribution.
  • With a sufficiently large N, sample means follow a normal curve around the null hypothesis mean.

Quick Review: Standard Error

  • The standard error is the typical (roughly average) distance between a sample mean and the null mean.
  • It reflects our expectation across many hypothetical samples.
  • Any given sample mean may be closer to or farther from the true mean.
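
A small simulation can make the "many hypothetical samples" idea concrete. The sketch below draws repeated samples from a made-up null population (mean 0, SD 6, N = 100 per sample; all three values are assumptions, not study data) and compares the spread of the sample means with the usual formula SE = SD / √N.

```python
import random
import statistics

random.seed(100)  # fixed seed so the run is repeatable

NULL_MEAN = 0.0   # hypothetical null value of the population mean
POP_SD = 6.0      # hypothetical population standard deviation
N = 100           # hypothetical sample size
N_SAMPLES = 2000  # number of hypothetical samples to draw

# Draw many samples from the null population and keep each sample's mean.
sample_means = []
for _ in range(N_SAMPLES):
    sample = [random.gauss(NULL_MEAN, POP_SD) for _ in range(N)]
    sample_means.append(statistics.fmean(sample))

theoretical_se = POP_SD / N ** 0.5             # SD / sqrt(N)
empirical_se = statistics.stdev(sample_means)  # spread of the simulated means

print(f"theoretical SE: {theoretical_se:.3f}")
print(f"empirical SE:   {empirical_se:.3f}")
```

The two numbers should be close: the standard deviation of the simulated sample means is what the standard error estimates.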

Quick Review: Normal Curve Rule

  • With large samples, we can apply the normal curve rule of thumb.
  • 95% of all hypothetical sample means fall within ± 1.96 standard errors of the true mean.
  • 5% of hypothetical sample means are outliers, falling beyond ± 1.96 standard errors.
  • μ ± 1.96 standard errors
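
The ± 1.96 cutoff can be checked directly against the standard normal curve. This sketch uses `NormalDist` from Python's standard library; nothing here comes from the study data.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve (mean 0, SD 1)

# Cutoff that leaves 2.5% in each tail (5% total outside).
critical = z.inv_cdf(0.975)

# Proportion of the curve that falls within ±1.96.
inside = z.cdf(1.96) - z.cdf(-1.96)

print(f"critical value: {critical:.2f}")
print(f"proportion inside ±1.96: {inside:.3f}")
```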

Significance Testing

  1. Design study and collect data
  2. Specify hypotheses about population
  3. Define standard of evidence
  4. Compare data to null hypothesis
  5. Evaluate hypotheses and draw conclusion

Skin Color Satisfaction and Binge Eating

  • Study by Parker, J.E., Enders, C.K., Mujahid, M.S., Laraia, B.A., Epel, E.S., & Tomiyama, A.J. (2022) investigated skin color satisfaction as a predictor of binge eating and its effect through body image in Black girls during adolescence.

Key Variables

  • Age: The grouping variable was age. Participants were followed longitudinally, with the dependent variable measured at ages 10 and 18.
  • Body Satisfaction: The facet of self-concept related to weight, including attitudes, evaluations, and feelings about one's own body.

Research Question

  • Do Black girls experience a change in body satisfaction during adolescence from age 10 to 18?
  • The study employed a within-subjects design, measuring satisfaction yearly in the same sample of girls.

Mean Difference Statistic

  • Population means exist at ages 10 and 18, denoted μ_{10} and μ_{18}.
  • Hypotheses about change use a mean difference statistic that contrasts the two population means: μ_{diff} = μ_{18} – μ_{10}.
  • The mean difference quantifies change over time; a negative value indicates lower scores at age 18.
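
In a sample, the mean difference comes from one difference score per participant. The sketch below uses five invented pairs of scores (not the study data) and assumes the difference is computed as age 18 minus age 10, which matches the sign of the results reported later in these notes.

```python
from statistics import fmean

# Hypothetical body satisfaction scores for five girls (not the study data).
age10 = [30, 27, 31, 25, 29]
age18 = [26, 27, 28, 21, 27]

# One difference score per participant: age 18 minus age 10.
diffs = [later - earlier for earlier, later in zip(age10, age18)]

mean_of_diffs = fmean(diffs)                 # mean of the difference scores
diff_of_means = fmean(age18) - fmean(age10)  # difference between the two means

print("difference scores:", diffs)
print("mean of differences:", round(mean_of_diffs, 2))
print("difference of means:", round(diff_of_means, 2))
```

The last two lines agree: averaging the difference scores gives the same number as subtracting the two means.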

Two-Tailed Hypotheses

  • Null (no change): In the population, Black girls do not change their body satisfaction during adolescence. H0: μ_{diff} = 0
  • Alternate (change): In the population, Black girls experience an increase or decrease in body satisfaction during adolescence. H1: μ_{diff} ≠ 0

Standard of Evidence

  • Data serves as evidence to determine whether the null hypothesis is plausible or implausible.
  • If the sample mean differs substantially from the null mean, the null hypothesis is deemed implausible.
  • How large a difference is needed to reject the null hypothesis?

5% Significance Criterion

  • 5% outlier samples (samples with p < .05) provide evidence against the null because they would be very rare if the null were true.
  • Samples within the 95% region (samples with p > .05) are consistent with the null because they would be fairly common if it were true.
  • μ_{diff} = 0

Analysis Summary

  • N = 882 Black girls participated in the study.
  • The sample mean difference is –3.05, with a standard error of 0.22.
  • The standard error tells us that, across many hypothetical samples from the null population, the mean difference statistic would typically fall about 0.22 points from the null value of 0.
  • Age 10: N = 882, X̄ = 28.49, SD = 5.14, SE = 0.17
  • Age 18: N = 882, X̄ = 25.44, SD = 6.06, SE = 0.20
  • Difference: N = 882, X̄ = –3.05, SD = 6.44, SE = 0.22
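
Each standard error in the summary follows from its standard deviation and the sample size, SE = SD / √N. A quick check in Python, using only the rounded values listed above:

```python
import math

N = 882  # sample size from the analysis summary

# Standard deviations as reported (rounded) in the summary.
sds = {"Age 10": 5.14, "Age 18": 6.06, "Difference": 6.44}


def standard_error(sd: float, n: int) -> float:
    """Standard error of a mean: SD divided by the square root of N."""
    return sd / math.sqrt(n)


for label, sd in sds.items():
    print(f"{label}: SE = {standard_error(sd, N):.2f}")
```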

Jamovi Output

  • Paired Samples T-Test
  • t = -14.04, df = 881.00, p < .001, Mean difference = -3.05, SE difference = 0.22, 95% CI [-3.47, -2.62], Cohen's d = -0.47
  • Descriptives:
    • BodySat18: N = 882, Mean = 25.44, Median = 26.00, SD = 6.06, SE = 0.20
    • BodySat10: N = 882, Mean = 28.49, Median = 28.00, SD = 5.14, SE = 0.17

RStudio Output

  • Descriptive statistics including mean, standard deviation, median, etc., for BodySat10 and BodySat18.

  • Paired t-test: t = -14.037, df = 881, p-value < 2.2e-16, 95% CI [-3.472435, -2.620536], mean difference = -3.046485, standard error = 0.2170266, standardized mean difference effect size: -0.4726633

Comparing Data to the Null

  • Determine whether the null hypothesis population could have produced our sample data.
  • Is the sample mean a 5% outlier, differing from the null hypothesis by more than ± 1.96 standard errors?
  • Use a t-statistic and probability value to quantify the difference between the sample mean and the null mean.

T-Statistic

  • The t-statistic quantifies the number of standard error units that separate the sample mean from the null hypothesis population mean.
  • It is analogous to a z-score, which expresses distance in standard deviation units.

T-Statistic: Standard Errors from the Null

  • The t-statistic of –14.04 indicates that about 14 standard error units separate the sample mean difference from the null value of 0. The negative sign indicates that scores were lower at age 18.
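
The t-statistic is the distance between the sample mean difference and the null value, divided by the standard error. A minimal check in Python using the unrounded values from the RStudio output:

```python
mean_diff = -3.046485   # sample mean difference (RStudio output)
se_diff = 0.2170266     # standard error of the mean difference (RStudio output)
null_value = 0.0        # H0: population mean difference is zero

# Number of standard errors between the sample result and the null.
t_stat = (mean_diff - null_value) / se_diff

print(f"t = {t_stat:.2f}")
```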

Probability Value

  • Fewer than 1 in 1,000 hypothetical samples from the null population would fall at least 14 standard error units (in either direction) from a population mean difference of 0.
  • The total probability in both directions is < .001.
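
The two-tailed probability can be approximated in a few lines. Python's standard library has no t distribution, so this sketch substitutes the normal curve, which is a reasonable stand-in only because the sample is large (df = 881). It will not reproduce the exact value printed by R, but it shows where "p < .001" comes from. The helper name `two_tailed_p` is made up for illustration.

```python
import math


def two_tailed_p(t_stat: float) -> float:
    """Area beyond ±|t| under the standard normal curve (large-sample approximation)."""
    return math.erfc(abs(t_stat) / math.sqrt(2))


p_study = two_tailed_p(-14.037)  # t from the RStudio output
p_cutoff = two_tailed_p(1.96)    # sanity check: should be close to .05

print("p < .001 for the study:", p_study < 0.001)
print(f"p at the ±1.96 cutoff: {p_cutoff:.3f}")
```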

Research Question Revisited

  • Studies typically aim to answer research questions involving associations between key variables.
  • Do Black girls experience a change in body satisfaction from age 10 to 18?
  • The null (no effect) hypothesis posits that body satisfaction does not change (the population mean difference is zero).

5% Significance Criterion Revisited

  • Outlier samples (p < .05) provide evidence against the null because they would be very rare if the null were true.
  • Samples within the 95% region (p > .05) are consistent with the null because they would be fairly common if it were true.
  • The p-value for this study was < .001.

Decision Tree

  • Significant (p < .05):
    • Data differ from the null.
    • The null is shown to be “guilty.”
    • There is an effect.
  • Nonsignificant (p > .05):
    • Data are similar to the null.
    • The null is shown to be “innocent.”
    • There is no evidence of an effect.
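
The decision rule is a comparison of the p-value with the 5% criterion. In this sketch the function name `decide` is invented, and treating a p-value exactly equal to .05 as nonsignificant is an assumption, since the notes only cover p < .05 and p > .05.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Classify a result using the significance criterion from these notes."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a probability must be between 0 and 1")
    # Assumption: p exactly equal to alpha is treated as nonsignificant.
    return "significant" if p_value < alpha else "nonsignificant"


print(decide(0.0004))  # a p-value below .001, as in the body satisfaction study
print(decide(0.30))    # a hypothetical p-value inside the 95% region
```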

Conclusion

  • A sample mean difference of ~ 3 points would be highly unusual if the sample came from a no-change population.
  • Zero is not a plausible value for the true population-level mean difference.
  • Evidence suggests that Black girls’ body satisfaction changes across adolescence.

Confidence Intervals Revisited

  • The 95% confidence interval gives the two most extreme values of the population mean difference that could reasonably have produced these data.
  • μ_{high} = –2.62
  • μ_{low} = –3.47

Significance Testing with 95% Intervals

  • If the null mean is outside the 95% confidence interval, it is unlikely that a population with a mean change of 0 produced this sample.
  • The 95% confidence interval provides the same conclusion as the significance test!
  • μ_{diff} = 0
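
The interval can be rebuilt from the mean difference and its standard error. This sketch uses the large-sample cutoff of 1.96 rather than the exact t cutoff for df = 881 (an assumption that changes the limits only in the third decimal place), then checks whether the null value of 0 falls inside.

```python
mean_diff = -3.046485   # sample mean difference (RStudio output)
se_diff = 0.2170266     # standard error of the mean difference (RStudio output)
CRITICAL = 1.96         # large-sample normal cutoff (assumed in place of the exact t cutoff)
NULL_VALUE = 0.0        # H0: population mean difference is zero

margin = CRITICAL * se_diff
lower = mean_diff - margin
upper = mean_diff + margin

# The result is significant at the 5% level when the null value lies outside the interval.
null_inside = lower <= NULL_VALUE <= upper

print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
print("null value inside interval:", null_inside)
```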

Standardized Mean Difference

  • Statistical significance does not imply practical importance.
  • The APA recommends augmenting significance tests with standardized effect size measures.
  • The standardized mean difference or Cohen’s d effect size works for within-subjects designs.

Standardized Mean Difference Formula

  • One of several SMD formulas for within-subjects designs involves the standard deviation of a difference score variable.
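
One version that is consistent with the numbers reported in these notes divides the mean difference by the standard deviation of the difference scores, d = mean difference / SD of differences. The sketch below assumes that is the formula intended here, and it uses the rounded SD from the analysis summary.

```python
mean_diff = -3.046485   # sample mean difference (RStudio output)
sd_diff = 6.44          # SD of the difference scores (analysis summary, rounded)

# Standardized mean difference: the change expressed in SD units.
d = mean_diff / sd_diff

print(f"d = {d:.2f}")
```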

Effect Size Example

  • The effect size expresses the mean difference on a standardized metric.
  • The mean difference of ~ 3 body satisfaction points equates to 0.47 standard deviation units.

Cohen’s Guidelines

  • Negligible: |d| less than .20
  • Small: |d| from .20 to .50
  • Moderate: |d| from .50 to .80
  • Large: |d| greater than .80

Small Effect Size

  • A 0.47 standard deviation difference is a small effect size (approaching medium).
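
Cohen's guidelines can be applied with a short helper. The name `cohen_label` is made up for illustration. The guidelines as listed do not say which label a value of exactly .50 receives, so calling it "moderate" is an assumption here. The sign is ignored because the labels describe magnitude only.

```python
def cohen_label(d: float) -> str:
    """Label an effect size using the benchmarks listed in these notes."""
    size = abs(d)  # direction does not matter for magnitude
    if size < 0.20:
        return "negligible"
    if size < 0.50:
        return "small"
    if size <= 0.80:  # assumption: exactly .50 counts as moderate
        return "moderate"
    return "large"


print(cohen_label(-0.47))  # effect size from the body satisfaction study
```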

APA-Style Analysis Summary

  • A paired-samples t-test was used to examine the change in body satisfaction between ages 10 and 18.
  • There was a statistically significant decrease in body satisfaction scores, t(881) = –14.04, p < .001.
  • The mean difference was approximately three points, with a 95% confidence interval for the mean difference ranging from –3.47 to –2.62.
  • The standardized mean difference was just below Cohen’s medium effect size benchmark (d = –0.47), indicating a salient developmental change.

Study Questions

Question 1

  • A clinical psychologist aims to determine whether cognitive behavioral therapy decreases anxiety using a pretest-posttest design.
    1. Describe the setup of a within-group design to examine the impact of therapy on anxiety.
    2. Write the null hypotheses for the study in words and in symbols.

Question 2

3.  Anxiety is measured on a 10-point scale, and the standard error of the mean difference is .50. Interpret this standard error in lay terms.
4.  With a sample of 20 patients having an average difference score of -1 and a standard error of .50, what is the value of the t statistic?
5.  If the probability value for the t statistic is p = .24, interpret this probability value.

Question 3

6.  Based on a probability value of .24, what is your conclusion about the effectiveness of cognitive behavioral therapy?
7.  If instead the probability value for the t-statistic is .03, what is your conclusion about the effectiveness of cognitive behavioral therapy?

Question 4

8.  If the standardized mean difference effect size for the change between pretest and posttest is .37, interpret this in lay terms and describe its magnitude.
9.  If the confidence interval for the mean change between pretest and posttest ranges from -1.20 to -.80 (on the 10-point scale), interpret this in lay terms.
10. Based on the confidence interval alone, how can you determine whether the results are significant?

Jamovi Analysis

  • Demonstration of how to conduct a paired-samples t-test in jamovi using the body satisfaction data.
  • Steps include selecting the Paired-Samples T-Test option, assigning variables (BodySat18 and BodySat10), and interpreting the output.

RStudio Analysis

  • RStudio code to load data, perform descriptive statistics, run a paired-samples t-test, and calculate the standardized mean difference effect size.