One-way Repeated-Measures ANOVA & ANOVA Revision Notes
One-way Repeated-Measures ANOVA Secondary Analysis
Secondary Analysis
- Secondary analysis is similar to one-way between-subjects ANOVA.
- It is conducted after a significant F omnibus test.
- The F omnibus test indicates if there is any difference among the conditions.
- Secondary analysis identifies how the conditions differ.
- Comparisons or contrasts between conditions can be planned (a priori) or post hoc.
- Comparisons can be simple or complex.
- Multiple comparisons increase the family-wise error rate.
- Methods to control family-wise error rate include Bonferroni, Tukey, etc.
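To see why multiple comparisons inflate the family-wise error rate, the following minimal sketch computes the chance of at least one Type I error across m comparisons, together with the Bonferroni per-comparison alpha. It assumes the comparisons are independent (a simplification; follow-up comparisons on the same data usually are not), and the function names are illustrative only.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one Type I error across m independent tests."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_alpha(alpha: float, m: int) -> float:
    """Per-comparison alpha that keeps the family-wise rate at or below alpha."""
    return alpha / m


# Family-wise error rate and Bonferroni threshold for 1, 3 and 6 comparisons
for m in (1, 3, 6):
    fwer = familywise_error_rate(0.05, m)
    per_test = bonferroni_alpha(0.05, m)
    print(f"m = {m}: FWER = {fwer:.3f}, Bonferroni alpha = {per_test:.4f}")
```

With three conditions there are three possible pairwise comparisons, so an uncorrected alpha of .05 per test already gives a family-wise rate of roughly .14 under this simplification.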
Secondary Analysis Diagram
- Omnibus F-test
- If NO (not significant), stop.
- If YES (significant), proceed to:
- Types of comparisons:
- A priori (planned) contrasts: use the contrast command (F & t statistics). Could be simple or complex.
- Post hoc comparisons: use the pwmean command (t statistics). Simple.
- Error rate adjustment method:
- Bonferroni & Scheffé
- Tukey
- No adjustments
A Simple Example
- Variables:
- One numerical DV: number of injuries in a three-month period
- One categorical IV: 3 types of costumes (Mickey, Superman, Batman)
- Design: every child wears all three costume types for 3 months each
- Research question: Does wearing different costumes lead to different injuries?
- Research Hypothesis: Children will sustain more injuries when wearing superhero costumes (Superman and Batman) compared to a non-superhero costume (Mickey).
- Statistical Analysis: One-way repeated-measures ANOVA
- Dataset: cos_rep_example.dta
- var1: cos
- var2: injury (numerical, 1-25)
- var3: id
- Note the difference in wording compared to between-subjects designs.
Omnibus F-test Results
- A one-way repeated measures ANOVA was conducted to examine how different costumes may lead to varying levels of injuries over a three-month period.
- Results: Costume type elicited statistically significant differences in the mean frequency of injuries, F(2, 18) = 4.38, p = 0.028, \eta_p^2 = .33.
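The partial eta squared reported above can be recovered from the F statistic and its degrees of freedom, since \eta_p^2 = (F × df_effect) / (F × df_effect + df_error). A minimal sketch of that check (the function name is illustrative, not a Stata command):

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its df."""
    ss_ratio = f_value * df_effect  # SS_effect / MS_error
    return ss_ratio / (ss_ratio + df_error)


# Omnibus test from the costume example: F(2, 18) = 4.38
eta_p2 = partial_eta_squared(4.38, 2, 18)
print(round(eta_p2, 2))  # 0.33
```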
Follow-up Analysis I: CONTRAST OPTION 1
- Research Hypothesis: Children will sustain more injuries during the periods in which they wear superhero costumes (Superman and Batman) compared to when they wear a non-superhero character costume (Mickey)
contrast {cos -1 .5 .5}, effects mcompare(bon)
(The Bonferroni option is ignored because there is only one comparison.)
- More injuries in the superhero conditions than in the Mickey condition, t(18) = 2.89, p = .01; the difference is statistically significant.
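As a rough check on the t statistic for this contrast, the sketch below rebuilds it by hand from the condition means and the residual mean square quoted elsewhere in these notes (Mickey = 7.9, Superman = 12.5, Batman = 13.9; MS_residual = 22.496296). The sample size n = 10 is an assumption inferred from the error df, since (3 − 1)(n − 1) = 18; it is not stated in the notes.

```python
import math

# Condition means and residual mean square as quoted in these notes
means = {"Mickey": 7.9, "Superman": 12.5, "Batman": 13.9}
weights = {"Mickey": -1.0, "Superman": 0.5, "Batman": 0.5}  # same weights as the contrast command
ms_residual = 22.496296
n = 10  # assumption: children per condition, inferred from error df = 18

# Contrast estimate: weighted sum of the condition means
estimate = sum(weights[c] * means[c] for c in means)

# Standard error of a contrast with n observations per condition
se = math.sqrt(ms_residual * sum(w ** 2 for w in weights.values()) / n)

t_stat = estimate / se
print(f"estimate = {estimate:.1f}, SE = {se:.2f}, t(18) = {t_stat:.2f}")
```

Under these assumptions the estimate is the superhero average minus the Mickey mean (13.2 − 7.9 = 5.3), and the resulting t agrees with the reported t(18) = 2.89.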
Follow-up Analysis I Contd.: CONTRAST OPTION 2
- Research Hypothesis: Children will sustain more injuries during the periods in which they wear superhero costumes – Superman or Batman – compared to when they wear a non-superhero character costume (Mickey)
contrast cos, effects mcompare(bon)
- Denominator df = 18
- Joint F = 4.38, p = 0.0282
- Superman vs. Mickey: t(18) = 2.17, Bonferroni-adjusted p = .088; although Superman led to numerically higher injuries, the comparison is not statistically significant.
- Batman vs. Mickey: t(18) = 2.83, Bonferroni-adjusted p = .022; wearing the Batman costume resulted in significantly higher injuries than wearing the Mickey costume.
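The Bonferroni adjustment multiplies each unadjusted p-value by the number of comparisons and caps the result at 1. In the minimal sketch below, the unadjusted values are an assumption: they are back-calculated by halving the adjusted p-values reported above, because the notes do not list them.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Assumed unadjusted p-values (half of the adjusted values in the notes)
unadjusted = {"Batman vs Mickey": 0.011, "Superman vs Mickey": 0.044}

adjusted = dict(zip(unadjusted, bonferroni_adjust(list(unadjusted.values()))))
for label, p in adjusted.items():
    print(f"{label}: adjusted p = {p:.3f}")
```

The same function shows why the option had no effect in Contrast Option 1: with a single comparison, m = 1 and the p-value is unchanged.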
Follow-up Analysis I Contd.: OTHER CONTRASTS
- Description of the contrast operators we have learned (prefixed to the variable name, e.g., r.cos):
- r. differences from the reference (base) level; the default
- a. differences from the next level (adjacent contrasts)
- ar. differences from the previous level (reverse adjacent contrasts)
Follow-up Analysis II
- Exploratory analysis: We wonder how wearing different costumes may lead to different levels of injuries.
pwmean injury, over(cos) effects mcompare(tukey)
- Pairwise comparisons of means with equal variances
- Superman vs. Mickey: t(18) = 2.03, p = .123; although Superman led to numerically higher injuries, the comparison is not statistically significant;
- Batman vs. Mickey: t(18) = 2.65, p = .034; wearing the Batman costume resulted in significantly higher injuries than wearing the Mickey costume;
- Batman vs. Superman: t(18) = 0.62, p = .811; no statistically significant difference, although Mean(Batman) > Mean(Superman).
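The pairwise t statistics above are slightly smaller than those from the contrast command. One possible explanation, offered here as an inference rather than something the notes state, is that pwmean pools the within-condition variances instead of using the repeated-measures residual. The sketch below recomputes the t values on that assumption, using the rounded means and SDs from the write-up examples and an assumed n = 10 per condition.

```python
import math
from itertools import combinations

# Rounded summary statistics quoted in the write-up examples
means = {"Mickey": 7.9, "Superman": 12.5, "Batman": 13.9}
sds = {"Mickey": 3.7, "Superman": 6.3, "Batman": 4.9}
n = 10  # assumption: observations per condition

# Pooled variance with equal n is the plain average of the condition variances
pooled_var = sum(sd ** 2 for sd in sds.values()) / len(sds)
se_diff = math.sqrt(2 * pooled_var / n)

t_values = {}
for a, b in combinations(means, 2):
    t_values[f"{b} vs {a}"] = (means[b] - means[a]) / se_diff

for label, t in t_values.items():
    print(f"{label}: t = {t:.2f}")
```

The recomputed values land within rounding of the reported 2.03, 2.65 and 0.62. If this reading is right, the error df for these comparisons would be 3 × (10 − 1) = 27 rather than 18, which is worth checking against the Stata output.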
Effect Size
- Still use Cohen’s d for the standardized effect size of the multiple comparisons
- Can use: Cohen’s d = (\bar{x}_1 - \bar{x}_2) / SD_{pooled}
- Can also use: Cohen’s d = (\bar{x}_1 - \bar{x}_2) / \sqrt{MS_{residual}}
- Using the previous example, compare the mean of the Superman and Batman conditions combined with the mean of the Mickey condition:
- Cohen’s d = 5.3 / \sqrt{22.496296} ≈ 1.12
- Cohen’s (1988) effect size rule-of-thumb:
- 0.2 (small effect)
- 0.5 (medium effect)
- 0.8 (large effect)
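The calculation and the rule of thumb above can be sketched as follows. The function names are illustrative, and the label "below small" for values under 0.2 is a choice made for this sketch, not part of Cohen's guidelines as quoted here.

```python
import math


def cohens_d(mean_difference: float, ms_residual: float) -> float:
    """Standardize a mean difference by the square root of MS_residual."""
    return mean_difference / math.sqrt(ms_residual)


def effect_size_label(d: float) -> str:
    """Apply Cohen's (1988) rule-of-thumb cut-offs to |d|."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"


# Superhero conditions combined vs. Mickey: difference of 5.3 injuries
d = cohens_d(5.3, 22.496296)
print(f"d = {d:.2f} ({effect_size_label(d)})")  # d = 1.12 (large)
```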
Writeup: Putting Everything Together - EXAMPLE 1
- Research Hypothesis: Children will sustain more injuries during the periods in which they wear superhero costumes (Superman and Batman) compared to when they wear a non-superhero character costume (Mickey)
- A one-way repeated measures ANOVA was conducted to examine how wearing different costumes may lead to varying levels of injuries over a three-month period. The results showed that costume type elicited statistically significant and large differences in the mean frequency of injuries, F(2, 18) = 4.38, p = 0.028, \eta_p^2 = .33.
- Further comparison showed that wearing superhero costumes (Superman and Batman) led to a statistically significantly higher number of injuries (Mean = 13.2) than wearing Mickey costumes (Mean = 7.9), t(18) = 2.89, p = .01, 95% CI [1.4, 9.2], with a large effect size d = 1.12. This result supported our research hypothesis.
Writeup: Putting Everything Together - EXAMPLE 2
- Research Hypothesis: Children will sustain more injuries during the periods in which they wear superhero costumes (Superman and Batman) compared to when they wear a non-superhero character costume (Mickey)
- A one-way repeated measures ANOVA was conducted to examine how wearing different costumes may lead to varying levels of injuries over a three-month period. The results showed that costume type elicited statistically significant and large differences in the mean frequency of injuries, F(2, 18) = 4.38, p = 0.028, \eta_p^2 = .33.
- A post hoc comparison using Tukey’s adjustment method was conducted to explore how the conditions differed. Only the difference between the Batman and Mickey costumes was statistically significant, with the former (Mean = 13.9, SD = 4.9) leading to a higher number of injuries than the latter (Mean = 7.9, SD = 3.7), t(18) = 2.65, p = .034, 95% CI [0.4, 11.6], a large effect (d = 1.26). The number of injuries did not differ significantly between the Superman (Mean = 12.5, SD = 6.3) and Mickey costumes, t(18) = 2.03, p = .123, 95% CI [-1.0, 10.2], despite a large effect size (d = 0.97); nor was there a significant difference between the two superhero costume conditions, t(18) = 0.62, p = .811, 95% CI [-4.2, 7.1], with a small effect (d = 0.30).
PSYU/X2248 Lecture Learning Outcomes
- After this week’s lecture, you should know:
- How to conduct the omnibus repeated-measures one-way ANOVA F-test
- How to interpret its output
- How to test the assumptions of repeated-measures one-way ANOVA analysis
- How to conduct multiple comparisons while controlling for the family-wise error rate after obtaining a significant F-test result
- The differences between various types of multiple comparison methods and how to choose one
- How to report the effect size for the omnibus F-test and multiple comparisons
- In Stata, you should be able to:
- Open data files
- Test assumptions of one-way repeated-measures ANOVA
- Run the repeated-measures one-way ANOVA analysis
- Follow up with the desired multiple comparisons of means and control for the family-wise error rate
- Create and save a .do file for your commands (syntax)
ANOVA Revision
Note: We are using real research data as examples.
Caveats
- However, the assumptions are not necessarily met.
- In the original papers, other types of statistical analyses and/or ways to address the potential issues with the assumptions were applied.
- What do we do when (some) assumptions are not met?
- Alternative analyses (e.g., Week 12, non-parametric analyses)
- Data management techniques (e.g., some in Design & Statistics III)
- Removing outliers
- Bootstrapping
- Data transformation (e.g., using log-transformed DV or standardized DV)
Research Design Example I
- Pain is a protective perceptual response shaped by contextual, psychological, and sensory inputs that suggest danger to the body.
- Method: A within-subjects, randomized, double-blinded, repeated-measures design was used.
- Virtual rotation was:
- 20% less than actual physical rotation (rotation gain = 0.8)
- equal to actual physical rotation (rotation gain = 1)
- 20% greater than actual physical rotation (rotation gain = 1.2)
- The order of the three conditions was counterbalanced across participants.
- To test our main hypothesis (i.e., that visual information that overstates or understates true rotation can affect movement-evoked pain), we compared pain-free range of motion (measured by degrees) across the three conditions. We used repeated measures analysis of variance (ANOVA) with Bonferroni-corrected pairwise comparisons.
Results
- Primary outcome: pain-free range of motion
- The repeated measures ANOVA revealed a large overall effect of visual-proprioceptive feedback (condition) on pain-free range of motion, F(2, 94) = 18.9, p < .001, \eta_p^2 = 0.29.
- All pairwise comparisons were significant (ps < .01).
- Vision understated true rotation:
- pain-free range of motion was increased
- medium-sized effect, p = .006, d = 0.67
- Vision overstated true rotation:
- pain-free range of motion was decreased
- large effect, p = .001, d = 0.80
- Specifically, during visual feedback that understated true rotation, pain-free range of motion was increased by 6% (95% confidence interval, or CI = [2%, 11%]); during visual feedback that overstated true rotation, pain-free range of motion decreased by 7% (95% CI = [3%, 11%]).
- Therefore, our results show an overall effect of the manipulation of 13%.
Further Improvement on the Writing?
- t(df) also needs to be reported
- 3 comparisons were done, but only 2 were reported
Research Design Example II
- Overall hypothesis of the study: We hypothesize that those in need of help underestimate the strength of others’ prosocial motivation to help when asked directly, consequently underestimating how willingly others will help and how positively others will feel about helping.
- Design and procedure. As an initial test of our hypothesis, we adapted a commonly used scenario from prior research in which one person asks another to borrow a cell phone.
- We recruited visitors at a public park and randomly assigned them to imagine either asking to borrow a cell phone from a stranger at that location (requester condition) or being asked the same request by a stranger (helper condition).
- In addition, we introduced an exploratory manipulation of gratitude expression (yes, no) to examine how explicit appreciation might affect participants’ expectations.
- This yielded a 2 (perspective: requester vs. helper) × 2 (gratitude: mentioned vs. not mentioned) between-participants design…
Experiment 1a: Can I Use Your Phone?
- Participants received a tablet to read the study scenario and provided their responses in private. This scenario included two stages: the requester first making a request, and the helper then fulfilling the request.
- In the first stage, participants in the requester condition imagined that they were in need of a cell phone to handle an emergency and approached a stranger nearby and asked to borrow their phone, whereas participants in the helper condition imagined being approached by a stranger with the same request.
- After reading the request, participants reported their expectations (written from the perspective of either a requester or a potential helper) about how willing, and also how likely, the potential helper was to help on scales ranging from 0 (not at all) to 10 (extremely). Participants then answered four questions adapted from Flynn and Lake (2008), one asking participants to predict the percentage of people who would agree to this request (0%–100%) and three measuring the discomfort of declining a request (how difficult, awkward, or embarrassing it would be for the helper to say “no”; α = .82) on scales ranging from 0 to 10.
- Several different DVs that the authors investigated to look at the difference between the Helper's vs. Requester’s perspectives (but not our focus for the demo ;-))
Experiment 1a
- In the second stage, participants imagined that the helper agreed to the request and offered help.
- Participants in the gratitude condition further imagined that the requester explicitly thanked the helper, whereas those in the no-gratitude condition did not receive this additional information.
- Participants then indicated how positive or negative, pleased, inconvenienced, and annoyed they expected the helper (either oneself or another person, depending on perspective conditions) to feel after the interaction…
- … [these DVs were measured] using scales ranging from 0 (not at all) to 10 (extremely), except that the positive/negative item included a scale of −5 (much more negative than normal) to 5 (much more positive than normal), with 0 (no different than normal) as the midpoint, which we transformed from 0 to 10 prior to data analysis.
- Additional DVs that the authors investigated, taking both factors (the assigned Perspective and Gratitude) into account – these DVs are what we are interested in for the demo!
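The positive/negative item described above was moved from its −5 to 5 scale onto the 0 to 10 range used by the other items. The notes do not give the formula; the sketch below assumes a simple linear rescaling, and the function name `rescale` is hypothetical.

```python
def rescale(score: float, old_min: float = -5.0, old_max: float = 5.0,
            new_min: float = 0.0, new_max: float = 10.0) -> float:
    """Linearly map a score from [old_min, old_max] onto [new_min, new_max]."""
    proportion = (score - old_min) / (old_max - old_min)
    return new_min + proportion * (new_max - new_min)


# Endpoints and midpoint of the original positive/negative item
for raw in (-5, 0, 5):
    print(raw, "->", rescale(raw))
```

Because both scales span 10 points, this mapping reduces to adding 5 to every response, so the midpoint 0 (no different than normal) becomes 5.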
Results Report
- What can be further improved for this report?
- reporting \eta_p^2 for the interaction, too (despite being non-significant)
- commenting on the magnitude of the effect size
Results Write Up
- What can be further improved for this report?
- commenting on the magnitude of the effect size
- reporting the stats for the non-significant main effect and interaction
- this is a complex study, though (multiple experiments with each experiment containing many different DVs)
Experiment 1b: Imagined Requests
- The overall hypothesis of the study: We hypothesize that those in need of help underestimate the strength of others’ prosocial motivation to help when asked directly, consequently underestimating how willingly others will help and how positively others will feel about helping.
- In order to examine whether the patterns of results obtained in Experiment 1a were robust across different helping scenarios, we conducted Experiment 1b, in which participants were randomly assigned to imagine either asking for help or being asked for help in one of six everyday scenarios.
Experiment 1b: Imagined Requests
- Design and procedure: After providing their informed consent, participants read one of six scenarios from either the perspective of a requester or a helper in which the requester either mentioned being grateful or not, yielding a 2 (perspective: requester vs. helper) × 2 (gratitude: mentioned vs. not mentioned) × 6 (scenarios) between-participants design.
- These scenarios depicted requests of different sizes using gender-neutral language, including borrowing a stranger’s cell phone (cell phone scenario; same as Experiment 1a), giving away a subway seat (subway scenario), escorting someone to a specific destination (directions scenario), carrying boxes down a few flights of stairs (carrying-boxes scenario), demonstrating how to use a library kiosk (library-kiosk scenario), and giving away change at a food truck (food-truck scenario).
- As in Experiment 1a, each scenario again included two stages: the requester first making a request, and the helper then fulfilling the request.
- The measures were participants’ expectations about how willing and likely the potential helper was to help and the predicted percentage of people who would agree to this request (0%–100%), both measured at the first stage, and how positive/negative, pleased, inconvenienced, and annoyed they expected the helper to feel after the interaction, etc., measured at the second stage.
- For the following demo, we will focus on the 2 (perspective: requester vs. helper) × 6 (scenarios) between-participants design, with participants’ expectation of people’s willingness to help (DV1) and the estimated percentage of agreement (DV2) as outcomes.
DV: Willingness to Help
- Anything unusual about this results report?
- Reported F-value for simple effect analysis
- What can be further improved for this report?
- Detailed simple effect stats of perspective (or role) for each scenario
- Commenting on the magnitude of the effect size
- Notice the effect size for the simple effect?
- Difference between the means vs. Cohen’s d?
Effect Size
- DIFFERENCE BETWEEN THE MEANS VS. COHEN’S D
- Cohen’s d is extracted from the independent t-tests, which focus on examining the differences between the two perspectives: Helper vs. Requester, across different experiments (and hence, different samples, also different experimental settings).
DV: Estimated Percentage of Agreement
- Estimated percentage of agreement. A 2 × 6 factorial ANOVA on the estimated percentage of people who would agree to the request indicated a significant main effect of perspective, F(1, 1192) = 15.52, p < .001, \eta_p^2 = .013, a significant main effect of scenario, F(5, 1192) = 51.95, p < .001, \eta_p^2 = .18, and a nonsignificant interaction between perspective and scenario, F(5, 1192) = 1.56, p = .17, \eta_p^2 = .006. Overall, participants who imagined asking for help expected fewer people to agree than those who imagined being asked for help (Ms = 55.2% vs. 60.0%; SDs = 24.6% vs. 22.2%; see Fig. 3 for results from each scenario). This overall result is consistent with the perspective gap reported by Flynn and Lake (2008), although the cell-phone scenario again yielded a nonsignificant perspective gap (Ms = 50.7% vs. 50.1%, SDs = 23.0% and 22.3%), t(198) = 0.19, p = .85, d = 0.03, consistent with the null effect in a similar scenario observed in Experiment 1a.
DV: Estimated Percentage of Agreement
- Assuming, in addition to the main hypothesis, the researchers were also interested in how the scenarios differ, more specifically, how borrowing a phone differs from the rest …
- Estimated percentage of agreement. A 2 × 6 factorial ANOVA on the estimated percentage of people who would agree to the request indicated a significant main effect of perspective, F(1, 1192) = 15.52, p < .001, \eta_p^2 = .013, a significant main effect of scenario, F(5, 1192) = 51.95, p < .001, \eta_p^2 = .18, and a nonsignificant interaction between perspective and scenario, F(5, 1192) = 1.56, p = .17, \eta_p^2 = .006. Overall, participants who imagined asking for help expected fewer people to agree than those who imagined being asked for help (Ms = 55.2% vs. 60.0%; SDs = 24.6% vs. 22.2%; see Fig. 3 for results from each scenario). Extending Experiment 1a, as shown in Table X and Figure Y, the cell-phone scenario had a lower estimated percentage of people who would agree to the request than the carrying-boxes, food-truck, or library-kiosk scenarios, t(1192) = 6.05, p < .001, medium effect, d = 0.61; t(1192) = 6.61, p < .001, medium effect, d = 0.66; and t(1192) = 10.48, p < .001, large effect, d = 1.05, respectively. The cell-phone scenario, however, had a higher estimated percentage of agreement than the directions scenario, t(1192) = -2.96, p = .016, small effect, d = 0.30. The cell-phone and subway scenarios did not significantly differ in the estimated percentage of people agreeing to help, t(1192) = 0.16, p = 1.000. This overall result is consistent with the perspective gap reported by Flynn and Lake (2008).
Research Design Example III
- RQ/Hypothesis:
- Can the reconsolidation of a reactivated visual memory of experimental trauma be disrupted by engaging in a visuospatial task that would compete for visual working memory resources?
- Reconsolidation theory (Nader & Hardt, 2009) predicts that old memories are susceptible to disruption only when reactivated and only disrupted if an intervention prevents restabilization (reconsolidation). If playing Tetris after memory reactivation reduces intrusion frequency by interfering with memory reconsolidation, then only the combination of film-memory reactivation (to initiate reconsolidation) with Tetris game play (to interfere with reconsolidating visual memory for the trauma film) should be effective. Playing Tetris alone parallels nonreactivation controls in the reconsolidation literature. Memory reactivation alone should also be insufficient to alter intrusion frequency (Nader & Hardt, 2009). Both components control for nonspecific task effects.
- Design: Experimenters randomly assigned participants to ONE of the four conditions.
- Sample: N = 72, n = 18 per condition
- Procedure:
- The onset of the experiment: All participants watched a trauma film
- Recorded the number of intrusive memories they experienced over the next 24 hours (Day 0)
- Returned to the lab and completed the experimental task
- Recorded the number of intrusive memories they experienced over the next 7 days (Days 1-7)
IV & DV
- IV: Experimental task (4 groups / levels)
- No-task control: These participants completed a 10-minute music filler task.
- Reactivation + Tetris: These participants were shown a series of images from the trauma film to reactivate the traumatic memories (i.e., reactivation task). After a 10-minute music filler task, participants played the video game Tetris for 12 minutes.
- Tetris Only: These participants played Tetris for 12 minutes but did not complete the reactivation task.
- Reactivation Only: These participants completed the reactivation task but did not play Tetris.
- DV:
- number of memory intrusions before the experimental tasks (DayZeroNumberofIntrusions)
- number of memory intrusions after the experimental tasks (DayOnetoSevenNumberofIntrusions)
- Predictions:
- Before the experimental tasks (day 0), there shouldn’t be differences in trauma memory intrusions between the participant groups.
- IV = tasks, DV = day 0 trauma memory intrusions
- After the experimental tasks (days 1-7), there should be differences among participant groups; specifically, only participants who complete a visuospatial task after reactivation of the traumatic memories (i.e., Reactivation + Tetris group) would experience a reduction in intrusive memories; Reactivation or Tetris only group would be equivalent to no-task control.
- IV = tasks, DV = day 1-7 trauma memory intrusions
Intrusive memories postintervention
- First, prior to the intervention (over the first 24 hr after viewing the film: Day 0), we confirmed that the four groups experienced a similar number of intrusive memories, F(3, 68) = 0.16, p = .92
- Second, and critically, for the 7-day diary postintervention, there was a significant difference between groups in overall intrusion frequency in daily life, F(3, 68) = 3.80, p = .01, \eta_p^2 = .14
- Planned comparisons demonstrated that relative to the no-task control group, only those in the reactivation-plus-Tetris group, t(22.63) = 2.99, p = .007, d = 1.00, experienced significantly fewer intrusive memories; this finding replicated Experiment 1. Critically, as predicted by reconsolidation theory, the reactivation-plus-Tetris group had significantly fewer intrusive memories than the Tetris-only group, t(27.96) = 2.52, p = .02, d = 0.84, as well as the reactivation-only group, t(25.68) = 3.32, p = .003, d = 1.11. Further, there were no significant differences between the no-task control group and the reactivation-only group, t(32.23) = 0.22, p = .83, or between the no-task control group and the Tetris-only group, t(30.03) = 1.01, p = .32.
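The effect sizes in the planned comparisons above can be cross-checked against the t statistics. With two independent groups, Cohen's d based on the pooled SD equals t × \sqrt{1/n_1 + 1/n_2}; with equal group sizes the unequal-variance t statistic has the same value as the pooled one, so the conversion still applies. A minimal sketch, using n = 18 per condition as stated in the sample description (the function name is illustrative):

```python
import math


def d_from_t(t_value: float, n1: int, n2: int) -> float:
    """Convert an independent-groups t statistic to Cohen's d (pooled SD)."""
    return t_value * math.sqrt(1 / n1 + 1 / n2)


n_per_group = 18  # from the sample description: N = 72, n = 18 per condition

# t statistics for reactivation + Tetris vs. each comparison group
comparisons = {
    "vs no-task control": 2.99,
    "vs Tetris only": 2.52,
    "vs reactivation only": 3.32,
}
d_values = {label: d_from_t(t, n_per_group, n_per_group)
            for label, t in comparisons.items()}

for label, d in d_values.items():
    print(f"Reactivation + Tetris {label}: d = {d:.2f}")
```

The converted values match the reported d = 1.00, 0.84 and 1.11.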