Experimental and Quasi-Experimental Research Methodology Notes

Foundations of Experimental and Quasi-Experimental Research

Research designs are primarily categorized based on three core characteristics: manipulation, control/assignment, and measures over time. A true experimental design requires the manipulation of an independent variable (VI) and the rigorous control of participant assignment (randomization). If there is no manipulation, the study is classified as correlational. If manipulation exists but random assignment is lacking, it is a quasi-experiment. Temporal dimensions further distinguish designs: cross-sectional designs involve a single measurement point, pre/post designs involve two measurements, and longitudinal designs involve three or more measurements over time.
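The classification rules above can be sketched as a small decision procedure. This is only an illustrative sketch: the function names `classify_design` and `classify_timing` are hypothetical and simply encode the rules stated in these notes.

```python
def classify_design(manipulation: bool, random_assignment: bool) -> str:
    """Classify a study by manipulation and assignment, following the notes."""
    if not manipulation:
        # No manipulated independent variable: the study is correlational.
        return "correlational"
    # Manipulation is present; random assignment decides the rest.
    return "true experiment" if random_assignment else "quasi-experiment"


def classify_timing(n_measurements: int) -> str:
    """Label the temporal structure by the number of measurement points."""
    if n_measurements < 1:
        raise ValueError("a study needs at least one measurement")
    if n_measurements == 1:
        return "cross-sectional"
    if n_measurements == 2:
        return "pre/post"
    return "longitudinal"  # three or more measurements


print(classify_design(manipulation=True, random_assignment=False))
print(classify_timing(3))
```

For instance, a study that manipulates a variable in intact school classes (no randomization) and measures once would be a cross-sectional quasi-experiment under these rules.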

Primary Dimensions of Research Structure

When constructing a design, researchers must evaluate several factors. The number of independent groups can range from a single group to two or more. The assignment type determines whether the study is a true experiment (random assignment) or a quasi-experiment (non-random assignment). The structure is between-subjects (different participants in each condition), within-subjects (the same participants in all conditions), or mixed (a combination of both). Measurement over time ranges from a single measure to pre-post assessments or a full series (simple vs. repeated measures). The number of independent variables makes the design simple ( $1 \, \text{VI}$ ) or factorial ( $2+ \, \text{VIs}$ ). Finally, the number of dependent variables (VD) determines whether the analysis is univariate (a single VD) or multivariate (multiple VDs); with multiple VDs, the relationships among them are either modeled or ignored.

Comparison of Between-Subjects and Within-Subjects Designs

Choosing between these structures involves several trade-offs. Between-subjects designs use different participants in each condition and answer the question "Do groups differ from each other?" They are necessary for irreversible treatments and mask the research hypothesis better (low demand characteristics), but they require a larger sample ( $N$ ) and have lower statistical power. In within-subjects designs, each participant serves as their own control, which answers the question "How does the individual change?" They offer high statistical power with a smaller sample but are highly susceptible to learning/practice effects and demand characteristics.

Sample-size requirements differ substantially: for an effect size of $0.2$ , a between-subjects design requires $N = 1084$ , while a within-subjects design requires only $N = 272$ . For an effect size of $0.5$ , the requirements are $N = 176$ and $N = 45$ , respectively. This is a ratio of approximately $4:1$ in required participants.
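The roughly $4:1$ ratio can be reproduced with a normal-approximation sample-size formula. The notes do not state the significance level or target power, so the sketch below assumes a one-tailed $\alpha = .05$ and power of $.95$, and assumes the within-subjects effect size is standardized on the difference scores. Under those assumptions the approximation lands within a couple of participants of the figures above; an exact t-based calculation gives slightly larger values.

```python
from math import ceil
from statistics import NormalDist

ALPHA = 0.05  # ASSUMPTION: one-tailed significance level (not stated in the notes)
POWER = 0.95  # ASSUMPTION: target power (not stated in the notes)


def _z_sum_squared(alpha: float = ALPHA, power: float = POWER) -> float:
    """(z_{1-alpha} + z_{power})^2 from the standard normal distribution."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha) + z.inv_cdf(power)) ** 2


def between_subjects_total_n(d: float, alpha: float = ALPHA, power: float = POWER) -> int:
    """Total N for two independent groups: n per group = 2 * (z sum)^2 / d^2."""
    per_group = ceil(2 * _z_sum_squared(alpha, power) / d ** 2)
    return 2 * per_group


def within_subjects_n(d: float, alpha: float = ALPHA, power: float = POWER) -> int:
    """N for a paired design: n = (z sum)^2 / d^2, with d defined on difference scores."""
    return ceil(_z_sum_squared(alpha, power) / d ** 2)


for d in (0.2, 0.5):
    between = between_subjects_total_n(d)
    within = within_subjects_n(d)
    print(f"d={d}: between N={between}, within N={within}, ratio={between / within:.2f}")
```

The factor of four comes from two sources: the between-subjects formula carries a factor of 2 in the variance of a difference between independent means, and it needs that many participants in each of two groups.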

Statistical Analysis and Data Interpretation

Analyzing results requires indicators of central tendency (mean, median, mode) and dispersion (standard deviation, variance, quartiles, deciles, percentiles, interquartile range, and range). A confidence interval (CI) is a range computed so that, over repeated sampling, a stated proportion of such intervals (e.g., $95\%$ ) would contain the true population mean. Its width depends on the sample size ( $n$ ) and the sample standard deviation ( $SD$ ). The error in estimating the population mean, or standard error ( $SE$ ), is calculated as follows:

$SE = \frac{SD}{\sqrt{n}}$
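As a minimal sketch of the standard error formula and the interval built from it: the code below uses a normal (z) critical value, which is an assumption that suits larger samples; for small samples a t critical value would be more appropriate. The scores are hypothetical and serve only to make the example runnable.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def standard_error(sample):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))


def confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for the population mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    centre = mean(sample)
    half_width = z * standard_error(sample)
    return centre - half_width, centre + half_width


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, for illustration only
low, high = confidence_interval(scores)
print(f"mean={mean(scores):.2f}, SE={standard_error(scores):.3f}, 95% CI=({low:.2f}, {high:.2f})")
```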

A larger sample or a smaller $SD$ yields a narrower confidence interval. Statistical results are often reported in forms such as F(1, 58) = 16.69, p < .001 or t(58) = -4.085, p < .001. For multiple group comparisons, a table of pairwise mean differences across groups A, B, and C might show results such as $A-B$ ( $\text{estimate} = -3.94, SE = 2.64, p = 0.30$ ) or $B-C$ ( $\text{estimate} = -8.08, SE = 2.64, p = 0.01$ ).

Within-Subjects Sequential and Pre-Post Designs

Sequential within-subjects designs involve a single group receiving multiple treatments in a specific order (e.g., Treatment 1 then Treatment 2). A researcher might compare study techniques, such as written summaries (Technique A) vs. conceptual maps (Technique B), by testing participants after each one. A major flaw in this design is the order effect: the second treatment is never neutral because it occurs in a psychological context already modified by the first treatment. Solutions include simple counterbalancing (AB/BA splits), complete counterbalancing ( $ABC, ACB, BCA, BAC, CAB, CBA$ ), or random ordering.
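Counterbalancing can be sketched with the standard library: complete counterbalancing is the set of all permutations of the treatments, and participants are then spread evenly across those orders. The function names and the participant labels below are hypothetical.

```python
import random
from itertools import permutations


def complete_counterbalancing(treatments):
    """Return every possible order of the treatments (k! orders for k treatments)."""
    return ["".join(order) for order in permutations(treatments)]


def assign_orders(participants, treatments, seed=None):
    """Shuffle participants, then deal them evenly across all treatment orders."""
    rng = random.Random(seed)
    orders = complete_counterbalancing(treatments)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {person: orders[i % len(orders)] for i, person in enumerate(shuffled)}


print(complete_counterbalancing("AB"))   # ['AB', 'BA']
print(complete_counterbalancing("ABC"))  # ['ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA']

participants = [f"P{i}" for i in range(1, 13)]  # hypothetical participant labels
print(assign_orders(participants, "ABC", seed=1))
```

Because the number of orders grows factorially, complete counterbalancing becomes impractical beyond a few treatments, which is one reason random ordering is listed as an alternative.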

Pre-post designs ( $O_1 \rightarrow X \rightarrow O_2$ ) evaluate if a change occurred over time after an intervention ( $X$ ). While they document change, they do not explain why it occurred. Longitudinal designs differ because time itself is the key factor, whereas in within-subjects designs, time is merely the space in which a manipulation happens. Potential improvements for these designs include increasing observations ( $O_1 \rightarrow O_2 \rightarrow X \rightarrow O_3 \rightarrow O_4$ ), using multiple measures for the same VD, or establishing a stable baseline before intervention. One risk in pre-post designs is regression toward the mean, where extreme initial scores naturally move toward the average in subsequent measurements regardless of intervention.
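Regression toward the mean is easy to demonstrate by simulation. In the sketch below nobody receives any intervention: each simulated person has a stable true score plus independent measurement noise at each time point. All parameters (mean of 50, standard deviations of 10, selecting the top 10% at pretest) are arbitrary assumptions chosen for illustration.

```python
import random
from statistics import mean


def simulate_regression_to_mean(n=10_000, top_fraction=0.10, seed=42):
    """Return (pretest mean, posttest mean) for people selected for extreme pretest scores."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(50, 10) for _ in range(n)]
    # Two noisy measurements of the same stable true score; no treatment in between.
    pre = [t + rng.gauss(0, 10) for t in true_scores]
    post = [t + rng.gauss(0, 10) for t in true_scores]
    # Select the participants with the highest pretest scores.
    cutoff = sorted(pre)[int(n * (1 - top_fraction))]
    selected = [i for i in range(n) if pre[i] >= cutoff]
    return mean(pre[i] for i in selected), mean(post[i] for i in selected)


pre_mean, post_mean = simulate_regression_to_mean()
print(f"selected group: pretest mean={pre_mean:.1f}, posttest mean={post_mean:.1f}")
```

The selected group scores lower at posttest even though nothing happened between measurements, which is why a single-group pre-post design that recruits extreme scorers can mistake this drift for a treatment effect.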

Factorial Designs and Interactions

Factorial designs involve two or more independent variables, allowing researchers to study the effect of each variable (main effects) and how they combine (interaction or moderation). An interaction occurs when the effect of one VI depends on the level of another VI. For example, a study might look at whether mindfulness improves academic performance differently for students with high vs. low exam anxiety. If mindfulness helps only high-anxiety students, an interaction is present.

Interactions can be crossover (where the direction of the effect reverses) or non-crossover (where the magnitude of the effect changes but the direction stays the same). Furthermore, researchers distinguish between experimental interaction (all VIs are manipulated) and statistical interaction (not all VIs are manipulated). An example of a factorial design would be evaluating fear responses based on environment (Virtual Reality vs. Video Screen) and threat type (Physical vs. Social). This creates a $2 \times 2$ matrix with four conditions (VR/Physical, VR/Social, Screen/Physical, Screen/Social), allowing for the testing of multiple effects.
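The $2 \times 2$ example can be sketched descriptively: enumerate the four conditions, then compute the main effects and the interaction contrast from the cell means. The cell means below are invented purely for illustration and are not results from any study; the function name `factorial_effects` is also hypothetical. The sketch is descriptive only and performs no significance test.

```python
from itertools import product

ENVIRONMENTS = ("VR", "Screen")
THREATS = ("Physical", "Social")

# The four conditions of the 2 x 2 design.
conditions = list(product(ENVIRONMENTS, THREATS))
print(conditions)


def factorial_effects(cell_means):
    """Main effects and interaction from a dict mapping (environment, threat) -> mean."""
    vr_p = cell_means[("VR", "Physical")]
    vr_s = cell_means[("VR", "Social")]
    sc_p = cell_means[("Screen", "Physical")]
    sc_s = cell_means[("Screen", "Social")]
    # Simple effects of environment at each level of threat type.
    env_at_physical = vr_p - sc_p
    env_at_social = vr_s - sc_s
    interaction = env_at_physical - env_at_social
    if interaction == 0:
        kind = "none"
    elif env_at_physical * env_at_social < 0:
        kind = "crossover"  # the environment effect reverses direction
    else:
        kind = "non-crossover"  # same direction, different magnitude
    return {
        "main_environment": (env_at_physical + env_at_social) / 2,
        "main_threat": ((vr_p - vr_s) + (sc_p - sc_s)) / 2,
        "interaction": interaction,
        "interaction_type": kind,
    }


# HYPOTHETICAL fear ratings, invented for illustration only.
means = {("VR", "Physical"): 8, ("VR", "Social"): 5,
         ("Screen", "Physical"): 4, ("Screen", "Social"): 4}
print(factorial_effects(means))
```

With these made-up means, VR raises fear for both threat types but more so for physical threats, so the interaction is non-crossover.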

Practical Applications and Case Studies

Case Study 1: Mental Well-being in Schools (Tornivuori et al.). This single-group pretest-posttest study evaluated a 6-week intervention for adolescents ( $n=87$ ). Results for the primary outcome (YP-CORE) showed a significant mean score change of $-3.82$ (p < .001, d = .627). Longer-term follow-up at 6 months ( $n=68$ ) showed a smaller, non-significant change of $-1.14$ .

Case Study 2: EMDR vs. Posner Paradigm. This randomized controlled trial with 50 participants compared conventional Eye Movement Desensitization and Reprocessing (EMDR) with the Posner paradigm (spatial attention shifts without eye movements). Both groups showed significant reductions in Subjective Units of Distress (SUDs), which suggests that the therapeutic mechanism involves attention shifting rather than eye movement specifically.

Questions and Discussion

During sessions, participants were asked: "In your opinion, if we had done the opposite—first B and then A—what would have happened?" This highlighted the issue of the sequential within-subjects design where the second treatment is not neutral. Another exercise asked students to evaluate a study where Technique A (breathing) was followed by Technique B (visualization). The result claimed B was more effective, but the critique identified that the design could not distinguish the treatment effect from the cumulative effect of being in the study or the specific order of techniques. Additional design variations discussed included randomized block designs (homogenizing groups by traits like age before random assignment) and waitlist control designs (where one group receives treatment immediately and the other later, often for ethical reasons in clinical settings).
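A randomized block design of the kind mentioned above can be sketched as follows: sort participants on the blocking trait, cut the sorted list into blocks as large as the number of conditions, and randomize conditions within each block. The participant records, the `id` and `age` fields, and the function name are all hypothetical.

```python
import random


def randomized_block_assignment(participants, block_key, conditions, seed=None):
    """Assign conditions at random within blocks of participants similar on block_key."""
    rng = random.Random(seed)
    ordered = sorted(participants, key=block_key)
    block_size = len(conditions)
    assignment = {}
    for start in range(0, len(ordered), block_size):
        block = ordered[start:start + block_size]
        shuffled = list(conditions)
        rng.shuffle(shuffled)  # random order of conditions inside this block
        for person, condition in zip(block, shuffled):
            assignment[person["id"]] = condition
    return assignment


# HYPOTHETICAL participants, for illustration only.
people = [{"id": f"P{i}", "age": age}
          for i, age in enumerate([34, 19, 52, 23, 47, 28, 61, 39], start=1)]
groups = randomized_block_assignment(people, lambda p: p["age"],
                                     ("treatment", "waitlist"), seed=3)
print(groups)
```

Using "treatment" and "waitlist" as the two conditions shows how blocking can be combined with a waitlist control: each age-matched pair contributes one participant to each arm.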