Single-Case Experimental Designs: Key Concepts and Standards
Single-Case Experimental Designs: A Systematic Review of Published Research and Current Standards
Abstract
Purpose of Article: The article provides an exhaustive systematic review of the research designs and methodological characteristics of single-case experimental design (SCED) research published in peer-reviewed journals between 2000 and 2010.
Main Argument: SCEDs represent a robust and flexible alternative to traditional group designs such as the randomized controlled trial (RCT). While group designs often focus on average treatment effects across populations, SCEDs allow for the intensive study of individual behavior change. However, they face significant methodological hurdles, such as serial dependency and a lack of standardized effect size measures, which limit their broader acceptance in some scientific communities. This research seeks to establish updated benchmarks and evaluate studies against contemporary quality criteria.
Key Findings: From an initial pool of 571 articles, 409 studies were analyzed. Results indicate that recent SCED research increasingly aligns with modern experimental standards, although there remains a heavy reliance on visual analysis rather than complex statistical modeling.
Keywords: daily diary; single-case experimental design; systematic review; time-series; visual analysis; ecological momentary assessment (EMA).
Introduction
Historical Context: The roots of single-case experimentation are found in the foundational work of early psychologists:
Gustav Fechner (1889): Employed intra-individual comparisons in psychophysics.
John B. Watson (1925): Applied single-subject logic to behaviorism.
B.F. Skinner (1938): Developed the experimental analysis of behavior, emphasizing the individual organism's response to reinforcement schedules.
Current Usage: Despite the dominance of Large-N group designs since the mid-20th century, SCEDs are seeing a resurgence. This is partly due to the limitations of group averages in clinical settings where individual variability is paramount.
Internal Validity: Shadish, Cook, & Campbell (2002) argue that group designs minimize threats to internal validity through randomization, but SCEDs counter these threats through the logic of repeated measurement and replication of effect.
Basics of the SCED
Definition: A "single-case" can be an individual person, a single classroom, or a specific department. The defining characteristic is the use of within-subject comparisons, where the subject's own performance during a baseline phase is compared to their performance during an intervention phase.
Phase Structure:
Baseline phase (A): A period of observation intended to establish a stable pattern of behavior before any intervention is introduced.
Intervention phase (B): The period where the independent variable (IV) is manipulated.
Goal: To establish a functional relationship between the IV (e.g., a therapeutic technique) and the dependent variable (DV) (e.g., frequency of a specific behavior).
Repeated Assessments: Reliable data requires frequent, often daily, measurement to capture the trend, level, and variability of the data points.
Randomization: Kratochwill and Levin (2010) suggest that internal validity can be significantly bolstered by randomly assigning the start point of an intervention within a sequence of possible sessions.
Methodological Challenges
Serial Dependency (Autocorrelation): Data points in SCED are often highly correlated with preceding points (autocorrelation r ≠ 0). This violates the assumption of independence required for many traditional statistical tests like the t-test or ANOVA.
Baseline Stability: Establishing a stable baseline is difficult in practice; trends in the baseline (improvement or worsening without treatment) can obscure the actual effect of the intervention.
Analysis of Short Data Streams: Many clinical studies have fewer than 10-15 data points per phase, making it difficult to apply sophisticated time-series analysis like ARIMA.
Effect Size Interpretation: There is no universal consensus on how to calculate effect sizes for SCED, though Non-overlap of All Pairs (NAP) and Percentage of Non-overlapping Data (PND) are common.
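The two overlap indices just mentioned can be computed directly from raw phase data. A minimal Python sketch, using made-up phase values and assuming higher scores indicate improvement:

```python
def pnd(baseline, intervention):
    """Percentage of Non-overlapping Data: share of intervention
    points that exceed the highest baseline point."""
    ceiling = max(baseline)
    above = sum(1 for x in intervention if x > ceiling)
    return 100.0 * above / len(intervention)

def nap(baseline, intervention):
    """Non-overlap of All Pairs: proportion of all baseline-intervention
    pairs in which the intervention point is higher (ties count 0.5)."""
    wins = 0.0
    for b in baseline:
        for t in intervention:
            if t > b:
                wins += 1.0
            elif t == b:
                wins += 0.5
    return wins / (len(baseline) * len(intervention))

A = [2, 3, 3]     # hypothetical baseline phase
B = [4, 5, 3, 6]  # hypothetical intervention phase
print(pnd(A, B))            # → 75.0
print(round(nap(A, B), 3))  # → 0.917
```

One reason NAP is often preferred over PND is visible in the code: PND depends entirely on the single most extreme baseline point, whereas NAP uses every baseline-intervention pair.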
Expanded Application of SCED
Ecological Momentary Assessment (EMA): Using technology (smartphones/wearables) to collect data in real-time within the subject's natural environment. This reduces recall bias and increases ecological validity.
Dynamic Processes: Modern SCED variants look at fluctuating psychological states (e.g., mood regulation, stress response) rather than just stable trait behaviors.
SCED Guidelines and Reporting Standards
What Works Clearinghouse (WWC): An initiative of the U.S. Department of Education’s Institute of Education Sciences (IES) that developed specific technical documentation for evaluating SCED evidence in education.
Standards for Evidence: To meet their highest evidence standards, a study must include at least 3 attempts to demonstrate the intervention effect and a minimum of 5 data points per phase (3 for "with reservations").
Quality Evaluation: The WWC provides a structured framework to determine if a study provides strong evidence, moderate evidence, or no evidence for an intervention's efficacy.
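The data-point thresholds above can be expressed as a small screening function. This is an illustrative sketch of only the counting rules quoted here; the actual WWC standards include additional criteria:

```python
def wwc_design_rating(phase_lengths, effect_demonstrations):
    """Sketch of the WWC data-point thresholds only: at least 3
    demonstrations of the effect, and 5 points per phase to meet
    standards (3 per phase to meet them 'with reservations')."""
    if effect_demonstrations < 3:
        return "Does Not Meet Standards"
    shortest = min(phase_lengths)
    if shortest >= 5:
        return "Meets Standards"
    if shortest >= 3:
        return "Meets Standards With Reservations"
    return "Does Not Meet Standards"

# A-B-A-B design: 4 phases, 3 chances to demonstrate the effect.
print(wwc_design_rating([6, 7, 5, 6], 3))  # → Meets Standards
print(wwc_design_rating([4, 3, 4, 3], 3))  # → Meets Standards With Reservations
```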
Tate et al. SCED Scale: A specialized critical appraisal tool developed to evaluate the methodological quality of single-case experimental designs. In fields where group-based randomized controlled trials (RCTs) are the standard, this scale provides a way to quantify the rigor of SCED research.
It specifically focuses on several key areas:
Inter-rater Reliability: This assesses whether data collection was consistent across different observers, which is crucial for establishing the reliability of the dependent variable (DV).
Counterbalancing: This involves checking if the researchers controlled for the order in which interventions were applied to ensure that the results aren't just a byproduct of the sequence itself.
Methodological Rigor: It serves as a checklist for researchers to ensure their work meets contemporary quality criteria, helping to address the "consensus shortfall" and the lack of a single globally mandated reporting standard for single-case studies.
By using this scale, researchers can distinguish between studies that provide strong evidence and those that may have methodological weaknesses, making it easier to include SCED findings in larger meta-analyses.
Consensus Shortfall: Unlike the CONSORT statement for group trials, SCED research still lacks a single, globally mandated reporting standard, leading to variability in how results are published.
Systematic Review Search Procedures
Search Parameters: The authors queried PsycINFO for publications between 2000 and 2010.
Selection Criteria:
Initial search: n = 571 articles identified.
Abstract screening: n = 451 articles met the basic definition of SCED.
Full-text review: n = 409 studies were included in the final synthesis.
Inclusion: Peer-reviewed journals, English language, clear manipulation of an IV.
Descriptive Statistics and Design Variants
Multiple Baseline Design (69%): The most popular design. It involves staggering the introduction of the intervention across different subjects, behaviors, or settings to demonstrate that change only occurs when the intervention is applied.
Reversal (Withdrawal) Designs (17%): Often labeled A-B-A-B. The intervention is introduced and then removed to see if behavior returns to baseline levels, providing strong evidence of experimental control.
Alternating/Simultaneous Treatments (6%): Rapidly switching between different conditions to compare their relative effectiveness.
Changing Criterion Design (4%): The treatment phase is divided into sub-phases, each with a progressively higher (or lower) requirement for reinforcement.
Analysis of SCED Data
Visual Analysis: The primary method involves looking at:
Level: The mean shift between phases.
Trend: The slope or direction of the data.
Variability: The fluctuation of data points around the mean.
Immediacy of Effect: How quickly the change occurs after the phase shift.
Statistical Modeling:
Multilevel Modeling (MLM): Used to account for the hierarchical structure of the data (observations nested within individuals).
Overlap Metrics: Calculate the percentage of data points in the intervention phase that exceed the highest point of the baseline.
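Two of the visual-analysis features listed above, level and trend, reduce to simple arithmetic. A minimal Python sketch on hypothetical phase data:

```python
def level_shift(phase_a, phase_b):
    """Level: difference in phase means (intervention minus baseline)."""
    return sum(phase_b) / len(phase_b) - sum(phase_a) / len(phase_a)

def trend(phase):
    """Trend: ordinary least-squares slope of the phase over session number."""
    n = len(phase)
    xbar = (n - 1) / 2
    ybar = sum(phase) / n
    num = sum((i - xbar) * (y - ybar) for i, y in enumerate(phase))
    den = sum((i - xbar) ** 2 for i in range(n))
    return num / den

A = [4, 5, 4, 5]  # hypothetical baseline
B = [7, 8, 9, 8]  # hypothetical intervention
print(level_shift(A, B))  # → 3.5
print(trend(B))           # → 0.4
```

Variability (e.g., the standard deviation within each phase) and immediacy of effect can be quantified in the same spirit, though formal visual analysis weighs all four features together.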
Conclusions and Future Directions
Consistency: The review calls for more rigorous adherence to reporting standards to allow for better meta-analyses of SCED findings.
Technology Integration: Greater use of electronic diaries and automated data collection is recommended to improve data density.
Education: Psychology graduate programs should expand their curricula beyond group statistics to include time-series analysis and SCED methodology.
Guidelines: Future standards should transcend specific fields (like Special Education) to apply across all of clinical psychology and behavioral medicine.
Abstract
The article provides an exhaustive systematic review of the research designs and methodological characteristics of single-case experimental design (SCED) research published in peer-reviewed journals between the years 2000 and 2010. The main argument presented is that SCEDs represent a robust and flexible alternative to traditional group designs and randomized controlled trials (RCTs). While group designs often focus on average treatment effects across populations, SCEDs allow for the intensive study of individual behavior change. However, they face significant methodological hurdles, such as serial dependency and a lack of standardized effect size measures, which limit their broader acceptance in some scientific communities. This research seeks to establish updated benchmarks and evaluate studies against contemporary quality criteria. From an initial pool, 409 studies were analyzed, and results indicate that recent SCED research increasingly aligns with modern experimental standards, although there remains a heavy reliance on visual analysis rather than complex statistical modeling.
Introduction
The historical context of single-case experimentation is rooted in the foundational work of early psychologists. Gustav Fechner utilized intra-individual comparisons in psychophysics in 1889, while John B. Watson applied single-subject logic to behaviorism in 1925. B.F. Skinner further developed the experimental analysis of behavior in 1938, emphasizing the individual organism's response to reinforcement schedules. Despite the dominance of Large-N group designs since the mid-20th century, SCEDs are seeing a resurgence because group averages are often insufficient in clinical settings where individual variability is paramount. Regarding internal validity, group designs minimize threats through randomization, but SCEDs counter these threats through the logic of repeated measurement and the replication of effect.
Basics of the SCED
A single-case can be defined as an individual person, a single classroom, or a specific department, with the defining characteristic being the use of within-subject comparisons. In this format, the subject's own performance during a baseline phase is compared to their performance during an intervention phase. The phase structure typically involves a baseline phase (A), which is a period of observation intended to establish a stable pattern of behavior, followed by an intervention phase (B), where the independent variable (IV) is manipulated. The primary goal is to establish a functional relationship between the IV, such as a therapeutic technique, and the dependent variable (DV), such as the frequency of a specific behavior. Reliable data requires frequent, often daily, repeated assessments to capture the trend, level, and variability of the data points. Internal validity can also be bolstered by randomly assigning the start point of an intervention within a sequence of possible sessions.
Methodological Challenges
One of the primary methodological challenges is serial dependency, or autocorrelation, where data points in a SCED are often highly correlated with preceding points (autocorrelation r ≠ 0). This violates the assumption of independence required for many traditional statistical tests like the t-test or ANOVA. Establishing baseline stability is also difficult in practice, as trends in the baseline can obscure the actual effect of the intervention. Furthermore, the analysis of short data streams is a common issue, as many clinical studies have fewer than 10-15 data points per phase, making it difficult to apply sophisticated time-series analysis like ARIMA. Additionally, there is no universal consensus on how to calculate effect sizes for SCED, though Non-overlap of All Pairs (NAP) and Percentage of Non-overlapping Data (PND) are commonly utilized.
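The serial dependency described here is easy to quantify. A minimal Python sketch estimating the lag-1 autocorrelation of a hypothetical data stream:

```python
def lag1_autocorr(series):
    """Estimate lag-1 autocorrelation: the correlation of each
    observation with the observation immediately before it."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    num = sum(dev[t] * dev[t + 1] for t in range(n - 1))
    den = sum(d * d for d in dev)
    return num / den

# A steadily trending series is strongly serially dependent.
print(lag1_autocorr([1, 2, 3, 4, 5, 6]))  # → 0.5
```

A value near zero is what independence-assuming tests such as the t-test require; trending or cyclic single-case data rarely satisfy this.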
Expanded Application of SCED
The application of SCED has expanded through Ecological Momentary Assessment (EMA), which uses technology such as smartphones and wearables to collect data in real-time within the subject's natural environment. This approach reduces recall bias and increases ecological validity. Modern SCED variants also look beyond stable trait behaviors to examine dynamic processes and fluctuating psychological states, such as mood regulation and stress responses.
SCED Guidelines and Reporting Standards
Several frameworks exist to guide SCED research, including the What Works Clearinghouse (WWC), which developed specific technical documentation for evaluating SCED evidence in education. The Tate et al. SCED Scale assists in assessing the methodological quality of single-case reports by focusing on elements like inter-rater reliability and counterbalancing. Despite these tools, there remains a consensus shortfall; unlike the CONSORT statement for group trials, SCED research still lacks a single, globally mandated reporting standard, resulting in variability in published results.
Systematic Review Search Procedures
The authors conducted their systematic review by querying PsycINFO for publications between 2000 and 2010. The selection criteria involved an initial search that identified 571 articles, followed by an abstract screening where 451 articles met the basic definition of SCED. A full-text review resulted in 409 studies being included in the final synthesis. Inclusion was restricted to peer-reviewed journals, the English language, and studies with a clear manipulation of an IV.
Descriptive Statistics and Design Variants
The analysis of design variants showed that the Multiple Baseline Design is the most popular, appearing in 69% of the studies. This design involves staggering the introduction of the intervention across different subjects, behaviors, or settings to demonstrate experimental control. Reversal or withdrawal designs, often labeled A-B-A-B, appeared in 17% of the studies; these involve introducing and then removing the intervention to see if behavior returns to baseline levels. Alternating or simultaneous treatments, which involve switching between different conditions, accounted for 6% of the research. Finally, the Changing Criterion Design, where the treatment phase is divided into sub-phases with progressively changing reinforcement requirements, was used in 4% of the studies.
Analysis of SCED Data
Visual analysis remains the primary method for analyzing SCED data, involving the evaluation of the level (mean shift between phases), trend (slope or direction), variability (fluctuation around the mean), and immediacy of effect. Statistical modeling is also employed, including Multilevel Modeling (MLM) to account for observations nested within individuals. Overlap metrics are another statistical tool, used to calculate the percentage of data points in the intervention phase that exceed the highest point of the baseline.
Conclusions and Future Directions
The review concludes by calling for more rigorous adherence to reporting standards and better consistency to allow for meta-analyses of SCED findings. Technology integration, such as the use of electronic diaries and automated data collection, is recommended to improve data density. From an educational standpoint, psychology graduate programs should expand their curricula beyond group statistics to include time-series analysis and SCED methodology. Finally, future guidelines should transcend specific fields like Special Education to apply broadly across clinical psychology and behavioral medicine.
My questions:
The heavy reliance on visual analysis is considered a methodological challenge or hurdle. While it remains the primary method for evaluating level, trend, and variability, the article notes that this reliance—rather than the use of complex statistical modeling—is one of the factors that limits the broader acceptance of SCEDs in some scientific communities. The review suggests that moving toward more rigorous experimental standards and statistical models, like Multilevel Modeling (MLM), is necessary for the field to align with contemporary quality criteria.
Statistical modeling can be utilized in SCED as a more objective alternative or supplement to visual analysis through several specific methods mentioned in the research:
Multilevel Modeling (MLM): This method addresses the hierarchical structure of SCED data, where repeated observations are nested within an individual case. It allows researchers to statistically estimate both the average effect and individual variability.
Overlap Metrics: Instead of just subjective visual inspection, researchers can use quantitative indices such as Non-overlap of All Pairs (NAP) and Percentage of Non-overlapping Data (PND) to calculate the exact degree of separation between the baseline and intervention phases.
Time-Series Analysis (ARIMA): Although difficult with short data streams (typically needing more than 10-15 points), these models can be used to analyze patterns and handle serial dependency (autocorrelation), which often violates the assumptions of traditional tests like t-tests or ANOVA.
Randomization Testing: By randomly assigning the start point of an intervention within a sequence of possible sessions, researchers can bolster internal validity and apply statistical logic to determine if the change was functionally related to the intervention.
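A minimal Python sketch of such a randomization test, using a hypothetical series and the post-minus-pre mean difference as the test statistic:

```python
def randomization_test(data, actual_start, eligible_starts):
    """For each eligible intervention start point, compute the
    post-minus-pre mean difference; the p-value is the share of
    start points whose statistic is at least as large as the one
    actually observed."""
    def stat(s):
        pre, post = data[:s], data[s:]
        return sum(post) / len(post) - sum(pre) / len(pre)
    observed = stat(actual_start)
    as_extreme = sum(1 for s in eligible_starts if stat(s) >= observed)
    return observed, as_extreme / len(eligible_starts)

# The intervention actually began at session index 4,
# chosen at random from sessions 2 through 6.
series = [2, 3, 2, 3, 8, 9, 8, 9]
obs, p = randomization_test(series, 4, range(2, 7))
print(obs, p)  # → 6.0 0.2
```

With only five eligible start points, the smallest attainable p-value is 1/5; in practice, a longer window of eligible sessions is needed to reach conventional significance levels.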
ARIMA (AutoRegressive Integrated Moving Average) is a sophisticated form of time-series analysis used to analyze patterns and trends in data collected over a sequence of time.
In the context of Single-Case Experimental Designs (SCED), ARIMA is used to address the following:
Serial Dependency (Autocorrelation): It helps account for the fact that data points in a sequence are often correlated with preceding points (autocorrelation r ≠ 0), which violates the assumptions of many traditional statistical tests like the t-test or ANOVA.
Pattern Identification: It models how current observations are related to past observations and past error terms to provide a more accurate picture of treatment effects.
However, a major limitation arises in clinical research: ARIMA modeling is difficult to apply to short data streams. Most SCED studies have fewer than 10-15 data points per phase, which is generally insufficient for the model to produce reliable or stable statistical estimates.