Notes on Validity, Reliability, and Internal/External Validity in ABA (Single-Subject Context)

External validity and systematic replication

  • Quiz emphasis: Question three asks what best enhances external validity in single-subject research.
  • Answer highlighted in transcript: using systematic replication across participants and settings, especially across studies or labs, strengthens external validity.
  • Key idea: external validity = generalizability across people and contexts; replication across participants/settings/labs increases confidence that findings will generalize beyond a single case or setting.
  • Related note: more discussion to come about what this replication across contexts means in practice.

Internal validity threat in a token economy scenario (case example)

  • Scenario recap: A token economy is implemented for a student referred for behavior support. One week later, a classroom teacher reports that the student’s behavior has improved, saying she is certain it has improved at least a little.
  • Question: Identify threats to internal validity that may have inflated the researcher’s confidence that the token economy (the IV) caused the change.
  • Key issues raised:
    • No baseline data collected: Without baseline data there is no pre-treatment reference point, so observed change cannot be clearly attributed to the shift from no treatment to treatment.
    • Teacher report is subjective: Reliance on a single subjective observer measurement increases measurement bias (potential bias in data collection).
    • Possible threats present: history effects or maturation could account for observed changes over time, especially without baseline and objective data.
  • Conclusions drawn in session:
    • The absence of baseline data and reliance on subjective teacher reports are important threats to internal validity.
    • History or maturation effects could be possible explanations for short-term changes even when an intervention was implemented.
  • Practical takeaway: Even in practice (outside pure research), beware threats to internal validity by establishing baselines and using objective, multiple measurement methods when evaluating functional changes.
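The takeaway above can be made concrete with a minimal sketch. The numbers below are hypothetical daily counts of disruptive behavior; the sketch shows what a baseline makes possible: a phase-mean comparison and a simple nonoverlap metric (PND, percentage of nonoverlapping data) that a subjective one-week teacher report cannot provide.

```python
# Hypothetical daily counts of disruptive behavior (not from the session).
baseline = [12, 10, 11, 13, 12]    # one week of pre-intervention observation
intervention = [9, 7, 6, 5, 4]     # one week under the token economy

baseline_mean = sum(baseline) / len(baseline)
intervention_mean = sum(intervention) / len(intervention)

# Percentage of nonoverlapping data (PND): for a behavior we want to reduce,
# the share of intervention points falling below the lowest baseline point.
floor = min(baseline)
pnd = 100 * sum(1 for x in intervention if x < floor) / len(intervention)

print(f"Baseline mean: {baseline_mean:.1f}")
print(f"Intervention mean: {intervention_mean:.1f}")
print(f"PND: {pnd:.0f}%")
```

Without the `baseline` list, neither comparison exists, which is exactly the internal-validity problem the scenario illustrates.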

Core concepts: validity, reliability, and their relationship

  • Define reliability: consistency or stability of a measurement across time, raters, or forms.
  • Define validity: the degree to which a measurement actually assesses what it is intended to assess for a given purpose.
  • Relationship between the two:
    • Reliability is a prerequisite for validity; you cannot have valid inferences from an unreliable measure.
    • Expressed informally: \text{Validity} \leq \text{Reliability}
  • Distinguishing internal vs external validity:
    • Internal validity: how well a study demonstrates that changes in the dependent variable were caused by the independent variable, not by confounds.
    • External validity: generalizability of findings to other people, settings, and times.
  • Additional note: reliability is necessary but not sufficient for validity; a measurement can be reliable but not valid for a given construct or purpose.
  • Everyday relevance: when evaluating literature, consider both how reliable the measures are and whether the measures validly capture the intended constructs for the study’s purpose.

Independent vs. dependent variables: definitions and practice

  • Key definitions:
    • Independent variable (IV): the element the experimenter manipulates or selects to test its effect on the dependent variable.
    • Dependent variable (DV): the outcome or behavior measured to assess the effect of the IV.
  • Takeaway: in single-subject design and in many class examples, the IV is the intervention or treatment; the DV is the behavior or outcome measured.
  • Foundational phrasing from the session:
    • IV is what you manipulate, e.g., a treatment, intervention components, or environmental conditions.
    • DV is what you measure, e.g., rate of a behavior, engagement, or correct responses.
  • Illustrative practice items (discrimination activity, with typical answers):
    • Item 1: The effects of study cards on the rate of homework completion.
      • IV: presence/absence of study cards; DV: rate of homework completion. Answer: DV.
      • Reason: you’re measuring the rate of homework completion after manipulating study cards.
    • Item 2: A self-management intervention increases students’ task engagement.
      • IV: the self-management intervention; DV: task engagement. Answer: IV.
    • Item 3: Use of an electronic signal device during classroom instruction increases the number of praise statements made by the teacher.
      • IV: use of the electronic signal device; DV: number of praise statements. Answer: DV.
    • Item 4: The effects of a token reinforcement plus praise treatment package on hand raising.
      • IV: the token reinforcement plus praise treatment package; DV: hand raising. Answer: IV.
    • Item 5: Number of words spelled correctly following a spelling quiz review.
      • IV: the spelling quiz review; DV: number of words spelled correctly. Answer: IV (with the caveat that the measurement-focused wording can mislead; the measured spelling words are the DV).
  • Practical note from the session:
    • Sometimes wording like “number of X following Y” leads to confusion; listen for what is being manipulated (IV) vs what is being measured (DV).
  • Graph interpretation tip:
    • When looking at graphs, the DV is typically plotted over time, while the IV represents the conditions/treatment changes that are hypothesized to cause DV changes. A measure of implementation integrity or treatment fidelity may also be tracked, but it’s a separate consideration from the DV itself.

Pseudoscience vs science in ABA (Normand and Horner perspectives)

  • Core distinction: science is evidence-based and relies on empiricism, replicable methods, and rigorous evaluation; pseudoscience relies on anecdotes, unfalsifiable claims, or dismissal of scientific standards.
  • Characteristics of pseudoscience discussed in the session:
    • Anecdotal evidence presented as empirical support; claims not backed by systematic data.
    • Unfalsifiable claims; claims that cannot be disproven.
    • Dismissing scientific standards and peer-reviewed evidence; overreliance on testimonials or “cutting-edge” claims without solid data.
    • Marketing or media framing that pressures readers to accept claims without rigorous methodology.
    • Potential harm when professionals implement unvalidated interventions (e.g., chelation therapy, rebirthing, or similarly untested approaches).
  • Notable examples referenced:
    • Vaccines and autism debate (historical, debunked linkage) framed as a cautionary example of how misinterpreted claims can persist.
    • Facilitated communication (FC) controversy: debates over whether the client or facilitator generated the communication; emphasizes need for precise operational definitions and evaluation via rigorous methods.
    • Chelation therapy and other controversial interventions associated with risk/harm despite appealing rationales.
  • Core scientific attitudes highlighted:
    • Determinism and experimentation: behavior is governed by natural laws, tested via experiments.
    • Replication: findings should be reproducible by others to be credible.
    • Philosophical doubt: scientific conclusions should be held tentatively, open to revision.
    • Empiricism: conclusions derived from data and observable evidence.
    • Open-mindedness: be receptive to new ideas but do not accept them without evidence; humorous maxim: be open-minded, but not so open-minded that your brain falls out.
  • Practical ABA stance:
    • Evidence-based practice requires multiple well-designed studies and robust experimental control; a single study is not sufficient.
    • The literature should be evaluated for quality and quantity (not just the number of studies, but their rigor and replication).
  • Takeaway: cultivate a skeptical, evidence-based lens when evaluating interventions, and avoid conflating novel or sensational claims with proven practice.

Attitudes of science in ABA (summary of the cohort discussion)

  • Key attitudes emphasized:
    • Determinism: behavior operates under lawful, knowable causes.
    • Experimentation: using controlled manipulation to test hypotheses.
    • Replication: results should be repeatable across observers, settings, and times.
    • Philosophical doubt: maintain healthy skepticism about claims until supported by evidence.
    • Empiricism: rely on data rather than intuition or anecdote.
    • Open-mindedness: consider new evidence even if it challenges existing beliefs, while avoiding belief without evidence.
  • Practical implication: these attitudes undergird rigorous evaluation of evidence and help practitioners discern evidence-based practices from pseudoscience.

Threats to internal validity (Zane article and broader discussion)

  • Zane article focus (short, practical critique of one-group designs): highlights risks in pretest-posttest designs without control groups.
  • Common threats identified and discussed:
    • Subject characteristics / participant selection bias: differences in participants that affect treatment sensitivity or outcomes.
    • Attrition / loss of subjects: dropouts can bias results and undermine experimental control.
    • Location / geographic confounds: differences in setting/classrooms can influence outcomes.
    • Instrumentation: changes in the measurement system (e.g., different scales, observer drift) can alter measurements.
    • Cyclical variability: predictable patterns (e.g., Mondays worse due to schedule) that create data variability tied to cycles rather than the IV.
    • Adaptation / reactivity: novelty effects or observer presence changing behavior; Hawthorne effect and related phenomena.
    • History: external events occurring during the study that could affect outcomes.
    • Maturation: natural development or changes over time that affect outcomes independent of the IV.
    • Procedural infidelity (treatment integrity): deviations from the planned intervention or protocol; e.g., changing DRO duration without adjusting the plan.
  • Important conceptual takeaway from Zane and the broader readings:
    • In single-subject research, repeated measures and careful design help guard against threats; however, threats remain a central concern and must be anticipated and mitigated when possible.
    • There is no single “perfect” design; the optimal approach depends on the research question and practical constraints.
  • Sketch of how these threats relate to experimental design:
    • Strong experimental control (e.g., multiple baselines, alternating treatments, etc.) helps rule out rival hypotheses and confounds.
    • Threats to internal validity diminish confidence in attributing observed changes to the IV, undermining the evidence-based status of a practice.

Measuring validity and reliability in practice

  • Recap from student discussion:
    • Reliability = consistency of measurement across time/raters/situations.
    • Validity = accuracy of the measurement for the intended purpose.
    • A reliable measure is necessary for validity, but not sufficient by itself; the measure must also be valid for the construct and purpose at hand.
  • Practical questions to ask when evaluating a study:
    • Was there a baseline or a stable measurement period before introducing the IV?
    • Are the outcome measures appropriate for the research question and implementation fidelity?
    • Are measurement procedures consistent and calibrated across time and observers?
    • Are there evident confounds or extraneous variables that covary with the IV?
    • Is there evidence of rival hypotheses and steps taken to disprove them?
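One standard way to answer the question about measurement consistency across observers is interval-by-interval inter-observer agreement (IOA), a common reliability check in behavioral data collection. A minimal sketch with hypothetical interval records:

```python
# Hypothetical interval-recording data from two independent observers
# (1 = target behavior recorded in that interval, 0 = not recorded).
observer_a = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
observer_b = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]

# Interval-by-interval IOA: agreements divided by total intervals, times 100.
agreements = sum(1 for a, b in zip(observer_a, observer_b) if a == b)
ioa = 100 * agreements / len(observer_a)

print(f"IOA: {ioa:.0f}%")  # here, 9 of 10 intervals agree
```

Note that high IOA speaks only to reliability; it says nothing about whether the interval measure validly captures the construct of interest.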

Procedures and fidelity: guarding against threats to internal validity

  • Treatment integrity / fidelity: ensure the intervention is implemented as intended.
  • How to guard against fidelity problems:
    • Clear, parsimonious operational definitions of the intervention components.
    • Regular training and refresher checks for implementers.
    • Fidelity checks or observers verifying that procedures are followed.
    • Documentation of what was done, when, and by whom, to support replication and defend against rival hypotheses.
  • Data integrity: ensure the measurement system itself isn’t introducing bias (instrument drift, observer bias, etc.).
  • Collaboration and openness with families/medical professionals when external factors (like medication) could influence outcomes; plan to manage or hold constant potential confounds when possible.
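The fidelity checks described above are often scored as the percentage of planned steps implemented as written. A minimal sketch, with a hypothetical checklist for the token-economy example (the step names are illustrative, not from the session):

```python
# Hypothetical treatment-fidelity checklist: an observer marks whether each
# planned intervention component was implemented as written.
checklist = {
    "tokens delivered contingent on target behavior": True,
    "exchange schedule followed as written": True,
    "DRO interval matched the written plan": False,  # e.g., duration changed ad hoc
    "praise paired with token delivery": True,
}

# Fidelity = implemented steps / planned steps, expressed as a percentage.
implemented = sum(checklist.values())
fidelity = 100 * implemented / len(checklist)

print(f"Fidelity: {fidelity:.0f}% of steps implemented as planned")
```

A low score flags procedural infidelity as a rival hypothesis before any claim about the IV is made.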

Practical classroom activity and assignment context

  • The instructor walked through practice activities in breakout groups to discriminate IVs from DVs using quick polls and examples, emphasizing quick, clear, correct categorization.
  • Breakout activities and participation points were used to reinforce the material and provide applied practice in discriminating variables.
  • Assignment context: an upcoming assignment (Assignment Two) focuses on formulating a strong research question; building a foundation with concepts of validity, reliability, and experimental design helps with that task.

Quick reference: core definitions and relationships (summary)

  • Independent variable (IV): the intervention or condition deliberately manipulated.
  • Dependent variable (DV): the outcome measured to assess the effect of the IV.
  • Validity: whether the measurement or study design measures what it intends to measure for a given purpose.
  • Reliability: consistency of measurement across time, observers, and measurement occasions.
  • Internal validity: confidence that observed changes in DV are due to IV, not confounds.
  • External validity: generalizability of findings to other settings, people, times.
  • Threats to internal validity (examples): history, maturation, instrumentation, regression, attrition, location, subject characteristics, cyclical variability, adaptation/reactivity, procedural infidelity.
  • Rival hypotheses: alternative explanations for data that must be considered and ruled out to strengthen inferences about the IV–DV relationship.
  • Evidence-based practice: conclusions supported by multiple, high-quality, reproduced studies; one study alone typically insufficient.

Notes on notation and formulas (where applicable)

  • Functional relation in behavior analysis can be represented as a basic model:
    Y = f(X) + \varepsilon
    where Y is the dependent variable (behavioral outcome), X is the independent variable (treatment/condition), and \varepsilon is error/noise.
  • Reliability and validity relationship (conceptual):
    \text{Validity} \leq \text{Reliability}
  • General principle: high reliability is a prerequisite for high validity; but high reliability does not guarantee validity for the intended construct or purpose.
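The functional-relation model Y = f(X) + ε above can be simulated in a few lines. This is a minimal sketch with assumed numbers: the "true" effect f drops the behavior rate from 12 to 6 under the intervention condition, and Gaussian noise stands in for measurement error and uncontrolled variation.

```python
import random

random.seed(0)  # reproducible noise for the sketch

def f(x):
    # Assumed true effect: baseline condition (x=0) yields a rate of 12,
    # intervention condition (x=1) yields a rate of 6.
    return 12.0 if x == 0 else 6.0

def observe(x):
    # Y = f(X) + error: each observation is the true value plus noise.
    return f(x) + random.gauss(0, 1)

# Repeated measures in each condition, as in single-subject designs.
baseline_obs = [observe(0) for _ in range(20)]
treatment_obs = [observe(1) for _ in range(20)]

baseline_mean = sum(baseline_obs) / len(baseline_obs)
treatment_mean = sum(treatment_obs) / len(treatment_obs)
```

With enough repeated measures, the phase means approach the true values of f, which is the rationale for repeated measurement in guarding against noise-driven conclusions.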