Notes on Validity, Reliability, and Internal/External Validity in ABA (Single-Subject Context)

External validity and systematic replication

  • Quiz emphasis: Question three asks what best enhances external validity in single-subject research.
  • Answer highlighted in transcript: using systematic replication across participants and settings, especially across studies or labs, strengthens external validity.
  • Key idea: external validity = generalizability across people and contexts; replication across participants/settings/labs increases confidence that findings will generalize beyond a single case or setting.
  • Related note: more discussion to come about what this replication across contexts means in practice.

Internal validity threat in a token economy scenario (case example)

  • Scenario recap: A token economy is implemented for a student referred for behavior support. One week later, a classroom teacher reports that the student’s behavior has improved, saying she is certain it has improved at least a little.
  • Question: Identify threats to internal validity that may have inflated the researcher’s confidence that the token economy (the IV) caused the change.
  • Key issues raised:
    • No baseline data collected: Without baseline data there is no pre-treatment reference point, so observed change cannot be clearly attributed to the shift from no treatment to treatment.
    • Teacher report is subjective: Reliance on a single subjective observer measurement increases measurement bias (potential bias in data collection).
    • Possible threats present: history effects or maturation could account for observed changes over time, especially without baseline and objective data.
  • Conclusions drawn in session:
    • The absence of baseline data and reliance on subjective teacher reports are important threats to internal validity.
    • History or maturation effects could be possible explanations for short-term changes even when an intervention was implemented.
  • Practical takeaway: Even in practice (outside pure research), beware threats to internal validity by establishing baselines and using objective, multiple measurement methods when evaluating functional changes.
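The takeaway above can be made concrete with a minimal sketch. The numbers below are hypothetical daily counts of disruptive behavior; the sketch shows what a baseline makes possible: a phase-mean comparison and a simple nonoverlap metric (PND, percentage of nonoverlapping data) that a subjective one-week teacher report cannot provide.

```python
# Hypothetical daily counts of disruptive behavior (not from the session).
baseline = [12, 10, 11, 13, 12]    # one week of pre-intervention observation
intervention = [9, 7, 6, 5, 4]     # one week under the token economy

baseline_mean = sum(baseline) / len(baseline)
intervention_mean = sum(intervention) / len(intervention)

# Percentage of nonoverlapping data (PND): for a behavior we want to reduce,
# the share of intervention points falling below the lowest baseline point.
floor = min(baseline)
pnd = 100 * sum(1 for x in intervention if x < floor) / len(intervention)

print(f"Baseline mean: {baseline_mean:.1f}")
print(f"Intervention mean: {intervention_mean:.1f}")
print(f"PND: {pnd:.0f}%")
```

Without the `baseline` list, neither comparison exists, which is exactly the internal-validity problem the scenario illustrates.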

Core concepts: validity, reliability, and their relationship

  • Define reliability: consistency or stability of a measurement across time, raters, or forms.
  • Define validity: the degree to which a measurement actually assesses what it is intended to assess for a given purpose.
  • Relationship between the two:
    • Reliability is a prerequisite for validity; you cannot have valid inferences from an unreliable measure.
    • Expressed informally: \text{Validity} \leq \text{Reliability}
  • Distinguishing internal vs external validity:
    • Internal validity: how well a study demonstrates that changes in the dependent variable were caused by the independent variable, not by confounds.
    • External validity: generalizability of findings to other people, settings, and times.
  • Additional note: reliability is necessary but not sufficient for validity; a measurement can be reliable but not valid for a given construct or purpose.
  • Everyday relevance: when evaluating literature, consider both how reliable the measures are and whether the measures validly capture the intended constructs for the study’s purpose.

Independent vs. dependent variables: definitions and practice

  • Key definitions:
    • Independent variable (IV): the element the experimenter manipulates or selects to test its effect on the dependent variable.
    • Dependent variable (DV): the outcome or behavior measured to assess the effect of the IV.
  • Takeaway: in single-subject design and in many class examples, the IV is the intervention or treatment; the DV is the behavior or outcome measured.
  • Foundational phrasing from the session:
    • IV is what you manipulate, e.g., a treatment, intervention components, or environmental conditions.
    • DV is what you measure, e.g., rate of a behavior, engagement, or correct responses.
  • Illustrative practice items (discrimination activity, with typical answers):
    • Item 1: The effects of study cards on the rate of homework completion.
      • IV: presence/absence of study cards; DV: rate of homework completion. Answer: DV.
      • Reason: you’re measuring the rate of homework completion after manipulating study cards.
    • Item 2: A self-management intervention increases students’ task engagement.
      • IV: the self-management intervention; DV: task engagement. Answer: IV.
    • Item 3: Use of an electronic signal device during classroom instruction increases the number of praise statements made by the teacher.
      • IV: use of the electronic signal device; DV: number of praise statements. Answer: DV.
    • Item 4: The effects of a token reinforcement plus praise treatment package on hand raising.
      • IV: the token reinforcement plus praise treatment package; DV: hand raising. Answer: IV.
    • Item 5: Number of words spelled correctly following a spelling quiz review.
      • IV: the spelling quiz review; DV: number of words spelled correctly. Answer: IV (with the caveat that the measurement-focused wording can mislead; the measured spelling words are the DV).
  • Practical note from the session:
    • Sometimes wording like “number of X following Y” leads to confusion; listen for what is being manipulated (IV) vs what is being measured (DV).
  • Graph interpretation tip:
    • When looking at graphs, the DV is typically plotted over time, while the IV represents the conditions/treatment changes that are hypothesized to cause DV changes. A measure of implementation integrity or treatment fidelity may also be tracked, but it’s a separate consideration from the DV itself.

Pseudoscience vs science in ABA (Normand and Horner perspectives)

  • Core distinction: science is evidence-based and relies on empiricism, replicable methods, and rigorous evaluation; pseudoscience relies on anecdotes, unfalsifiable claims, or dismissal of scientific standards.
  • Characteristics of pseudoscience discussed in the session:
    • Anecdotal evidence presented as empirical support; claims not backed by systematic data.
    • Unfalsifiable claims; claims that cannot be disproven.
    • Dismissing scientific standards and peer-reviewed evidence; overreliance on testimonials or “cutting-edge” claims without solid data.
    • Marketing or media framing that pressures readers to accept claims without rigorous methodology.
    • Potential harm when professionals implement unvalidated interventions (e.g., chelation therapy, rebirthing, or similarly untested approaches).
  • Notable examples referenced:
    • Vaccines and autism debate (historical, debunked linkage) framed as a cautionary example of how misinterpreted claims can persist.
    • Facilitated communication (FC) controversy: debates over whether the client or facilitator generated the communication; emphasizes need for precise operational definitions and evaluation via rigorous methods.
    • Chelation therapy and other controversial interventions associated with risk/harm despite appealing rationales.
  • Core scientific attitudes highlighted:
    • Determinism and experimentation: behavior is governed by natural laws, tested via experiments.
    • Replication: findings should be reproducible by others to be credible.
    • Philosophical doubt: scientific conclusions should be held tentatively, open to revision.
    • Empiricism: conclusions derived from data and observable evidence.
    • Open-mindedness: be receptive to new ideas but do not accept them without evidence; humorous maxim: be open-minded, but not so open-minded that your brain falls out.
  • Practical ABA stance:
    • Evidence-based practice requires multiple well-designed studies and robust experimental control; a single study is not sufficient.
    • The literature should be evaluated for quality and quantity (not just the number of studies, but their rigor and replication).
  • Takeaway: cultivate a skeptical, evidence-based lens when evaluating interventions, and avoid conflating novel or sensational claims with proven practice.

Attitudes of science in ABA (summary of the cohort discussion)

  • Key attitudes emphasized:
    • Determinism: behavior operates under lawful, knowable causes.
    • Experimentation: using controlled manipulation to test hypotheses.
    • Replication: results should be repeatable across observers, settings, and times.
    • Philosophical doubt: maintain healthy skepticism about claims until supported by evidence.
    • Empiricism: rely on data rather than intuition or anecdote.
    • Open-mindedness: consider new evidence even if it challenges existing beliefs, while avoiding belief without evidence.
  • Practical implication: these attitudes undergird rigorous evaluation of evidence and help practitioners discern evidence-based practices from pseudoscience.

Threats to internal validity (Zane article and broader discussion)

  • Zane article focus (short, practical critique of one-group designs): highlights risks in pretest-posttest designs without control groups.
  • Common threats identified and discussed:
    • Subject characteristics / participant selection bias: differences in participants that affect treatment sensitivity or outcomes.
    • Attrition / loss of subjects: dropouts can bias results and undermine experimental control.
    • Location / geographic confounds: differences in setting/classrooms can influence outcomes.
    • Instrumentation: changes in the measurement system (e.g., different scales, observer drift) can alter measurements.
    • Cyclical variability: predictable patterns (e.g., Mondays worse due to schedule) that create data variability tied to cycles rather than the IV.
    • Adaptation / reactivity: novelty effects or observer presence changing behavior; Hawthorne effect and related phenomena.
    • History: external events occurring during the study that could affect outcomes.
    • Maturation: natural development or changes over time that affect outcomes independent of the IV.
    • Procedural infidelity (treatment integrity): deviations from the planned intervention or protocol; e.g., changing DRO duration without adjusting the plan.
  • Important conceptual takeaway from Zane and the broader readings:
    • In single-subject research, repeated measures and careful design help guard against threats; however, threats remain a central concern and must be anticipated and mitigated when possible.
    • There is no single “perfect” design; the optimal approach depends on the research question and practical constraints.
  • Sketch of how these threats relate to experimental design:
    • Strong experimental control (e.g., multiple baselines, alternating treatments, etc.) helps rule out rival hypotheses and confounds.
    • Threats to internal validity diminish confidence in attributing observed changes to the IV, undermining the evidence-based status of a practice.

Measuring validity and reliability in practice

  • Recap from student discussion:
    • Reliability = consistency of measurement across time/raters/situations.
    • Validity = accuracy of the measurement for the intended purpose.
    • A reliable measure is necessary for validity, but not sufficient by itself; the measure must also be valid for the construct and purpose at hand.
  • Practical questions to ask when evaluating a study:
    • Was there a baseline or a stable measurement period before introducing the IV?
    • Are the outcome measures appropriate for the research question and implementation fidelity?
    • Are measurement procedures consistent and calibrated across time and observers?
    • Are there evident confounds or extraneous variables that covary with the IV?
    • Is there evidence of rival hypotheses and steps taken to disprove them?
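One standard way to answer the question about measurement consistency across observers is interval-by-interval inter-observer agreement (IOA), a common reliability check in behavioral data collection. A minimal sketch with hypothetical interval records:

```python
# Hypothetical interval-recording data from two independent observers
# (1 = target behavior recorded in that interval, 0 = not recorded).
observer_a = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
observer_b = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]

# Interval-by-interval IOA: agreements divided by total intervals, times 100.
agreements = sum(1 for a, b in zip(observer_a, observer_b) if a == b)
ioa = 100 * agreements / len(observer_a)

print(f"IOA: {ioa:.0f}%")  # here, 9 of 10 intervals agree
```

Note that high IOA speaks only to reliability; it says nothing about whether the interval measure validly captures the construct of interest.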

Procedures and fidelity: guarding against threats to internal validity

  • Treatment integrity / fidelity: ensure the intervention is implemented as intended.
  • How to guard against fidelity problems:
    • Clear, parsimonious operational definitions of the intervention components.
    • Regular training and refresher checks for implementers.
    • Fidelity checks or observers verifying that procedures are followed.
    • Documentation of what was done, when, and by whom, to support replication and defend against rival hypotheses.
  • Data integrity: ensure the measurement system itself isn’t introducing bias (instrument drift, observer bias, etc.).
  • Collaboration and openness with families/medical professionals when external factors (like medication) could influence outcomes; plan to manage or hold constant potential confounds when possible.
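The fidelity checks described above are often scored as the percentage of planned steps implemented as written. A minimal sketch, with a hypothetical checklist for the token-economy example (the step names are illustrative, not from the session):

```python
# Hypothetical treatment-fidelity checklist: an observer marks whether each
# planned intervention component was implemented as written.
checklist = {
    "tokens delivered contingent on target behavior": True,
    "exchange schedule followed as written": True,
    "DRO interval matched the written plan": False,  # e.g., duration changed ad hoc
    "praise paired with token delivery": True,
}

# Fidelity = implemented steps / planned steps, expressed as a percentage.
implemented = sum(checklist.values())
fidelity = 100 * implemented / len(checklist)

print(f"Fidelity: {fidelity:.0f}% of steps implemented as planned")
```

A low score flags procedural infidelity as a rival hypothesis before any claim about the IV is made.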

Practical classroom activity and assignment context

  • The instructor walked through practice activities in breakout groups to discriminate IVs from DVs using quick polls and examples, emphasizing quick, clear, correct categorization.
  • Breakout activities and participation points were used to reinforce the material and provide applied practice in discriminating variables.
  • Assignment context: an upcoming assignment (Assignment Two) focuses on formulating a strong research question; building a foundation with concepts of validity, reliability, and experimental design helps with that task.

Quick reference: core definitions and relationships (summary)

  • Independent variable (IV): the intervention or condition deliberately manipulated.
  • Dependent variable (DV): the outcome measured to assess the effect of the IV.
  • Validity: whether the measurement or study design measures what it intends to measure for a given purpose.
  • Reliability: consistency of measurement across time, observers, and measurement occasions.
  • Internal validity: confidence that observed changes in DV are due to IV, not confounds.
  • External validity: generalizability of findings to other settings, people, times.
  • Threats to internal validity (examples): history, maturation, instrumentation, regression, attrition, location, subject characteristics, cyclical variability, adaptation/reactivity, procedural infidelity.
  • Rival hypotheses: alternative explanations for data that must be considered and ruled out to strengthen inferences about the IV–DV relationship.
  • Evidence-based practice: conclusions supported by multiple, high-quality, reproduced studies; one study alone typically insufficient.

Notes on notation and formulas (where applicable)

  • Functional relation in behavior analysis can be represented as a basic model:
    Y = f(X) + \varepsilon
    where Y is the dependent variable (behavioral outcome), X is the independent variable (treatment/condition), and \varepsilon is error/noise.
  • Reliability and validity relationship (conceptual):
    \text{Validity} \leq \text{Reliability}
  • General principle: high reliability is a prerequisite for high validity; but high reliability does not guarantee validity for the intended construct or purpose.
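The functional-relation model Y = f(X) + ε above can be simulated in a few lines. This is a minimal sketch with assumed numbers: the "true" effect f drops the behavior rate from 12 to 6 under the intervention condition, and Gaussian noise stands in for measurement error and uncontrolled variation.

```python
import random

random.seed(0)  # reproducible noise for the sketch

def f(x):
    # Assumed true effect: baseline condition (x=0) yields a rate of 12,
    # intervention condition (x=1) yields a rate of 6.
    return 12.0 if x == 0 else 6.0

def observe(x):
    # Y = f(X) + error: each observation is the true value plus noise.
    return f(x) + random.gauss(0, 1)

# Repeated measures in each condition, as in single-subject designs.
baseline_obs = [observe(0) for _ in range(20)]
treatment_obs = [observe(1) for _ in range(20)]

baseline_mean = sum(baseline_obs) / len(baseline_obs)
treatment_mean = sum(treatment_obs) / len(treatment_obs)
```

With enough repeated measures, the phase means approach the true values of f, which is the rationale for repeated measurement in guarding against noise-driven conclusions.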