Cognitive Biases & Program Scale-Up: Comprehensive Study Notes

Abstract & Big-Picture Take-Aways

  • Many social programs that look promising in pilots lose impact when taken to scale (the “voltage drop”).

  • Beyond implementation and sampling problems, cognitive biases in researchers, practitioners, funders, and policy makers systematically distort:

    • What evidence is sought, accepted, or ignored.

    • How pilot findings are interpreted.

    • Whether scale-up is pursued or blocked.

  • Chapter spotlights three pervasive biases—confirmation, status-quo (default), and bandwagon—and shows how each can steer scaling decisions toward false positives or away from true positives.

  • Calls for a psychologically informed science of scale-up: laboratory and field experiments to map when/where biases bite and how to redesign “choice architecture” to counter them.

Cognitive Biases: Definitions & Distinctions

  • Cognitive bias = systematic, predictable departure from logic/probability in judgment.

    • Hard-wired, automatic shortcuts that economize on time & cognitive load but can misfire (Kahneman’s “System 1”).

  • NOT:

    • Simple factual errors (can be fixed with information).

    • Rational responses to incentives (e.g., changing behavior because payment changes the payoffs).

  • ≈ 185 biases catalogued; many overlap, and the taxonomy is debated.

  • Evolutionary & functional roots: avoid decision fatigue, preserve identity, ease social coordination.

Degrees of Freedom in Social-Program Research

  • Researchers must choose: sample size, covariates, outcome construction, exclusion rules, model specs ⇒ “garden of forking paths” (Gelman & Loken).

  • These forks + publish-or-perish incentives drive selective significance hunting:

    • p \le .05 fetish ⇒ elevated false-positive risk.

    • Industry or mission-driven funding further tilts analyses toward desired findings.

  • Consequence for scale-up: statistical significance alone is no longer reliable evidence (see the sketch below).
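
An illustrative back-of-the-envelope sketch of why forking paths inflate false positives: assuming a researcher can try k roughly independent specifications, each tested at α = .05, the chance of at least one significant result under a true null grows quickly. Real specifications are correlated, so read this as an upper-bound intuition, not a figure from the chapter:

```python
# Illustrative only: k independent looks at null data, each at alpha = .05.
alpha = 0.05
for k in (1, 5, 10, 20):
    p_any = 1 - (1 - alpha) ** k   # P(at least one "significant" result)
    print(f"{k:2d} specifications -> P(false positive) = {p_any:.2f}")
# 1 -> 0.05, 5 -> 0.23, 10 -> 0.40, 20 -> 0.64
```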

Economic Model of Scaling (Al-Ubaydli et al.)

  • Decision makers should compute Post-Study Probability (PSP):
    \text{PSP} = \frac{\text{Prior} \times \text{Power}}{\text{Prior} \times \text{Power} + (1-\text{Prior}) \times \alpha}

  • Recommended rules of thumb:

    • Start with a low prior (0.10) when evidence is minimal.

    • Scale only when PSP ≥ 0.95 (≈ two well-powered, independent, significant replications).

    • Null findings should down-weight PSP.

  • Biases make real-world actors deviate from this Bayesian ideal (a worked sketch follows).
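
A minimal numeric sketch of the PSP rule applied sequentially, where each study's PSP becomes the next study's prior; the power = 0.80 and α = .05 values are illustrative assumptions, not chapter figures:

```python
def psp(prior: float, power: float = 0.80, alpha: float = 0.05) -> float:
    """PSP = Prior*Power / (Prior*Power + (1 - Prior)*alpha)."""
    return prior * power / (prior * power + (1 - prior) * alpha)

p = 0.10  # low prior recommended for a brand-new intervention
for study in (1, 2):
    p = psp(p)  # each well-powered, significant replication updates the prior
    print(f"after study {study}: PSP = {p:.3f}")
# after study 1: PSP = 0.640
# after study 2: PSP = 0.966  -> crosses the 0.95 scale-up threshold
```

This is why the rule of thumb equates the 0.95 bar with roughly two well-powered, independent, significant replications: one study alone leaves PSP well short of it.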

Bias 1 – Confirmation Bias

Essence

  • Tendency to seek, interpret, and recall evidence in ways that confirm existing beliefs, while discounting disconfirming data.

  • Classic demo: Wason’s “2-4-6” rule discovery—subjects rarely try to falsify their hypothesis.

How It Warps Scale Decisions

  • Researchers select favorable outcomes, analytic specs, or subsets of studies ⇒ inflated PSP.

  • Peer review more lenient toward manuscripts aligning with reviewers’ theories (Mahoney 1977).

  • Funders (agencies, foundations) bankroll projects fitting their agendas → entire portfolio biased.

  • Self-report outcomes (teacher, parent ratings) coloured by practitioners’ priors about what “should” work.

  • Primacy/recency effects: early positive pilots anchor beliefs; later nulls get discounted.

Illustrative Scenarios

  • Two positive pilots vs. eight nulls: an unbiased Bayesian would drop PSP below 0.95, but a believer may overweight the two confirming studies and recommend scale-up (see the odds-update sketch after this list).

  • The Jack & Jill political news-feed example shows an everyday manifestation of the bias.
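
In odds form, the same PSP update handles mixed evidence: a significant study multiplies the prior odds by power/α, a null study by (1 − power)/(1 − α). A sketch under assumed power = 0.80 and α = .05, treating the ten studies as independent:

```python
power, alpha = 0.80, 0.05          # illustrative assumptions
odds = 0.10 / (1 - 0.10)           # prior of 0.10 in odds form
odds *= (power / alpha) ** 2       # two significant pilots: LR = 16 each
print(f"after 2 positives: PSP = {odds / (1 + odds):.3f}")    # ~0.966
odds *= ((1 - power) / (1 - alpha)) ** 8   # eight nulls: LR ~ 0.21 each
print(f"after 8 nulls too: PSP = {odds / (1 + odds):.5f}")    # ~0.00011
```

An unbiased reader of all ten studies ends near PSP ≈ 0.0001, far below the 0.95 threshold; only by discounting the nulls can the believer stay above it.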

Bias 2 – Status-Quo (Default) Bias

Essence

  • Preference for current state; aversion to change even when change is beneficial.

  • Possible mechanisms:

    • Heuristic “existing = good.”

    • Loss aversion (losses loom ≈ 2\times gains; see the sketch after this list).
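
A toy illustration of how a ≈ 2× loss weight can flip a switching decision even when the objective expected value favors change; all payoffs and probabilities here are hypothetical:

```python
LAMBDA = 2.0                      # losses loom ~2x as large as gains
p_better, gain = 0.6, 10.0        # hypothetical: new program likely better
p_worse, loss = 0.4, 10.0         # hypothetical downside of switching
objective_ev = p_better * gain - p_worse * loss              # +2.0
subjective_ev = p_better * gain - LAMBDA * p_worse * loss    # -2.0
print(objective_ev, subjective_ev)  # switching looks good, but feels bad
```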

Empirical Evidence

  • 401(k) enrollment jumps when default switches from opt-in to opt-out (Madrian & Shea).

  • Car-insurance study: majority stick with whichever plan is default in their state.

  • Lab shock-anxiety study: people forgo an anxiety-reducing option when doing nothing is the default.

Impact on Scale-Up

  • Good new program fails to spread because districts cling to incumbent curriculum.

  • Conversely, an ineffective pilot that is already embedded may get scaled simply because it is the status quo for that site.

  • Transition costs—real or perceived—magnify reluctance.

Bias 3 – Bandwagon Bias

Essence

  • “Hop on because others are on”: adoption driven by popularity, not intrinsic evidence.

  • Roots in social conformity (Asch line-length experiment: ≈ 75\% conformed at least once).

Pathways to Scaling Errors

  • Early false positive, amplified by charismatic champions or large grants, snowballs into mass adoption before replication.

  • Confirmation + bandwagon interact: once many peers approve, dissenting evidence faces higher discounting.

Real-World Illustrations

  • Education fads: open classrooms, whole-language reading, “new math.”

  • Medical misadventures: estrogen for heart-disease prevention, other procedures with weak evidence.

Factors That Modulate Bias Influence

  • Decision environment:

    • Forced reflection (even a 0.1\text{ s} delay) improves accuracy.

    • Multiple independent replications stretch the timeline, reducing impulsive adoption.

  • Autonomy vs. checks:

    • High autonomy (researcher analytic choices, teacher implementation) → more bias.

    • Peer review & oversight can act as a check but may share the same biases.

  • Investment entrenchment: more effort sunk ⇒ harder to reverse (escalation of commitment).

  • Evidence mix: equal amounts of pro/contra data enable cherry-picking; overwhelming contrary corpus hard to ignore.

  • Incentives: publication bias, funding goals, career rewards can foster confirmatory searching.

Mitigation & Design Principles

  • Build “choice architecture” that nudges toward unbiased scaling:

    • Pre-registration, registered reports to lock analytic plans.

    • Reward replications; publish nulls to balance evidence base.

    • Require cumulative PSP ≥ 0.95 before large-scale rollout.

  • Create decision structures promoting deliberation: devil’s-advocate roles, diverse teams, explicit solicitation of disconfirming views.

  • Use behavioral tools (reminders, commitment devices) to counter default inertia in practice settings.

  • Balance review rigor with agility to avoid entrenchment.

Core Numbers, Equations & Technical Points

  • Statistical significance bar: p \le .05.

  • Recommended prior for brand-new intervention: 0.10.

  • Scale-up threshold PSP: 0.95 (≈ two independent, well-powered, significant studies without serious negatives).

  • Loss-aversion ratio: subjective pain of loss ≈ 2 × pleasure of equal gain.

Ethical & Practical Implications

  • Scaling false positives wastes public funds, damages trust, and crowds out effective programs.

  • Bias-aware processes are ethical imperatives for evidence-based policy.

  • Need for cross-disciplinary collaboration: economists’ PSP models + psychologists’ bias insights + implementation scientists’ fidelity tools.

Conclusion & Future Research Agenda

  • Cognitive biases likely play large but under-studied role in why “programs fail to scale.”

  • Research priorities:

    • Map bias incidence across research, funding, peer review, implementation stages.

    • Test interventions (decision delays, structured adversarial reviews, default flips) in both lab and field.

  • Until then, practitioners should act as “behavioral custodians,” embedding safeguards—replication requirements, transparent reporting, diversity of viewpoints—to curb confirmation, status-quo, and bandwagon pressures.