Cognitive Biases & Program Scale-Up: Comprehensive Study Notes
Abstract & Big-Picture Take-Aways
Many social programs that look promising in pilots lose impact when taken to scale (the “voltage drop”).
Beyond implementation and sampling problems, cognitive biases in researchers, practitioners, funders, and policy makers systematically distort:
What evidence is sought, accepted, or ignored.
How pilot findings are interpreted.
Whether scale-up is pursued or blocked.
Chapter spotlights three pervasive biases—confirmation, status-quo (default), and bandwagon—and shows how each can steer scaling decisions toward false positives or away from true positives.
Calls for a psychologically informed science of scale-up: laboratory and field experiments to map when/where biases bite and how to redesign “choice architecture” to counter them.
Cognitive Biases: Definitions & Distinctions
Cognitive bias = systematic, predictable departure from logic/probability in judgment.
Hard-wired, automatic shortcuts that economize on time & cognitive load but can misfire (Kahneman’s “System 1”).
NOT:
Simple factual errors (can be fixed with information).
Rational responses to incentives (e.g., paying people).
≈ 185 biases catalogued; overlap, debate over taxonomy.
Evolutionary & functional roots: avoid decision fatigue, preserve identity, ease social coordination.
Degrees of Freedom in Social-Program Research
Researchers must choose: sample size, covariates, outcome construction, exclusion rules, model specs ⇒ “garden of forking paths” (Gelman & Loken).
These forks + publish-or-perish incentives drive selective significance hunting:
p ≤ .05 fetish ⇒ elevated false-positive risk.
Industry or mission-driven funding further tilts analyses toward desired findings.
Consequence for scale-up: statistical significance alone no longer reliable evidence.
Economic Model of Scaling (Al-Ubaydli et al.)
Decision makers should compute Post-Study Probability (PSP):
\text{PSP} = \frac{\text{Prior} \times \text{Power}}{\text{Prior} \times \text{Power} + (1-\text{Prior}) \times \alpha}
Recommended rules of thumb:
Start with a low prior of 0.10 when evidence is minimal.
Scale only when PSP ≥ 0.95 (≈ two well-powered, independent, significant replications).
Null findings should down-weight PSP.
Biases make real-world actors deviate from this Bayesian ideal.
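The PSP rule above can be sketched in a few lines of Python. The values power = 0.80 and α = 0.05 are conventional illustrative choices, not figures from the chapter:

```python
def post_study_probability(prior, power=0.80, alpha=0.05):
    """Al-Ubaydli et al.'s post-study probability: the chance the
    program truly works, given one statistically significant finding."""
    return (prior * power) / (prior * power + (1 - prior) * alpha)

# Brand-new intervention: low prior of 0.10, one well-powered significant study.
p1 = post_study_probability(0.10)   # ~0.64 -- not yet enough to scale
# Second independent significant replication: feed p1 back in as the prior.
p2 = post_study_probability(p1)     # ~0.97 -- clears the 0.95 bar
print(round(p1, 2), round(p2, 2))
```

This is why the rule of thumb translates to roughly two well-powered, independent, significant replications: one significant study moves a 0.10 prior only to about 0.64, while the second pushes it past 0.95.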
Bias 1 – Confirmation Bias
Essence
Tendency to seek, interpret, recall evidence that confirms existing beliefs; discount disconfirming data.
Classic demo: Wason’s “2-4-6” rule discovery—subjects rarely try to falsify their hypothesis.
How It Warps Scale Decisions
Researchers select favorable outcomes, analytic specs, or subset of studies ⇒ inflate PSP.
Peer review more lenient toward manuscripts aligning with reviewers’ theories (Mahoney 1977).
Funders (agencies, foundations) bankroll projects fitting their agendas → entire portfolio biased.
Self-report outcomes (teacher, parent ratings) coloured by practitioners’ priors about what “should” work.
Recency/primacy effects: early positive pilots anchor beliefs; later nulls discounted.
Illustrative Scenarios
Two positive pilots vs. eight nulls: unbiased Bayesian would drop PSP below 0.95, but believer may overweight the two confirming studies and recommend scale-up.
Jack & Jill political news-feed example shows everyday manifestation.
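The two-positives-vs-eight-nulls scenario above can be made concrete with sequential Bayesian updating (same illustrative assumptions: power = 0.80, α = 0.05, starting prior = 0.10):

```python
def update(prior, significant, power=0.80, alpha=0.05):
    """One Bayesian update on the probability the program works.
    A significant study multiplies the odds by power/alpha;
    a null study multiplies them by (1 - power)/(1 - alpha)."""
    if significant:
        like_true, like_false = power, alpha
    else:
        like_true, like_false = 1 - power, 1 - alpha
    num = prior * like_true
    return num / (num + (1 - prior) * like_false)

def psp(results, prior=0.10):
    for r in results:
        prior = update(prior, r)
    return prior

cherry_picked = psp([True, True])                # only the 2 positives: > 0.95
all_evidence  = psp([True, True] + [False] * 8)  # 2 positives + 8 nulls: near 0
print(round(cherry_picked, 3), round(all_evidence, 4))
```

Because updates multiply, the order of the studies doesn't matter: counting all ten results, PSP collapses to nearly zero, while attending only to the two confirming studies yields a PSP above the 0.95 scale-up bar.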
Bias 2 – Status-Quo (Default) Bias
Essence
Preference for current state; aversion to change even when change is beneficial.
Possible mechanisms:
Heuristic “existing = good.”
Loss aversion (losses loom ≈ 2× as large as equal gains).
Empirical Evidence
401(k) enrollment jumps when default switches from opt-in to opt-out (Madrian & Shea).
Car-insurance study: majority stick with whichever plan is default in their state.
Lab shock-anxiety study: people forgo anxiety-reducing option when doing nothing is default.
Impact on Scale-Up
Good new program fails to spread because districts cling to incumbent curriculum.
Conversely, ineffective pilot that’s already embedded may get scaled simply because it is the status quo for that site.
Transition costs—real or perceived—magnify reluctance.
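A toy sketch of how loss aversion plus transition costs can make a net-beneficial switch feel like a loss; the value function is a simplified (linear) Kahneman-Tversky form and all numbers are hypothetical:

```python
def subjective_value(x, loss_aversion=2.0):
    """Simplified prospect-theory value function: losses weighted
    ~2x as heavily as equal-sized gains (linear, for illustration)."""
    return x if x >= 0 else loss_aversion * x

# Hypothetical district choice: switching curricula brings a +3 benefit
# but a -2 transition cost -- a net objective gain of +1.
gain, transition_cost = 3.0, -2.0
felt = subjective_value(gain) + subjective_value(transition_cost)
print(felt)  # 3 + 2*(-2) = -1.0: the switch *feels* like a loss,
             # so the incumbent program wins despite being worse.
```

Even a modest perceived transition cost, doubled by loss aversion, can flip a decision maker against an objectively superior replacement.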
Bias 3 – Bandwagon Bias
Essence
“Hop on because others are on”: adoption driven by popularity, not intrinsic evidence.
Roots in social conformity (Asch line-length experiment: ≈ 75% conformed at least once).
Pathways to Scaling Errors
Early false positive, amplified by charismatic champions or large grants, snowballs into mass adoption before replication.
Confirmation + bandwagon interact: once many peers approve, dissenting evidence faces higher discounting.
Real-World Illustrations
Education fads: open classrooms, whole-language reading, “new math.”
Medical misadventures: estrogen for heart-disease prevention, other procedures with weak evidence.
Factors That Modulate Bias Influence
Decision environment:
Forced reflection (even a 0.1 s delay) improves accuracy.
Multiple independent replications stretch timeline, reducing impulsivity.
Autonomy vs. checks:
High autonomy (researcher analytic choices, teacher implementation) → more bias.
Peer review & oversight can curb but may share same biases.
Investment entrenchment: more effort sunk ⇒ harder to reverse (escalation of commitment).
Evidence mix: equal amounts of pro/contra data enable cherry-picking; overwhelming contrary corpus hard to ignore.
Incentives: publication bias, funding goals, career rewards can foster confirmatory searching.
Mitigation & Design Principles
Build “choice architecture” that nudges toward unbiased scaling:
Pre-registration, registered reports to lock analytic plans.
Reward replications; publish nulls to balance evidence base.
Require cumulative PSP ≥ 0.95 before large-scale rollout.
Create decision structures promoting deliberation: devil’s-advocate roles, diverse teams, explicit solicitation of disconfirming views.
Use behavioral tools (reminders, commitment devices) to counter default inertia in practice settings.
Balance review rigor with agility to avoid entrenchment.
Core Numbers, Equations & Technical Points
Statistical significance bar: p ≤ .05.
Recommended prior for brand-new intervention: 0.10.
Scale-up threshold PSP: 0.95 (≈ two independent, well-powered, significant studies without serious negatives).
Loss-aversion ratio: subjective pain of loss ≈ 2 × pleasure of equal gain.
Ethical & Practical Implications
Scaling false positives wastes public funds, damages trust, and crowds out effective programs.
Bias-aware processes are ethical imperatives for evidence-based policy.
Need for cross-disciplinary collaboration: economists’ PSP models + psychologists’ bias insights + implementation scientists’ fidelity tools.
Conclusion & Future Research Agenda
Cognitive biases likely play large but under-studied role in why “programs fail to scale.”
Research priorities:
Map bias incidence across research, funding, peer review, implementation stages.
Test interventions (decision delays, structured adversarial reviews, default flips) in both lab and field.
Until then, practitioners should act as “behavioral custodians,” embedding safeguards—replication requirements, transparent reporting, diversity of viewpoints—to curb confirmation, status-quo, and bandwagon pressures.