Cognitive Biases & Program Scale-Up: Comprehensive Study Notes

Abstract & Big-Picture Take-Aways

  • Many social programs that look promising in pilots lose impact when taken to scale (the “voltage drop”).

  • Beyond implementation and sampling problems, cognitive biases in researchers, practitioners, funders, and policy makers systematically distort:

    • What evidence is sought, accepted, or ignored.

    • How pilot findings are interpreted.

    • Whether scale-up is pursued or blocked.

  • Chapter spotlights three pervasive biases—confirmation, status-quo (default), and bandwagon—and shows how each can steer scaling decisions toward false positives or away from true positives.

  • Calls for a psychologically informed science of scale-up: laboratory and field experiments to map when/where biases bite and how to redesign “choice architecture” to counter them.

Cognitive Biases: Definitions & Distinctions

  • Cognitive bias = systematic, predictable departure from logic/probability in judgment.

    • Hard-wired, automatic shortcuts that economize on time & cognitive load but can misfire (Kahneman’s “System 1”).

  • NOT:

    • Simple factual errors (can be fixed with information).

    • Rational responses to incentives (e.g., changing behavior because payment changes the payoffs).

  • ≈ 185 biases catalogued; many overlap, and the taxonomy is debated.

  • Evolutionary & functional roots: avoid decision fatigue, preserve identity, ease social coordination.

Degrees of Freedom in Social-Program Research

  • Researchers must choose: sample size, covariates, outcome construction, exclusion rules, model specs ⇒ “garden of forking paths” (Gelman & Loken).

  • These forks + publish-or-perish incentives drive selective significance hunting:

    • p \le .05 fetish ⇒ elevated false-positive risk.

    • Industry or mission-driven funding further tilts analyses toward desired findings.

  • Consequence for scale-up: statistical significance alone is no longer reliable evidence (see the sketch below).
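
An illustrative back-of-the-envelope sketch of why forking paths inflate false positives: assuming a researcher can try k roughly independent specifications, each tested at α = .05, the chance of at least one significant result under a true null grows quickly. Real specifications are correlated, so read this as an upper-bound intuition, not a figure from the chapter:

```python
# Illustrative only: k independent looks at null data, each at alpha = .05.
alpha = 0.05
for k in (1, 5, 10, 20):
    p_any = 1 - (1 - alpha) ** k   # P(at least one "significant" result)
    print(f"{k:2d} specifications -> P(false positive) = {p_any:.2f}")
# 1 -> 0.05, 5 -> 0.23, 10 -> 0.40, 20 -> 0.64
```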

Economic Model of Scaling (Al-Ubaydli et al.)

  • Decision makers should compute Post-Study Probability (PSP):
    \text{PSP} = \frac{\text{Prior} \times \text{Power}}{\text{Prior} \times \text{Power} + (1-\text{Prior}) \times \alpha}

  • Recommended rules of thumb:

    • Start with a low prior (0.10) when evidence is minimal.

    • Scale only when PSP ≥ 0.95 (≈ two well-powered, independent, significant replications).

    • Null findings should down-weight PSP.

  • Biases make real-world actors deviate from this Bayesian ideal (a worked sketch follows).
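
A minimal numeric sketch of the PSP rule applied sequentially, where each study's PSP becomes the next study's prior; the power = 0.80 and α = .05 values are illustrative assumptions, not chapter figures:

```python
def psp(prior: float, power: float = 0.80, alpha: float = 0.05) -> float:
    """PSP = Prior*Power / (Prior*Power + (1 - Prior)*alpha)."""
    return prior * power / (prior * power + (1 - prior) * alpha)

p = 0.10  # low prior recommended for a brand-new intervention
for study in (1, 2):
    p = psp(p)  # each well-powered, significant replication updates the prior
    print(f"after study {study}: PSP = {p:.3f}")
# after study 1: PSP = 0.640
# after study 2: PSP = 0.966  -> crosses the 0.95 scale-up threshold
```

This is why the rule of thumb equates the 0.95 bar with roughly two well-powered, independent, significant replications: one study alone leaves PSP well short of it.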

Bias 1 – Confirmation Bias

Essence

  • Tendency to seek, interpret, and recall evidence in ways that confirm existing beliefs, while discounting disconfirming data.

  • Classic demo: Wason’s “2-4-6” rule discovery—subjects rarely try to falsify their hypothesis.

How It Warps Scale Decisions

  • Researchers select favorable outcomes, analytic specs, or subsets of studies ⇒ inflated PSP.

  • Peer review more lenient toward manuscripts aligning with reviewers’ theories (Mahoney 1977).

  • Funders (agencies, foundations) bankroll projects fitting their agendas → entire portfolio biased.

  • Self-report outcomes (teacher, parent ratings) coloured by practitioners’ priors about what “should” work.

  • Primacy/recency effects: early positive pilots anchor beliefs; later nulls get discounted.

Illustrative Scenarios

  • Two positive pilots vs. eight nulls: an unbiased Bayesian would drop PSP below 0.95, but a believer may overweight the two confirming studies and recommend scale-up (see the odds-update sketch after this list).

  • The Jack & Jill political news-feed example shows an everyday manifestation of the bias.
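
In odds form, the same PSP update handles mixed evidence: a significant study multiplies the prior odds by power/α, a null study by (1 − power)/(1 − α). A sketch under assumed power = 0.80 and α = .05, treating the ten studies as independent:

```python
power, alpha = 0.80, 0.05          # illustrative assumptions
odds = 0.10 / (1 - 0.10)           # prior of 0.10 in odds form
odds *= (power / alpha) ** 2       # two significant pilots: LR = 16 each
print(f"after 2 positives: PSP = {odds / (1 + odds):.3f}")    # ~0.966
odds *= ((1 - power) / (1 - alpha)) ** 8   # eight nulls: LR ~ 0.21 each
print(f"after 8 nulls too: PSP = {odds / (1 + odds):.5f}")    # ~0.00011
```

An unbiased reader of all ten studies ends near PSP ≈ 0.0001, far below the 0.95 threshold; only by discounting the nulls can the believer stay above it.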

Bias 2 – Status-Quo (Default) Bias

Essence

  • Preference for current state; aversion to change even when change is beneficial.

  • Possible mechanisms:

    • Heuristic “existing = good.”

    • Loss aversion (losses loom ≈ 2\times gains; see the sketch after this list).
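
A toy illustration of how a ≈ 2× loss weight can flip a switching decision even when the objective expected value favors change; all payoffs and probabilities here are hypothetical:

```python
LAMBDA = 2.0                      # losses loom ~2x as large as gains
p_better, gain = 0.6, 10.0        # hypothetical: new program likely better
p_worse, loss = 0.4, 10.0         # hypothetical downside of switching
objective_ev = p_better * gain - p_worse * loss              # +2.0
subjective_ev = p_better * gain - LAMBDA * p_worse * loss    # -2.0
print(objective_ev, subjective_ev)  # switching looks good, but feels bad
```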

Empirical Evidence

  • 401(k) enrollment jumps when default switches from opt-in to opt-out (Madrian & Shea).

  • Car-insurance study: majority stick with whichever plan is default in their state.

  • Lab shock-anxiety study: people forgo an anxiety-reducing option when doing nothing is the default.

Impact on Scale-Up

  • Good new program fails to spread because districts cling to incumbent curriculum.

  • Conversely, an ineffective pilot that is already embedded may get scaled simply because it is the status quo for that site.

  • Transition costs—real or perceived—magnify reluctance.

Bias 3 – Bandwagon Bias

Essence

  • “Hop on because others are on”: adoption driven by popularity, not intrinsic evidence.

  • Roots in social conformity (Asch line-length experiment: ≈ 75\% conformed at least once).

Pathways to Scaling Errors

  • Early false positive, amplified by charismatic champions or large grants, snowballs into mass adoption before replication.

  • Confirmation + bandwagon interact: once many peers approve, dissenting evidence faces higher discounting.

Real-World Illustrations

  • Education fads: open classrooms, whole-language reading, “new math.”

  • Medical misadventures: estrogen for heart-disease prevention, other procedures with weak evidence.

Factors That Modulate Bias Influence

  • Decision environment:

    • Forced reflection (even a 0.1\text{ s} delay) improves accuracy.

    • Multiple independent replications stretch the timeline, reducing impulsive adoption.

  • Autonomy vs. checks:

    • High autonomy (researcher analytic choices, teacher implementation) → more bias.

    • Peer review & oversight can act as a check but may share the same biases.

  • Investment entrenchment: more effort sunk ⇒ harder to reverse (escalation of commitment).

  • Evidence mix: equal amounts of pro/contra data enable cherry-picking; overwhelming contrary corpus hard to ignore.

  • Incentives: publication bias, funding goals, career rewards can foster confirmatory searching.

Mitigation & Design Principles

  • Build “choice architecture” that nudges toward unbiased scaling:

    • Pre-registration, registered reports to lock analytic plans.

    • Reward replications; publish nulls to balance evidence base.

    • Require cumulative PSP ≥ 0.95 before large-scale rollout.

  • Create decision structures promoting deliberation: devil’s-advocate roles, diverse teams, explicit solicitation of disconfirming views.

  • Use behavioral tools (reminders, commitment devices) to counter default inertia in practice settings.

  • Balance review rigor with agility to avoid entrenchment.

Core Numbers, Equations & Technical Points

  • Statistical significance bar: p \le .05.

  • Recommended prior for brand-new intervention: 0.10.

  • Scale-up threshold PSP: 0.95 (≈ two independent, well-powered, significant studies without serious negatives).

  • Loss-aversion ratio: subjective pain of loss ≈ 2 × pleasure of equal gain.

Ethical & Practical Implications

  • Scaling false positives wastes public funds, damages trust, and crowds out effective programs.

  • Bias-aware processes are ethical imperatives for evidence-based policy.

  • Need for cross-disciplinary collaboration: economists’ PSP models + psychologists’ bias insights + implementation scientists’ fidelity tools.

Conclusion & Future Research Agenda

  • Cognitive biases likely play large but under-studied role in why “programs fail to scale.”

  • Research priorities:

    • Map bias incidence across research, funding, peer review, implementation stages.

    • Test interventions (decision delays, structured adversarial reviews, default flips) in both lab and field.

  • Until then, practitioners should act as “behavioral custodians,” embedding safeguards—replication requirements, transparent reporting, diversity of viewpoints—to curb confirmation, status-quo, and bandwagon pressures.