In 2011, social psychologist Brian Nosek began what would become one of science's most uncomfortable experiments — not on human subjects, but on science itself. He recruited 270 researchers to repeat 100 published psychology experiments, following the original methods as closely as possible. The results, published in 2015, sent shockwaves through academia: only 36% of the replications produced statistically significant results the second time around.

The roots of the problem ran deep. Academic careers lived and died by publication counts. Tenure committees counted papers. Funding agencies rewarded novelty. Journals overwhelmingly published positive, surprising findings — studies showing an effect — while relegating null results to desk drawers. A researcher who ran twenty experiments and ...
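The arithmetic behind that desk-drawer effect is easy to check. If a researcher runs twenty experiments of a true-null effect and publishes only what clears p < 0.05, the chance of at least one publishable "finding" is 1 - 0.95^20, about 64%. A minimal stdlib-only sketch (the sample size, number of experiments, and use of a z-test are illustrative assumptions, not the design of any actual study):

```python
import math
import random

random.seed(0)

def null_experiment(n=30):
    """Compare two groups drawn from the SAME distribution: any 'effect' is noise."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    diff = sum(a) / n - sum(b) / n
    z = diff / math.sqrt(2 / n)              # known sigma = 1, so a z-test is exact
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value

trials = 2000
hits = sum(
    1 for _ in range(trials)
    if min(null_experiment() for _ in range(20)) < 0.05
)

print(f"P(at least one 'significant' result), analytic:  {1 - 0.95**20:.2f}")  # ~ 0.64
print(f"P(at least one 'significant' result), simulated: {hits / trials:.2f}")
```

Nothing in the simulation involves dishonesty about any single experiment; the distortion comes entirely from which experiments reach the journal.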
Popular framing: Scientists were cutting corners or outright cheating, and stricter oversight plus better statistical practices will restore trust in published research.
Structural analysis: The replication crisis is the predictable output of a principal-agent system in which the agents (researchers) are evaluated on a proxy metric (publications, citations) that diverges from the principal's actual goal (true knowledge). Goodhart's Law guarantees that optimizing for the proxy corrupts it: researchers rationally engage in p-hacking, HARKing (hypothesizing after the results are known), and selective reporting not out of malice but because the system selects for these behaviors. No methodological fix resolves this without changing what gets rewarded in academic careers. The incentive-misalignment frame is accurate but incomplete; it misses normalization: the deviance was so pervasive that the community itself lost the ability even to *see* it.
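One concrete form of p-hacking is optional stopping: collect data in batches, test after each batch, and stop as soon as the result is significant. A stdlib-only sketch (batch size, sample cap, and the z-test are illustrative assumptions) shows how this inflates the nominal 5% false-positive rate with no fraud anywhere in the pipeline:

```python
import math
import random

random.seed(1)

def p_value(a, b):
    """Two-sided z-test p-value for equal-size groups with known sigma = 1."""
    n = len(a)
    z = (sum(a) / n - sum(b) / n) / math.sqrt(2 / n)
    return math.erfc(abs(z) / math.sqrt(2))

def honest(n=100):
    """Fix the sample size in advance, test exactly once."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    return p_value(a, b) < 0.05

def peeking(batch=10, max_n=100):
    """Test after every batch; stop as soon as p < 0.05 (optional stopping)."""
    a, b = [], []
    while len(a) < max_n:
        a += [random.gauss(0, 1) for _ in range(batch)]
        b += [random.gauss(0, 1) for _ in range(batch)]
        if p_value(a, b) < 0.05:
            return True          # 'significant': stop collecting and write it up
    return False

trials = 3000
honest_rate = sum(honest() for _ in range(trials)) / trials
peek_rate = sum(peeking() for _ in range(trials)) / trials
print(f"false-positive rate, fixed n:          {honest_rate:.3f}")  # ~ 0.05
print(f"false-positive rate, peek every batch: {peek_rate:.3f}")    # well above 0.05
```

With roughly ten interim looks, the realized type-I error rate runs several times the nominal 5%, which is why pre-registration fixes the stopping rule before data collection begins: each individual test is perfectly valid, and the researcher can sincerely believe they did nothing wrong.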
The popular framing locates the problem in individual behavior (fraud, sloppiness) or technical tools (bad statistics), which implies tractable fixes. The structural framing reveals that reforms designed within the existing incentive architecture — pre-registration, open data — are voluntarily adopted by the conscientious researchers who were least responsible for the crisis, while leaving career pressures unchanged for everyone else. Until replication studies generate tenure credit and null results earn publication slots, the underlying selection pressure for false positives persists.