In 2011, social psychologist Brian Nosek began what would become one of science's most uncomfortable experiments, conducted not on human subjects but on science itself. He recruited 270 researchers to repeat 100 published psychology experiments, following the original methods as closely as possible. The results, published in 2015, sent shockwaves through academia: only about 36 percent of the replications produced statistically significant results, compared with 97 percent of the original studies.
The roots of the problem ran deep. Academic careers lived and died by publication counts. Tenure committees counted papers. Funding agencies rewarded novelty. Journals overwhelmingly published positive, surprising findings (studies showing an effect) while relegating null results to desk drawers. A researcher who ran twenty experiments and ...
Popular framing: Some scientists committed fraud or were sloppy; weed them out and science is fine.
Structural analysis: Academic careers are measured by publication count and novelty, journals select for positive, surprising findings, and null results have no career home. When the measure becomes the target, p-hacking and selective reporting become rational moves. Principal-agent gaming between researchers and institutions, combined with base-rate neglect in interpreting any single significant result, produces a literature whose published effects systematically overstate true effects. The crisis is the predicted output of the incentive structure, and no methodological fix (preregistration, larger samples, open data) resolves it without changing what gets rewarded in academic careers.
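The base-rate point can be made concrete with a short calculation. The sketch below is illustrative only: the function name and the input values for prior, power, and alpha are assumptions chosen for the example, not figures from the replication project. It computes the share of statistically significant results that reflect a real effect, given how often tested hypotheses are true to begin with.

```python
def positive_predictive_value(prior: float, power: float, alpha: float) -> float:
    """Share of statistically significant results that reflect a true effect.

    prior: fraction of tested hypotheses that are actually true (assumed)
    power: probability a study detects a true effect (assumed)
    alpha: probability a study "detects" an effect that is not there
    """
    true_positives = prior * power
    false_positives = (1.0 - prior) * alpha
    return true_positives / (true_positives + false_positives)


# Hypothetical scenarios: the same test, applied to fields where
# true hypotheses are common (50%) or rare (10%).
for assumed_prior in (0.5, 0.1):
    ppv = positive_predictive_value(prior=assumed_prior, power=0.8, alpha=0.05)
    print(f"prior={assumed_prior:.1f}  share of significant results that are real={ppv:.2f}")
```

Under these assumed inputs, a field where only one tested hypothesis in ten is true would see roughly 64 percent of its significant results reflect real effects, even with well-powered studies and no misconduct at all. Reading a single p < 0.05 as near-certain evidence ignores that prior.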
The popular framing locates the problem in individual behavior (fraud, sloppiness) or technical tools (bad statistics), which implies tractable fixes. The structural framing reveals that reforms designed within the existing incentive architecture, such as preregistration and open data, are voluntarily adopted by the conscientious researchers who were least responsible for the crisis, while leaving career pressures unchanged for everyone else. Until replication studies generate tenure credit and null results earn publication slots, the underlying selection pressure for false positives persists.
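A toy simulation can illustrate how the selection pressure works and why publishing null results would relieve it. Everything here is a simplifying assumption made for the sketch: the function name, the true effect size, the sample size, and the rule that significant results are always published while non-significant ones are published with some fixed probability. It is not a model of any particular journal or field.

```python
import random
import statistics


def mean_published_effect(true_effect: float,
                          n_per_study: int,
                          null_publication_rate: float,
                          n_studies: int = 20000,
                          seed: int = 0) -> float:
    """Average effect estimate in a simulated literature under selective publication.

    Assumptions (hypothetical): each study's estimate is normally distributed
    around true_effect with standard error 1 / sqrt(n_per_study); significant
    results (two-sided, |z| > 1.96) are always published; non-significant
    results are published with probability null_publication_rate.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    standard_error = 1.0 / n_per_study ** 0.5
    published = []
    for _ in range(n_studies):
        estimate = rng.gauss(true_effect, standard_error)
        significant = abs(estimate / standard_error) > 1.96
        if significant or rng.random() < null_publication_rate:
            published.append(estimate)
    return statistics.fmean(published)


# Compare a literature that buries null results with ones that publish them.
for rate in (0.0, 0.5, 1.0):
    average = mean_published_effect(true_effect=0.2, n_per_study=25,
                                    null_publication_rate=rate)
    print(f"null results published {rate:.0%} of the time: "
          f"mean published effect = {average:.2f} (true effect = 0.20)")
```

In this sketch, no simulated researcher cheats. The inflation comes entirely from the publication filter: when null results never appear, the published average sits well above the true effect, and it moves back toward the true value as the share of published null results rises.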