In 2011, social psychologist Brian Nosek began what would become one of science's most uncomfortable experiments — not on human subjects, but on science itself. He recruited 270 researchers to repeat 100 published psychology experiments, following the original methods as closely as possible. The results, published in 2015, sent shockwaves through academia: only 36% of the replications produced statistically significant results the second time around.

The roots of the problem ran deep. Academic careers lived and died by publication counts. Tenure committees counted papers. Funding agencies rewarded novelty. Journals overwhelmingly published positive, surprising findings — studies showing an effect — while relegating null results to desk drawers. A researcher who ran twenty experiments and ...
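The arithmetic behind that desk-drawer effect is easy to check. If a researcher runs twenty experiments of a true-null effect and publishes only what clears p < 0.05, the chance of at least one publishable "finding" is 1 - 0.95^20, about 64%. A minimal stdlib-only sketch (the sample size, number of experiments, and use of a z-test are illustrative assumptions, not the design of any actual study):

```python
import math
import random

random.seed(0)

def null_experiment(n=30):
    """Compare two groups drawn from the SAME distribution: any 'effect' is noise."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    diff = sum(a) / n - sum(b) / n
    z = diff / math.sqrt(2 / n)              # known sigma = 1, so a z-test is exact
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value

trials = 2000
hits = sum(
    1 for _ in range(trials)
    if min(null_experiment() for _ in range(20)) < 0.05
)

print(f"P(at least one 'significant' result), analytic:  {1 - 0.95**20:.2f}")  # ~ 0.64
print(f"P(at least one 'significant' result), simulated: {hits / trials:.2f}")
```

Nothing in the simulation involves dishonesty about any single experiment; the distortion comes entirely from which experiments reach the journal.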
Popular framing: Scientists were cutting corners or outright cheating, and stricter oversight plus better statistical practices will restore trust in published research.
Structural analysis: The replication crisis is the predictable output of a principal-agent system in which the agents (researchers) are evaluated on a proxy metric (publications, citations) that diverges from the principal's actual goal (true knowledge). Goodhart's Law guarantees that optimizing for the proxy corrupts it: researchers rationally engage in p-hacking, HARKing (hypothesizing after the results are known), and selective reporting not out of malice but because the system selects for these behaviors. No methodological fix resolves this without changing what gets rewarded in academic careers. The incentive-misalignment frame is accurate but incomplete; it misses normalization: the deviance was so pervasive that the community itself lost the ability even to *see* it.
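One concrete form of p-hacking is optional stopping: collect data in batches, test after each batch, and stop as soon as the result is significant. A stdlib-only sketch (batch size, sample cap, and the z-test are illustrative assumptions) shows how this inflates the nominal 5% false-positive rate with no fraud anywhere in the pipeline:

```python
import math
import random

random.seed(1)

def p_value(a, b):
    """Two-sided z-test p-value for equal-size groups with known sigma = 1."""
    n = len(a)
    z = (sum(a) / n - sum(b) / n) / math.sqrt(2 / n)
    return math.erfc(abs(z) / math.sqrt(2))

def honest(n=100):
    """Fix the sample size in advance, test exactly once."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    return p_value(a, b) < 0.05

def peeking(batch=10, max_n=100):
    """Test after every batch; stop as soon as p < 0.05 (optional stopping)."""
    a, b = [], []
    while len(a) < max_n:
        a += [random.gauss(0, 1) for _ in range(batch)]
        b += [random.gauss(0, 1) for _ in range(batch)]
        if p_value(a, b) < 0.05:
            return True          # 'significant': stop collecting and write it up
    return False

trials = 3000
honest_rate = sum(honest() for _ in range(trials)) / trials
peek_rate = sum(peeking() for _ in range(trials)) / trials
print(f"false-positive rate, fixed n:          {honest_rate:.3f}")  # ~ 0.05
print(f"false-positive rate, peek every batch: {peek_rate:.3f}")    # well above 0.05
```

With roughly ten interim looks, the realized type-I error rate runs several times the nominal 5%, which is why pre-registration fixes the stopping rule before data collection begins: each individual test is perfectly valid, and the researcher can sincerely believe they did nothing wrong.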
The popular framing locates the problem in individual behavior (fraud, sloppiness) or technical tools (bad statistics), which implies tractable fixes. The structural framing reveals that reforms designed within the existing incentive architecture — pre-registration, open data — are voluntarily adopted by the conscientious researchers who were least responsible for the crisis, while leaving career pressures unchanged for everyone else. Until replication studies generate tenure credit and null results earn publication slots, the underlying selection pressure for false positives persists.