The Replication Crisis: When Science Can't Reproduce Itself

In 2011, social psychologist Daryl Bem published a paper in a prestigious journal claiming to have found evidence for precognition — the ability to sense future events. The paper passed peer review. It used standard statistical methods. And it was, by broad scientific consensus, almost certainly wrong. But the methods Bem used were the same methods used throughout psychology and other social sciences. If those methods could produce evidence for psychic powers, what else had they falsely supported? The answer, it turned out, was a disturbing proportion of published research. In 2015, the Open Science Collaboration attempted to replicate 100 studies published in top psychology journals. Only 36 produced statistically significant results the second time. The effect sizes — even for studies...

Mental Models

Discourse Analysis

Popular framing: A few sloppy or dishonest researchers polluted the literature.

Structural analysis: When journals reward novelty and tenure committees count publications, the metric (significant p-values) replaces the target (true findings), and Goodhart's law does the rest. Publication bias, p-hacking, and HARKing (hypothesizing after the results are known) aren't deviations from the system; they are what the system selects for, and peer review, itself shaped by weak principal-agent incentives, is too thin a check to catch them.

Focusing on individual misconduct or statistical techniques obscures the principal-agent dynamics that make these behaviors adaptive. Reforms that target researcher behavior without restructuring journal incentives, funding criteria, and tenure metrics will be captured and neutralized. This is the classic Goodhart's law pattern: new measures (pre-registration, a stricter p < 0.005 threshold) become new targets to game rather than genuine proxies for reliable science.
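The selection mechanism described above can be made concrete with a small simulation. The sketch below is illustrative only and is not taken from any of the studies mentioned: it assumes a world where the true effect is always zero, a simple two-sample z-test with a known standard deviation of 1, and a hypothetical researcher who measures several outcomes and reports whichever one reaches p < 0.05. The function names, sample sizes, and number of outcomes are all assumptions chosen for the example.

```python
import math
import random
import statistics


def two_sided_p(group_a, group_b):
    """Two-sample z-test p-value, assuming a known standard deviation of 1."""
    n = len(group_a)
    diff = statistics.fmean(group_a) - statistics.fmean(group_b)
    z = diff / math.sqrt(2 / n)
    return math.erfc(abs(z) / math.sqrt(2))


def smallest_p(rng, n_per_group, n_outcomes):
    """Run one null study and return the smallest p-value across its outcomes."""
    p_values = []
    for _ in range(n_outcomes):
        # Both groups come from the same distribution: there is no real effect.
        treated = [rng.gauss(0, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        p_values.append(two_sided_p(treated, control))
    return min(p_values)


def false_positive_rate(n_studies, n_outcomes, alpha=0.05, seed=0):
    """Share of null studies that can report at least one 'significant' result."""
    rng = random.Random(seed)
    hits = sum(smallest_p(rng, 20, n_outcomes) < alpha for _ in range(n_studies))
    return hits / n_studies


# One pre-specified outcome versus picking the best of five (hypothetical numbers).
honest = false_positive_rate(2000, n_outcomes=1)
hacked = false_positive_rate(2000, n_outcomes=5)

print(f"one outcome:   {honest:.3f}")
print(f"best of five:  {hacked:.3f}")
```

With a single outcome the rate should sit near the nominal 5%. With five independent outcomes, theory gives 1 - 0.95^5, or roughly 23%. If journals then publish only the significant results, every published finding in this simulated world is a false positive, even though no individual step looks like misconduct.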

