The Replication Crisis: When Science Can't Reproduce Itself

In 2011, social psychologist Daryl Bem published a paper in a prestigious journal claiming to have found evidence for precognition — the ability to sense future events. The paper passed peer review. It used standard statistical methods. And it was, by broad scientific consensus, almost certainly wrong. But the methods Bem used were the same methods used throughout psychology and other social sciences. If those methods could produce evidence for psychic powers, what else had they falsely supported? The answer, it turned out, was a disturbing proportion of published research. In 2015, the Open Science Collaboration attempted to replicate 100 studies published in top psychology journals. Only 36 produced statistically significant results the second time. The effect sizes — even for studies...

Discourse Analysis

Popular framing: A few sloppy or dishonest researchers polluted the literature.

Structural analysis: When journals reward novelty and tenure committees count publications, the metric (significant p-values) replaces the target (true findings), and Goodhart's law does the rest. Publication bias, p-hacking, and HARKing (hypothesizing after the results are known) aren't deviations from the system; they are what the system selects for, and peer review, with its weak principal-agent incentives, rarely catches them.

Focusing on individual misconduct or statistical techniques obscures the principal-agent dynamics that make these behaviors adaptive. Reforms that target researcher behavior without restructuring journal incentives, funding criteria, and tenure metrics will be captured and neutralized. This is the classic Goodhart's law pattern: new measures (pre-registration, p < 0.005) become new targets to game rather than genuine proxies for reliable science.
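To make the p-hacking mechanism concrete, here is a minimal simulation sketch. It is an illustration under stated assumptions, not a model of any particular study: every simulated study has no true effect at all, the outcome measures are independent with known unit variance (real outcomes are usually correlated, which would soften the inflation), and the function names and default sample sizes are hypothetical choices made for this example. A study counts as "significant" if any one of its outcome measures reaches p < 0.05.

```python
import math
import random


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def study_is_significant(rng, n_per_group, n_outcomes, alpha=0.05):
    """Simulate one study with NO true effect.

    Returns True if at least one outcome measure reaches p < alpha,
    which is what a researcher free to pick the "best" outcome would report.
    """
    # Standard error of a difference in means when sigma = 1 (assumed known).
    se = math.sqrt(2 / n_per_group)
    for _ in range(n_outcomes):
        treated = [rng.gauss(0, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        diff = sum(treated) / n_per_group - sum(control) / n_per_group
        if two_sided_p(diff / se) < alpha:
            return True
    return False


def false_positive_rate(n_outcomes, n_studies=2000, n_per_group=20, seed=0):
    """Share of null studies that yield at least one significant outcome."""
    rng = random.Random(seed)
    hits = sum(
        study_is_significant(rng, n_per_group, n_outcomes)
        for _ in range(n_studies)
    )
    return hits / n_studies


for k in (1, 5):
    print(f"{k} outcome(s): false positive rate ~ {false_positive_rate(k):.3f}")
```

With a single pre-specified outcome, the rate should sit near the nominal 5%. With five independent outcomes to choose from, the expected rate is 1 - 0.95^5, or roughly 23%, even though no individual test was run incorrectly.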

Competing Interpretations

Individual Misconduct / Fraud: High-profile fraudsters like Diederik Stapel (fabricated data) and Brian Wansink (data manipulation) are held up as representative causes. The narr...

Flawed Statistical Practices: P-hacking, underpowered studies, HARKing (Hypothesizing After Results are Known), and misuse of NHST created a pipeline generating publishable fals...

Publication Bias as Market Failure: Journals systematically reject null results and replications, creating a literature that is a biased sample of all experiments conducted. The 'file...

Incentive Misalignment Across the System: Universities optimize for grant revenue and prestige; researchers optimize for publications; journals optimize for citations; funders optimize for ...

Soft Sciences Are Uniquely Fragile: Psychology and social sciences deal with context-dependent, variable human behavior — findings naturally decay across populations and time. The cri...

Science as a Prestige Market: Academic journals aren't 'gatekeepers of truth'; they are 'brokers of prestige.' They 'buy' attention with 'novelty.' Because 'replication' isn't '...
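The publication-bias interpretation above describes a literature that is a biased sample of all experiments conducted. The following sketch shows how that filter alone can inflate published effect sizes. It assumes a small true effect (0.2 standard deviations), 20 participants per group, known unit variance, and a journal that prints only results with p < 0.05; all of these numbers and the function name `simulate_literature` are hypothetical, chosen only for illustration.

```python
import math
import random
import statistics


def simulate_literature(true_effect=0.2, n_per_group=20, n_studies=5000,
                        alpha=0.05, seed=0):
    """Run many identical studies and 'publish' only the significant ones.

    Returns (all_effects, published_effects): the observed mean difference
    from every study, and the subset that passed the p < alpha filter.
    """
    rng = random.Random(seed)
    # Standard error of a difference in means when sigma = 1 (assumed known).
    se = math.sqrt(2 / n_per_group)
    all_effects, published = [], []
    for _ in range(n_studies):
        treated = [rng.gauss(true_effect, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        diff = statistics.fmean(treated) - statistics.fmean(control)
        # Two-sided p-value from a z-test.
        p = math.erfc(abs(diff / se) / math.sqrt(2))
        all_effects.append(diff)
        if p < alpha:
            published.append(diff)
    return all_effects, published


all_effects, published = simulate_literature()
print(f"studies run:            {len(all_effects)}")
print(f"studies published:      {len(published)}")
print(f"mean effect, all:       {statistics.fmean(all_effects):.2f}")
print(f"mean effect, published: {statistics.fmean(published):.2f}")
```

Under these assumptions, the average across all studies recovers the true effect, while the average across published studies is several times larger, because at this sample size only unusually large observed differences clear the significance threshold. A faithful replication would then tend to find a smaller effect than the published one, without anyone having fabricated anything.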
