NYT v. OpenAI — The Training-Data Tax

On December 27, 2023, the New York Times sued OpenAI and Microsoft in the Southern District of New York. The suit alleges that millions of Times articles were used without a license to train large language models, and it cites sample outputs that reproduce the paper's reported text nearly verbatim. The popular framing is a newspaper accusing an AI company of stealing articles. The structural framing is that the modern generative-AI business model internalizes the value of public-internet data while externalizing the cost of producing that data. On that view, the lawsuit asks the court whether the arrangement is protected fair use or a tragedy of the commons playing out at scale. The deeper tension is that data creators (newsrooms, academic publishers, code repositories, individual artists) face declining traffic and licensing revenue, while data consumers (fro...

Mental Models

Discourse Analysis

Popular framing: A newspaper sued an AI company for stealing articles.

Structural analysis: The generative-AI business model internalizes the value of public data while externalizing the cost of producing it, which threatens the economic viability of the very data creators it depends on. The case is a stress test for whether copyright doctrine, built around discrete and static copies, fits model training, where a work's value flows into a model's parameters and outputs.

Framing the dispute around a single plaintiff shields the structure from scrutiny. The structural lens identifies a tragedy of the commons over web data, free-rider economics across creators, and a mismatch between existing regulatory categories and model training. It points to interventions at three seams: collective-licensing regimes, attribution of model outputs, and updates to fair-use doctrine. The case's outcome could determine whether the next generation of frontier models still has a data commons to train on.
