In 2017, a team of researchers, most of them at Google, published a paper titled 'Attention Is All You Need,' introducing a neural network architecture called the Transformer. The building blocks were deceptively simple: matrix multiplications, softmax functions, and an attention mechanism that let the model weigh which parts of an input to focus on. None of these components, taken individually, suggested anything revolutionary. Yet when researchers at OpenAI began stacking these layers and feeding them increasingly massive datasets, something unexpected happened. GPT-2, released in 2019 with 1.5 billion parameters, could generate surprisingly coherent paragraphs. But GPT-3, arriving in 2020 with 175 billion parameters (more than a hundredfold increase), didn't just write better paragraphs. It could translate languages it wasn't ...
Popular framing: Researchers built smarter AI and it started doing impressive things.
Structural analysis: Transformer layers plus scale produced capabilities that none of the individual components predicted. This is emergence layered on a power-law scaling curve: aggregate performance improves smoothly as parameter counts grow, yet qualitatively new behavior appears at thresholds that are hard to predict in advance. Feedback loops between capability, capital, and compute concentrated investment, accelerating the next jump. Second-order effects (labor, epistemics, governance) propagate at training-cycle speed, while institutional response operates on decade timescales. The geometry of the curve, not researcher intent, drives the trajectory.
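One way to see how a smooth power-law curve and abrupt-looking thresholds can coexist is a toy calculation. The sketch below is an illustration under stated assumptions, not a model of any real system: the constants `N_C`, `ALPHA`, and `TASK_LENGTH` are hypothetical, the loss is assumed to follow a pure power law in parameter count, and a "task" is assumed to succeed only if every one of its tokens is correct.

```python
import math

# All constants are hypothetical, chosen for illustration only.
N_C = 1e9          # assumed scale constant of the power law
ALPHA = 0.5        # assumed power-law exponent
TASK_LENGTH = 50   # tokens that must all be right for the toy task to succeed


def loss(n_params: float) -> float:
    """Assumed smooth scaling curve: loss = (N_C / N) ** ALPHA."""
    return (N_C / n_params) ** ALPHA


def token_accuracy(n_params: float) -> float:
    """Probability of emitting the correct token, modeled as exp(-loss)."""
    return math.exp(-loss(n_params))


def task_success(n_params: float, k: int = TASK_LENGTH) -> float:
    """Probability that k independent tokens are all correct."""
    return token_accuracy(n_params) ** k


if __name__ == "__main__":
    # Sweep parameter counts from 1e9 to 1e13, one decade at a time.
    for exponent in range(9, 14):
        n = 10.0 ** exponent
        print(f"N=1e{exponent}  loss={loss(n):.4f}  "
              f"task_success={task_success(n):.4f}")
```

In this toy setup the loss shrinks by the same factor for every tenfold increase in parameters, while the task success rate stays near zero across the smaller scales and then climbs quickly. It shows only that a smooth underlying curve is compatible with capabilities that seem to switch on; where such thresholds fall for real models is an empirical question the sketch does not answer.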
The popular framing, whether optimistic or pessimistic, treats capability as the central variable and everything else as response. The structural view inverts this: the feedback loops between capability, deployment, and capital concentration are the primary system, and 'emergent intelligence' is a narrative that both describes and accelerates those loops. The gap between the two framings matters because policy interventions targeted at capabilities alone (compute limits, model restrictions) leave the underlying attractor dynamics intact.