Hidden constraints: When scaling laws of AI break

OpenAI chief executive Sam Altman – perhaps the most prominent face of the artificial intelligence (AI) boom that accelerated with the launch of ChatGPT in 2022 – loves scaling laws.
These widely admired rules of thumb, which link the size of an AI model to its capabilities, inform much of the AI industry's headlong rush to buy up powerful computer chips, build vast data centres, and reopen shuttered nuclear plants.
As Altman argued in a blog post earlier this year, the thinking is that the “intelligence” of an AI model “roughly equals the log of the resources used to train and run it” – meaning you can steadily produce better performance by exponentially increasing the scale of data and computing power.
First observed in 2020 and refined in 2022, the scaling laws for large language models (LLMs) come from drawing lines on charts of experimental data. For engineers, they give a simple formula that tells you how big to build the next model and what performance increase to expect.
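"Drawing lines on charts" here means fitting a power law: on a log-log plot of loss against compute, the data fall roughly on a straight line, and extrapolating that line tells you what a bigger training run should deliver. A minimal sketch of the procedure, using made-up numbers rather than any published measurements:

```python
import numpy as np

# Illustrative, synthetic scaling data: training compute (FLOPs) vs model loss.
# Published scaling-law papers fit curves of roughly this power-law shape;
# here: loss = a * compute**(-b), with every 10x of compute cutting loss ~20%.
compute = np.array([1e18, 1e19, 1e20, 1e21, 1e22])
loss = np.array([4.0, 3.2, 2.56, 2.05, 1.64])

# On log-log axes a power law is a straight line, so fit one with polyfit.
slope, intercept = np.polyfit(np.log10(compute), np.log10(loss), 1)
a, b = 10 ** intercept, -slope
print(f"fitted exponent b = {b:.3f}")

# The "simple formula for the next model": extrapolate the line to a run
# 100x bigger than anything in the data.
predicted_loss = a * (1e24) ** (-b)
print(f"predicted loss at 1e24 FLOPs = {predicted_loss:.2f}")
```

The small exponent is the log-like relationship Altman describes from the other direction: each fixed improvement in loss costs a multiplicative, not additive, increase in compute. The extrapolation step is also exactly where such fitted curves can fail, since nothing in the fit knows about conditions outside the measured range.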
Will the scaling laws keep on scaling as AI models get bigger? AI companies are betting hundreds of billions of dollars that they will – but history suggests it is not always so simple.
Laws aren’t just for AI
Scaling laws can be wonderful. Modern aerodynamics is built on them.
Using an elegant piece of mathematics called the Buckingham π theorem, engineers discovered how to compare small models in wind tunnels or test basins with full-scale planes and ships by making sure key numbers matched up. Those ideas inform the design of almost everything that flies or floats, as well as industrial fans and pumps.
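The "key numbers" that must match are dimensionless quantities such as the Reynolds number. A sketch of the idea, with illustrative figures rather than real test data:

```python
# Buckingham-pi-style scale matching: keep the Reynolds number
# Re = speed * length / kinematic_viscosity
# equal between a 1:10 wind-tunnel model and the full-scale aircraft,
# so the airflow around both behaves the same way.
# All numbers below are made up for illustration.

nu_air = 1.5e-5      # kinematic viscosity of air, m^2/s
full_length = 3.0    # full-scale wing chord, m
full_speed = 60.0    # full-scale airspeed, m/s

re_full = full_speed * full_length / nu_air

# A 1:10 model in the same air must run 10x faster to match Re.
model_length = full_length / 10
model_speed = re_full * nu_air / model_length

print(f"Reynolds number: {re_full:.2e}")
print(f"required tunnel speed for a 1:10 model: {model_speed:.0f} m/s")
```

The required tunnel speed comes out impractically high, which is why real facilities often match the Reynolds number another way, for example with pressurised tunnels that change the viscosity term instead.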
Another famous scaling idea underpinned decades of the silicon chip boom. Moore’s law – the idea that the number of transistors on a microchip would double every two years or so – helped designers create the small, powerful computing technology we have today.
But there’s a catch: not all “scaling laws” are laws of nature. Some are purely mathematical and can hold indefinitely. Others are just lines fitted to data that work beautifully until you stray too far from the circumstances where they were measured.
When laws break down
History is littered with reminders of scaling laws that broke. A classic example is the collapse of the Tacoma Narrows Bridge in 1940.
The bridge was designed by scaling up what had worked for smaller bridges to something longer and slimmer. Engineers assumed the same arguments would hold: if a certain ratio of stiffness to bridge length worked before, it should work again.
Instead, moderate winds set off an unexpected instability called aeroelastic flutter. The bridge deck tore itself apart four months after opening.
Likewise, even the “laws” of microchip manufacturing had an expiry date. For decades, Moore’s law and Dennard scaling were astonishingly reliable guides. But as transistors shrank to nanometre sizes, those rules collided with hard physical limits: current leaks through ever-thinner components, and the resulting heat can no longer simply be scaled away.
Rules of thumb?
The language-model scaling curves that Altman celebrates are real and extraordinarily useful. They told researchers that models would keep getting better if given enough data and computing power, and that earlier systems were not fundamentally limited – they just hadn’t had enough resources.
But these are curves fit to data. They resemble the useful rules of thumb in microchip design more than the derived mathematical laws used in aerodynamics – and that means they likely won’t work forever.
Language-model scaling rules don’t encode real-world constraints such as the availability of high-quality training data, the difficulty of tackling novel tasks, safety limits, or the economic realities of building data centres and power grids. There is no law of nature guaranteeing that “intelligence scales” forever.
So far, the scaling curves for AI look smooth – but the financial curves are different.
Deutsche Bank recently warned of an AI “funding gap”, citing Bain & Company estimates of a US$800 billion mismatch between projected AI revenues and the investment in chips, data centres and power required to keep current growth going.
JP Morgan has estimated that the broader AI sector might need around US$650 billion in annual revenue just to earn a modest 10% return on the planned build-out of AI infrastructure.
Altman’s bet is that the LLM scaling laws will continue. If that’s so, it may be worth building enormous amounts of computing power because the gains are predictable. On the other hand, the banks’ growing unease is a reminder that some scaling stories can turn out to be Tacoma Narrows: beautiful curves in one context, hiding a nasty surprise in the next.
The Conversation

