
It's Not All Exponential Growth

Soaring costs meet a performance plateau

I’m Tyler Elliot Bettilyon (Teb) and this is a Brief: Our shortish mid-month edition.

If you’re new to the Lab Report you can subscribe here.

If you like what you’re reading you’ll love one of our classes. Sign up for an upcoming class, browse our course catalog for corporate trainings, or request a custom class consultation.

Line Goes Up?

By now, you’ve heard that AI is experiencing “exponential growth.” Depending on who you ask, this growth might cause the world's end or bring about Star Trek’s dream of fully automated luxury capitalism. We all hope it’s the latter, but know in our hearts that if it’s only one of the two… it’s probably the apocalypse.

In a recent interview on 20VC, OpenAI’s CEO Sam Altman made a less audacious claim when he advised startups that they should plan their business “assuming OpenAI will stay on the same rate of trajectory and the models are gonna keep getting better at the same pace”; otherwise, he added, “we’re gonna steamroll you.”

But neither of those trends matches what we’ve seen from ML historically. Here are six charts from Papers With Code showing improvement over time on popular ML benchmarks. We see incremental improvement with periodic bursts, typically with a slowing growth rate over time.


Machine learning systems often struggle with a “last mile” problem: It’s easy to go from terrible to good, but much harder to go from good to excellent. Notably, as performance reaches human levels, progress slows and further gains are harder won. Models can and do surpass humans, but it takes much more work per unit of improvement. Here’s a chart from Stanford’s Human-Centered AI (HAI) lab, from their recent AI Index report, showing this plateau:

There are some truly exponential trends, though: compute resources, energy use, and training costs. Here are two charts via IEEE Spectrum (based on the same AI Index) showing the tremendous growth in carbon footprint — which is a fuzzy proxy for energy use — and cost associated with training foundation models:

These costs are growing exponentially because the models are growing exponentially, as reported by the research firm Epoch AI:

Model size slowly increased by 7 orders of magnitude from the 1950s to around 2018. Since 2018, growth has accelerated for language models, with model size increasing by another 4 orders of magnitude in the four years from 2018 to 2022 (see Figure 1). Other domains like vision have grown at a more moderate pace, but still faster than before 2018.

Epoch AI (emphasis original)

That growth, in turn, has caused dataset sizes to grow exponentially. Research from DeepMind in 2022 found that growth in model size only produces commensurate growth in model performance when the dataset grows at the same rate:

By training over 400 language models ranging from 70 million to over 16 billion parameters on 5 to 500 billion tokens, we find that for compute-optimal training, the model size and the number of training tokens should be scaled equally: for every doubling of model size the number of training tokens should also be doubled.

DeepMind, many authors (emphasis mine)
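To make that concrete, here’s a minimal sketch of what “scaled equally” implies for a fixed compute budget. It leans on two commonly cited approximations that are not quoted from the paper itself: training compute C ≈ 6 × N × D (for N parameters and D tokens), and a roughly 20-tokens-per-parameter ratio at the compute-optimal point. The function name and the budgets are illustrative.

```python
import math

def compute_optimal_split(flops_budget, tokens_per_param=20):
    """Split a training-compute budget into (parameters, tokens).

    Assumes C ~= 6 * N * D and D = tokens_per_param * N, both rough
    rules of thumb rather than numbers quoted from the paper.
    """
    # C = 6 * tokens_per_param * N^2  =>  N = sqrt(C / (6 * tokens_per_param))
    n_params = math.sqrt(flops_budget / (6 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Doubling the compute budget each time grows the model AND the dataset
# by roughly 40% each; it does not double the model alone.
for budget in (1e21, 2e21, 4e21):
    n, d = compute_optimal_split(budget)
    print(f"{budget:.0e} FLOPs -> ~{n / 1e9:.1f}B params, ~{d / 1e9:.0f}B tokens")
```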

The Quest for The Holy Scale

Source: DALL-E-3

True believers will sell you a “scale is all you need” T-shirt. And, in fairness to them, nearly every key innovation in deep learning research over the past decade has been a mechanism that allowed models to scale up more effectively.

In 2012, AlexNet’s key contributions were “a very efficient GPU implementation of convolutional nets” and the first mainstream use of the Rectified Linear Unit (ReLU) as an activation function. ReLU’s gradient is much more efficient to compute than sigmoid’s, which was the standard activation at the time. ReLU also helped mitigate something called the vanishing gradient problem.
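To see why that matters, here’s a minimal NumPy sketch (mine, not the AlexNet paper’s) comparing the two derivatives. Sigmoid’s gradient never exceeds 0.25 and collapses toward zero for large inputs; multiply many of those small numbers together across layers during backpropagation and the learning signal vanishes. ReLU’s gradient is exactly 1 for any positive input.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)           # peaks at 0.25, shrinks toward 0 for large |x|

def relu_grad(x):
    return (x > 0).astype(float)   # 1 for positive inputs, 0 otherwise; trivial to compute

xs = np.array([-6.0, -2.0, 0.0, 2.0, 6.0])
print("sigmoid grad:", np.round(sigmoid_grad(xs), 4))  # [0.0025 0.105 0.25 0.105 0.0025]
print("relu grad:   ", relu_grad(xs))                  # [0. 0. 0. 1. 1.]
```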

Together, these improvements allowed networks to train for more rounds on a fixed compute budget and a fixed dataset, which eventually led to overfitting.

In 2014, dropout was introduced as “a simple way to prevent neural networks from overfitting.” Dropout allowed models to be trained for more rounds on a dataset of a fixed size, but researchers were still struggling to make networks deeper. AlexNet had only five convolutional layers (pitifully shallow by today’s standards). VGG got us to ~19 layers by using smaller convolutional kernels.
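Here’s a minimal sketch of dropout as it’s commonly implemented today (“inverted” dropout); the function name and shapes are illustrative, not from the paper:

```python
import numpy as np

def dropout(activations, drop_prob=0.5, training=True, rng=None):
    """Randomly zero units during training; scale the survivors so the
    expected activation is unchanged; do nothing at inference time."""
    if not training or drop_prob == 0.0:
        return activations
    rng = rng or np.random.default_rng()
    keep_prob = 1.0 - drop_prob
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

layer_output = np.ones((2, 4))
print(dropout(layer_output, drop_prob=0.5, rng=np.random.default_rng(0)))
# On average half the units are zeroed; the survivors are scaled up to 2.0.
```

The division by keep_prob is why it’s called “inverted” dropout: it keeps the expected output constant, so nothing special has to happen at inference time.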

In 2015, the residual or “skip” connection blew the lid off the vanishing gradient problem. Combined with another 2015 innovation, batch normalization, so-called “ResNets” could train effectively with 152 layers and roughly 60 million parameters.

Side note: That’s still paltry by today’s standards. GPT-3 has 175 billion parameters, and GPT-4 is rumored to have 1.7 trillion parameters, which is roughly 28,000 times larger than ResNet-152.
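Here’s a minimal sketch of the skip-connection idea, using a toy fully connected block instead of ResNet’s actual convolution-plus-batch-norm layers: because the input is added back onto the block’s output, signal (and gradients) can flow straight through even when the block itself contributes almost nothing.

```python
import numpy as np

def toy_block(x, weights):
    """Stand-in for a couple of conv + batch-norm layers (illustrative only)."""
    return np.maximum(0.0, x @ weights)      # linear transform + ReLU

def residual_block(x, weights):
    # The "skip" connection: add the input back onto the block's output.
    return toy_block(x, weights) + x

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 8))
weights = rng.normal(size=(8, 8)) * 0.01     # near-zero init: the block barely does anything
delta = residual_block(x, weights) - x
print(np.abs(delta).max())                   # tiny: the input passes through nearly unchanged
```

Stack 152 of these and the identity path still gives gradients a clean route from the loss back to the earliest layers, which is what plain 152-layer stacks couldn’t manage.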

Language models at the time benefited from ReLU, batch norm, and skip connections. However, the state-of-the-art models were all some form of recurrent neural network, and recurrent networks have a crucial bottleneck: they must fully process each word before moving on to the next one. That made them a poor fit for GPU processing.
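Here’s a minimal sketch of that bottleneck using a toy recurrent cell (not any particular published architecture): each step’s hidden state depends on the previous step’s, so the loop over words is inherently sequential.

```python
import numpy as np

def rnn_forward(token_vectors, W_in, W_hidden):
    """Toy recurrent forward pass: an unavoidably sequential loop."""
    hidden = np.zeros(W_hidden.shape[0])
    for vec in token_vectors:
        # Step t cannot start until step t-1 has produced `hidden`.
        hidden = np.tanh(vec @ W_in + hidden @ W_hidden)
    return hidden

rng = np.random.default_rng(0)
sequence = rng.normal(size=(100, 32))            # 100 "words", 32-dim embeddings
W_in = rng.normal(size=(32, 64))
W_hidden = rng.normal(size=(64, 64))
print(rnn_forward(sequence, W_in, W_hidden).shape)  # (64,)
```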

It’s hard to overstate how devastating this was for ML-based language processing. CPU clock speeds have been stagnant for nearly 20 years. Almost all the improvements in high-performance computing during that time have come from some form of parallelization. This bottleneck effectively locks recurrent neural networks out of those performance gains.

This is why, for example, facial recognition was being commodified while chatbots were still in their “Microsoft Tay” era.

Finally, in 2017, the “attention is all you need” paper did for language what AlexNet did for vision: introduced a highly parallelizable, GPU-efficient mechanism for training a neural network on language data. After that, LLMs were off to the races.
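For contrast with the loop above, here’s a minimal sketch of scaled dot-product attention with a single head and no masking or learned projections: the whole sequence is handled in a few large matrix multiplications, exactly the workload GPUs are built for.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention over an entire sequence at once (no per-word loop)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # every pair of positions, in one shot
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over each row
    return weights @ V                                # weighted mix of every position's values

rng = np.random.default_rng(0)
sequence = rng.normal(size=(100, 64))                 # 100 tokens, 64-dim embeddings
print(scaled_dot_product_attention(sequence, sequence, sequence).shape)  # (100, 64)
```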

There have been a handful of non-trivial innovations since then. Mixture-of-experts layers and reinforcement learning from human feedback come to mind. But a lot of the progress in LLMs has just been scaling up the basic attention mechanism: increasing the size of embeddings and context windows, adding more “heads” of attention per layer, and adding more attention layers.
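As a rough illustration of how those knobs turn into parameter counts, here’s a back-of-the-envelope estimate using the common approximation of about 12 × d_model² parameters per transformer layer plus the embedding table. GPT-3’s published shape (96 layers, a model dimension of 12,288, and a ~50k-token vocabulary) is real; the formula is my approximation, not OpenAI’s accounting.

```python
def approx_transformer_params(n_layers, d_model, vocab_size):
    """Back-of-the-envelope parameter count for a GPT-style transformer.

    Assumes ~4 * d_model^2 for attention (Q, K, V, and output projections)
    and ~8 * d_model^2 for the feed-forward block (4x expansion), ignoring
    biases, layer norms, and positional embeddings.
    """
    per_layer = 12 * d_model ** 2
    embeddings = vocab_size * d_model
    return n_layers * per_layer + embeddings

print(f"{approx_transformer_params(96, 12288, 50257) / 1e9:.0f}B")  # ~175B, in line with GPT-3
```

Notice that the count grows with the square of the model dimension and only linearly with the layer count, which is part of why these numbers balloon so quickly as the knobs get turned.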

Still, all this scaling is what resulted in the charts above. This is why even luminaries with skin in the game, such as Meta’s AI chief Yann LeCun, can say LLMs are “useful, there's no question. But they are not a path towards human-level intelligence.”

Where Does That Leave Machine Learning?

Prognosticators are increasingly using “AI” and “bubble” in the same headline. The industry is in a weird place. There’s a lot of FOMO, and plenty of eager grifters are taking advantage of the trend.

Startups are burning through cash chasing SOTA results. But even with huge investments, smaller firms can’t spend the kind of money it will cost to train the next generation of models. That reality was behind last month’s high-profile executive departures from Inflection AI and Stability AI; both departing executives were seasoned AI veterans seeking better-capitalized firms.

Meanwhile, those better-capitalized giants are cutting corners and dredging the depths of the internet out of desperation to acquire data sets big enough to train the next generation of Large Language Models. Or they’re spending $150 billion on new data centers to quench AI’s insatiable computational thirst.

Self-driving car hype has fallen off a cliff. Cruise Automation’s internal share price was slashed by 50% in February following a high-profile accident and a crushing safety analysis. Apple just closed its self-driving division, laying off 600 workers. Uber and Lyft both shuttered their self-driving divisions, throwing in the towel in 2020 and 2021, respectively, though both have since partnered with Motional to offer limited robotaxi services.

All of that, even though fully autonomous vehicles are already deployed and operating today, albeit in limited circumstances.

The most successful self-driving firm, Alphabet’s Waymo, says they’re taking “a careful and incremental approach” to service expansion — decidedly not exponential. When asked about Waymo’s biggest internal obstacle, their chief product officer, Saswat Panigrahi, replied, “bringing the cost down.”

When you hear Sam Altman say human-level artificial general intelligence will be here in the “reasonably close-ish future,” remind yourself that in 2016 Lyft co-founder John Zimmer predicted personal car ownership would “all but end” by 2025.

Still, the lesson of the dotcom era wasn’t that the internet was a horrible technology without real use cases. A lot of bullshit chatbots will surely go the way of pets.com. But ML is already powering immensely popular and economically valuable tools and services, too. Advertising networks, recommendation engines, and spam filters are all ML-based. Research and development in ML-driven drug and materials discovery looks promising. Robotics seems poised to have a warm day in the sun using ML techniques. LLMs seem like a good fit for genetic data. Generative image, sound, and video models all have legitimate applications in creative enterprises.

Here’s one last chart from the AI Index. It shows decreased costs and increased revenue attributed to embracing AI in the workplace: not just generative AI, but a wide variety of ML tools, many of which are much simpler than the current batch of LLMs.

Steady, incremental improvements will still deliver real value for firms that embrace machine learning. Yes, many ML firms will collapse as the LLM hype subsides, but ML itself is here to stay.

Remember…

The Lab Report is free and doesn’t even advertise. Our curricula are open source and published under a public domain license for anyone to use for any purpose. We’re also a very small team with no investors.

Help us keep providing these free services by scheduling one of our world-class trainings, requesting a custom class for your team, or taking one of our open enrollment classes.
