
ML Failure Part Two: Extrapolation vs Interpolation

News: Twitter is X and AI's future is uncertain

The Weekly Lab Report

I’m Tyler Elliot Bettilyon (Teb) and this is the Lab Report: cut through the noise with our weekly rundown of software and technology news.

If you’re new to the Lab Report you can subscribe here. If you like what you’re reading you’ll love one of our classes. Schedule a training from our catalog or request a custom class consultation.

From The Lab

This week we ran the first half of a custom class for a prominent coffee purveyor. The class pairs an intro to SQL with an intro to Python as part of a longer data analysis and visualization course at the company. Browse the open source materials or get in touch to schedule a custom class of your own.

Today’s Lesson: How ML Fails Part 2

All the code from today’s lesson can be viewed on GitHub and Google Colab.

Extrapolation vs Interpolation

In statistics, interpolation and extrapolation are both types of estimation based on data. Interpolation is an estimate made within the bounds of the existing data; extrapolation is an estimate made outside those bounds. Although they are sometimes presented as a binary, predictions and estimations often exist on a spectrum between the two.

Imagine we’re the owner of an ice cream shop. We want to make a sales estimate for tomorrow. Tomorrow hasn’t happened, so in that sense this is fundamentally an extrapolation problem: tomorrow’s sales could not possibly be in our existing data set. However, we know it’s going to be a hot Saturday in the middle of summer. We have sales records from similar days in the past, so in that sense it’s an interpolation problem.

In general, and for ML specifically, extrapolation is much harder. Forcing a model to make predictions outside the bounds of its training data is a common cause of failure.
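For a toy illustration of the difference (hypothetical numbers, not taken from the lesson notebook), fit a simple polynomial to data drawn from a narrow range and then ask for estimates both inside and far outside that range:

```python
# Toy illustration with hypothetical data (not from the lesson notebook):
# fit a quadratic to points sampled from y = 2x + noise on the range [0, 10].
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + rng.normal(0, 1, size=x.size)

model = np.poly1d(np.polyfit(x, y, deg=2))

# Interpolation: x = 5 sits inside the data, so the estimate lands near the
# true value of 10.
print("estimate at x = 5:  ", model(5))

# Extrapolation: x = 100 is far outside the data. Any small, spurious
# quadratic term the fit picked up gets multiplied by 10,000, so the
# estimate can drift far from the true value of 200.
print("estimate at x = 100:", model(100))
```

Nothing about the fitted model changes between the two calls; the only difference is how far the question strays from the data it was fit on.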

An Abstract Example: Predicting Sine

Consider the humble sine curve. Sine is a perfectly repeating pattern, there is no noise, and its period is consistent. It’s just a boring, repetitious curve.

The sine function from -5π to 5π

Many ML models can easily learn the features of sine within the bounds of the training data. For example, here’s a fairly simple neural network’s approximation of sine from -5π to 5π, with the model’s output in red and the training data in blue:

A neural network easily learns sine…

Clearly, our model can do interpolation quite well. But what happens when we ask this model to make predictions outside the bounds of the original data?

The same model’s predictions for sine outside the bounds of the training data.

Unfortunately, our model has failed abysmally at extrapolation. It learned sine almost perfectly from -5π to 5π, but below -5π and above 5π it produces a near-linear function.
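If you want to poke at this yourself, here is a minimal sketch of the experiment using scikit-learn’s MLPRegressor; the notebook linked above may use a different library and architecture:

```python
# Minimal sketch of the experiment (the linked notebook's architecture may
# differ): train a small neural network on sine between -5π and 5π, then
# compare its error inside and outside that range.
import numpy as np
from sklearn.neural_network import MLPRegressor

X_train = np.linspace(-5 * np.pi, 5 * np.pi, 5000).reshape(-1, 1)
y_train = np.sin(X_train).ravel()

model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000, random_state=0)
model.fit(X_train, y_train)

# Interpolation: error for points inside the training range.
X_in = np.linspace(-5 * np.pi, 5 * np.pi, 200).reshape(-1, 1)
print("mean error inside range: ",
      np.mean(np.abs(model.predict(X_in) - np.sin(X_in).ravel())))

# Extrapolation: error for points beyond the training range, where the
# sine pattern is lost.
X_out = np.linspace(5 * np.pi, 10 * np.pi, 200).reshape(-1, 1)
print("mean error outside range:",
      np.mean(np.abs(model.predict(X_out) - np.sin(X_out).ravel())))
```

A ReLU network is a piecewise-linear function, so far beyond its training data it can only continue along a straight line, which matches the near-linear tails in the plot above.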

And it gets worse. This is what happens when we train the same model on sine with a few holes in the sample:

The model also fails to properly estimate within the missing areas from the training data.

The model’s fit in the hole on the right isn’t terrible, but in the hole on the left its prediction is basically a horizontal line covering what should be a full period of the sine curve.
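Here is a sketch of the “holes” variant under the same assumptions as the sketch above: knock a full period out of the training set and check the fit inside the gap.

```python
# Sketch of the "holes" variant (same caveats as above): remove one full
# period from the training data and inspect predictions inside the gap.
import numpy as np
from sklearn.neural_network import MLPRegressor

X = np.linspace(-5 * np.pi, 5 * np.pi, 5000)
keep = ~((X > -3 * np.pi) & (X < -np.pi))  # drop one full period of samples
X_train, y_train = X[keep].reshape(-1, 1), np.sin(X[keep])

model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000, random_state=0)
model.fit(X_train, y_train)

# Inside the gap the prediction tends to flatten out or bridge the gap's
# edges rather than trace the missing period of the curve.
X_gap = np.linspace(-3 * np.pi, -np.pi, 100).reshape(-1, 1)
print("mean error inside the gap:",
      np.mean(np.abs(model.predict(X_gap) - np.sin(X_gap).ravel())))
```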

In the real world these sorts of issues arise all the time.

Real World Examples

ChatGPT and Esoteric Poetry

Consider the “domain” of a model like ChatGPT: it needs a representative sample of human-written text from every subdomain in which users might prompt it.

While the dataset for ChatGPT is enormous, it still doesn’t contain many samples of every esoteric topic under the sun. In December Ian Bogost, a technology writer at The Atlantic, challenged ChatGPT to write in esoteric styles of poetry (among other things). Here’s what he found:

I also urged the AI to generate a lai (a medieval narrative poem) in the style of Marie de France about the beloved, Texas-based fast-food chain Whataburger. It sure as heck looked a lot like a lai, beginning like this:

In the land of Whataburger, where burgers and fries are king,
A stranger came to visit, and with him brought a ring.
He wore a cloak of velvet, and a hat upon his head,
And he smiled with a twinkle in his eye as he approached the counter, where the burgers are served hot and fresh, and the fries are crispy and golden.

Except for that last line, which is a metrical mess. The AI […] was able to tell me about the typical structure of a lai—a short, narrative poem of the 12th century with a particular structure and form, usually written in octosyllabic couplets. The lyrical errors in the generated lai arise from its failure to adhere to that metrical form.

Although the model was trained on enough informational content to correctly define a lai, it wasn’t trained on enough lais to correctly produce one. This demonstrates another weakness of such models: they do not really “understand” the things they produce the way humans usually think of understanding. While the model correctly defined a lai, its “knowledge” of their structure did not translate to producing a poem that actually has that structure.

Fatal Self-Driving Crash

In 2018 a self-driving SUV operated by Uber struck and killed Elaine Herzberg. An investigation found that in the moments before the crash the system was struggling to properly classify Herzberg as a pedestrian that needed to be avoided.

Herzberg was doing something the AI had never encountered during training: jaywalking with her bike. The system had seen pedestrians in and out of crosswalks. It had seen cyclists on the roadway and pedestrians walking a bike in a crosswalk. But it had never seen a pedestrian walking a bike outside a crosswalk.

As a result, the system oscillated between classifying her as a pedestrian, a bicycle, and a vehicle. That in turn caused it to incorrectly predict her path, which ultimately led to the crash.

This problem is a big one for self-driving cars because just about anything can happen on the roadways. It’s also why the most successful self-driving firms have started in places like Arizona, where the streets are wide and weather conditions are fairly constant and predictable. Those factors shrink the space of “representative driving conditions.”

Big Problems, Big Data

The extrapolation problem is another reason large ML models need so much data. If we expect our ML model to succeed, we need a truly representative data set that covers the entire domain of whatever problem we’re hoping to solve.

Ask yourself what a “truly representative” sample looks like for driving, or for “writing at a human level of proficiency,” and you begin to understand why some researchers think we’ll run out of high-quality data before 2027.

The News Quiz

An AI with zero conception of the rules of Go generated this board, which would be quite absurd if it occurred in a real game.

In 2016, AlphaGo defeated the Go world champion Lee Sedol 4-1 in a five-game exhibition match. It was a significant achievement in ML because the game of Go is extremely complex. Fun fact: the number of legal Go board states is larger than the number of atoms in the observable universe.

Since that achievement, other Go-playing AIs have extended and refined AlphaGo’s core concepts. In particular, one called KataGo has become a standard-bearer. Earlier this year, a fairly highly ranked amateur Go player named Kellin Pelrine defeated KataGo 14 games to 1.

Read these two articles about the “adversarial strategy”:

Then, answer these questions:

  • Pelrine described his encircling strategy this way: "As a human, it would be quite easy to spot." Why, then, didn’t the top-ranked Go AIs spot it?

  • In terms of “extrapolation” and “interpolation,” what is this adversarial AI missing that allows novice human players to defeat it, even while it defeats a top-ranked AI that in turn frequently defeats top-ranked humans?

Themes in the News

No one really knows the future of AI

OpenAI CEO Sam Altman has been making the rounds, charming legislators, and musing about whether his company’s creations have a 0.5% or a 50% chance of destroying humanity.

Are large language models just “stochastic parrots,” as computational linguist Emily Bender et al. have argued? Or is Altman right that general intelligence might be one of the “emergent properties from doing simple things on a massive scale”?

If ML systems run out of high-quality training data, as some ML researchers have speculated will most likely happen before 2027, will they already have consumed enough data to become super-intelligent and continue learning autonomously?

Is Anthropic AI right that we need to build exactly the types of systems that might destroy humanity to prevent that outcome? Or is the only way to avoid our destruction the complete abandonment of general artificial intelligence R&D?

Right now there are a lot of unknowns and a lot of disagreement even among experts.

Will anything replace Twitter or will it just be X?

Apparently Twitter is now X.

Since Elon Musk bought Twitter, the site has been turbulent. Between massive layoffs, increased service outages, and advertiser abandonment, other tech firms smell blood in the water. Mastodon, Bluesky, Substack’s Notes, and now Meta’s Threads have all tried to capture Twitter refugees or otherwise capitalize on the chaos surrounding Musk’s takeover, though it’s not clear any of them will succeed.

Mastodon’s decentralization gives it an inherent moderation problem, which is why it has become a haven for child sexual abuse material. Notes and Bluesky have yet to attract massive user bases, although Bluesky reportedly crested 1 million users. Threads is probably best positioned because of its ability to essentially import Instagram’s users, but Meta isn’t exactly known for warm fuzzies.

I’ve been off Twitter since long before the Musk takeover, and I find myself increasingly sympathetic to the notion that maybe we just don’t need a new Twitter.

Teb’s Tidbits

Answers To The News Quiz

Pelrine described his encircling strategy this way: "As a human, it would be quite easy to spot." Why, then, didn’t the top-ranked Go AIs spot it?

KataGo and other top-ranked AIs primarily train in stages: first they “watch” expert-level Go games between top-ranked humans, then two similar versions of the model engage in “self-play.”

But because the strategy Pelrine employed is “quite easy to spot,” top-ranked human players never use it. And when the models switch to self-play, the bots don’t use the tactic either, because they’ve never seen it.

In terms of “extrapolation” and “interpolation,” what is this adversarial AI missing that allows novice human players to defeat it, even while it defeats a top-ranked AI that frequently defeats top-ranked humans?

The adversarial AI only really practiced against KataGo. Not only that, it “practiced” in a peculiar way that was explicitly designed to find a specific weakness or blind spot in KataGo’s play. The only kind of Go game that could really be considered “interpolation” for this machine is exactly the style of game that KataGo plays.

So, even the basic strategies of a novice human are “extrapolation” and therefore cause the adversarial AI significant problems.

Remember…

The Lab Report is free and doesn’t even advertise. Our curricula are open source and published under a public domain license for anyone to use for any purpose. We’re also a very small team with no investors.

Help us keep providing these free services by scheduling one of our world-class trainings or requesting a custom class for your team.
