Teb's Lab
WTF is ML Anyway?

News: Weather Prediction and Surveillance Regulation

Tyler Bettilyon
July 14, 2023

The Weekly Lab Report

I’m Tyler Elliot Bettilyon (Teb) and this is the Lab Report: cut through the noise with our weekly rundown of software and technology news.

If you’re new to the Lab Report, you can subscribe here. If you like what you’re reading, you’ll love one of our classes. Schedule a training from our catalog or request a custom class consultation.

From The Lab

Learn more about machine learning with our 4-day introduction to the topic. You’ll learn how to build and train machine learning models using popular libraries such as scikit-learn and TensorFlow; how to manage and manipulate datasets using Pandas; and how to manage and mitigate common sources of error and failure in ML systems.

As always: we want to hear from you. Do you have any questions about today’s lesson? Is there a topic you want to see covered in a future edition? Is there a piece of software news you think we missed? Reply to this email and let me know!

Today’s Lesson

FYI, the code used to train models and make charts in today’s lesson is available here.

AI Series 2: What is Machine Learning?

If you missed it, we covered some definitions and terminology last week.

Machine Learning (ML) is a subset of Artificial Intelligence (AI). Here’s a Venn diagram:

All ML is AI, but not all AI is ML.

ML is the most popular branch of AI today. ML models have consistently been the state of the art in a wide array of AI tasks since the 2010s, and that trend seems likely to continue into the next decade and beyond.

Today we’ll describe what distinguishes ML from other types of AI and why it’s so successful.

“Classical” AI

In classical AI, human engineers design all the rules for turning inputs into outputs. Human engineers decide how to represent the data, world, or task being performed; they decide how to process that representation; they decide what form the output will take; and so on. End to end, humans design all of it.

These designs can be complex. The “rules” are frequently described in the form of one or more complex algorithms that use clever data structures. But humans explicitly wrote those algorithms and designed the data models.

In classical AI, humans design all the rules that control the input-to-output mapping.

Google Maps’ pathfinding feature is an example of classical AI. Engineers at Google explicitly mapped the world’s roadways into a data structure called a graph. Then they use graph search algorithms to find the shortest path between two nodes in the graph.
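Google hasn’t published exactly which algorithm Maps runs, so as a minimal sketch, here is Dijkstra’s algorithm, one classic graph search algorithm, on a tiny hypothetical road network (the intersections `A` to `D` and the travel times are made up). Every rule in it, from how the graph is stored to how the next node is chosen, was written by a person:

```python
import heapq

def shortest_path(graph, start, goal):
    """Dijkstra's algorithm: return (total cost, path) of the cheapest route."""
    # Priority queue ordered by distance so far: (distance, node, path to node)
    frontier = [(0, start, [start])]
    visited = set()
    while frontier:
        dist, node, path = heapq.heappop(frontier)
        if node == goal:
            return dist, path
        if node in visited:
            continue  # already reached this node more cheaply
        visited.add(node)
        for neighbor, minutes in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(frontier, (dist + minutes, neighbor, path + [neighbor]))
    return float("inf"), []  # goal unreachable

# Hypothetical road network: intersections are nodes, roads are edges
# weighted by travel time in minutes (one-way roads, for simplicity).
roads = {
    "A": {"B": 4, "C": 2},
    "B": {"D": 5},
    "C": {"B": 1, "D": 8},
    "D": {},
}
print(shortest_path(roads, "A", "D"))  # (8, ['A', 'C', 'B', 'D'])
```

The direct road A to B looks shorter at first, but the search finds that detouring through C is cheaper overall.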

Other examples of classical AI include:

Constraint satisfaction algorithms, which are good at scheduling problems, puzzles like Sudoku and crosswords, and shipping/freight optimization.
Search algorithms, which power Google Maps, spellcheck, social media friend suggestions, and some game playing systems, including the famous Deep Blue chess playing system.
Simulation-based methods, which are popular in drug discovery and weather prediction.
And more…
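To make constraint satisfaction concrete, here’s a minimal backtracking sketch for map coloring, a textbook constraint problem with the same flavor as scheduling: give each region a color so that no two neighbors match. The adjacency list is modeled on the classic Australia map-coloring exercise, and real solvers add much smarter heuristics:

```python
def color_map(neighbors, colors, assignment=None):
    """Backtracking search: color every region so no two neighbors match."""
    if assignment is None:
        assignment = {}
    unassigned = [region for region in neighbors if region not in assignment]
    if not unassigned:
        return assignment  # every constraint satisfied
    region = unassigned[0]
    for color in colors:
        # Constraint check: no already-colored neighbor may share this color
        if all(assignment.get(other) != color for other in neighbors[region]):
            assignment[region] = color
            result = color_map(neighbors, colors, assignment)
            if result is not None:
                return result
            del assignment[region]  # dead end: undo and try the next color
    return None  # no color works here; the caller must backtrack

# Each region lists the regions it borders.
borders = {
    "WA": ["NT", "SA"],
    "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"],
    "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"],
    "V": ["SA", "NSW"],
    "T": [],
}
print(color_map(borders, ["red", "green", "blue"]))
print(color_map(borders, ["red", "green"]))  # None: WA, NT, and SA all border each other
```

The human designer chose the representation (regions and borders), the constraint, and the search order; nothing is learned from data.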

Machine Learning

Machine learning is different. With ML, certain aspects of the modeling process are “learned” from data in a process called training or fitting (these terms are used interchangeably). Human engineers still decide how to represent the input data and explicitly design the training process. However, during training, several aspects of how to map inputs to outputs are determined by the training data.

In ML, the training process produces the model, which controls the input-to-output mapping.

For the most popular kind of ML, which is called supervised learning*, the training data must be labeled, which means each data point contains both the input value and the correct output value.

*In future editions we’ll describe other types of ML in more detail, including unsupervised learning and reinforcement learning.

For ML designed to price houses you need examples of houses and their sale value. For ML designed to detect spam emails you need examples of emails and whether or not they are spam. For ML designed for facial recognition you need pictures of people labeled with who is in the picture.
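Concretely, labeled data is just inputs paired with correct outputs. A minimal sketch with made-up numbers (the features, prices, and emails below are hypothetical):

```python
# Hypothetical labeled training data (all numbers are made up).
# Each data point pairs an input with the correct output: its label.
labeled_houses = [
    # (square feet, bedrooms, year built) -> sale price in dollars
    ((1400, 3, 1995), 310_000),
    ((2100, 4, 2008), 455_000),
    ((900, 2, 1972), 189_000),
]
labeled_emails = [
    # email text -> is it spam?
    ("WIN A FREE CRUISE!!! Click now", True),
    ("Notes from Tuesday's planning meeting", False),
]

# Supervised learning separates inputs (conventionally X) from labels (y)
X = [features for features, price in labeled_houses]
y = [price for features, price in labeled_houses]
print(X[0], "->", y[0])  # (1400, 3, 1995) -> 310000
```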

Each model type has its own training process for using this labeled data.

Some models are optimization based, which generally means using calculus to minimize an “error” or “loss” function. Some are “distance” based, which generally means representing the data as vectors and using distance measures like Euclidean or cosine distance to measure similarity or form groups. Other models, such as decision trees, use metrics like Gini impurity to repeatedly split the data into increasingly homogeneous groups.
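For a rough feel of these measures, here’s a minimal sketch of Euclidean distance, cosine distance, and Gini impurity (the split criterion decision trees commonly use, and the default `criterion` of scikit-learn’s `DecisionTreeClassifier`). These are hand-rolled illustrations, not any library’s implementation:

```python
import math
from collections import Counter

def euclidean_distance(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity: 0 when vectors point the same way, 1 when orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norms

def gini_impurity(labels):
    """0 for a perfectly homogeneous group; higher when labels are mixed."""
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())

print(euclidean_distance((0, 0), (3, 4)))               # 5.0
print(cosine_distance((1, 0), (0, 1)))                  # 1.0
print(gini_impurity(["spam", "spam", "spam", "spam"]))  # 0.0
print(gini_impurity(["spam", "spam", "ham", "ham"]))    # 0.5
```

A decision tree tries candidate splits and prefers the one whose resulting groups have the lowest (weighted) impurity.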

You can think of an “untrained” ML model or agent as a template: some of the important decisions and structure are set by the template, but the details must be filled in during training. The “shape” of the inputs and outputs is almost always fixed: for example, a house pricing system takes in a fixed set of data points about any given house (e.g. size, number of bedrooms, and year built) and returns a single number representing the price.

Some ML models make quite a lot of assumptions while others are more flexible. For example, perhaps the simplest ML model is one you may have already encountered in a high school or college statistics class: linear regression.

Linear regression makes a big assumption: there is a linear relationship between the input and output data. There are infinitely many lines with different slope and y-intercept values. We can use training data and an optimization algorithm (e.g. gradient descent) to find the best one: start with some slope/y-intercept combination, measure the error by comparing the line’s predictions to the labels in our training data, adjust the slope and y-intercept in the direction that reduces that error, and repeat.

When the training data (blue dots) have a linear trend, linear regression succeeds, learning the red line as our model for mapping input values (x axis) to output values (y axis).
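Here’s a minimal sketch of that training loop in plain Python, fitting a line to made-up points that lie exactly on y = 2x + 1. The learning rate and step count are assumptions chosen for this toy data. In practice linear regression is often solved directly with a closed-form least-squares formula, but gradient descent shows the “measure, adjust, repeat” idea:

```python
def fit_line(xs, ys, learning_rate=0.01, steps=5000):
    """Fit y = slope * x + intercept by gradient descent on mean squared error."""
    slope, intercept = 0.0, 0.0  # start from an arbitrary line
    n = len(xs)
    for _ in range(steps):
        # How far off is the current line on each training example?
        errors = [(slope * x + intercept) - y for x, y in zip(xs, ys)]
        # Partial derivatives of mean(error ** 2) w.r.t. slope and intercept
        grad_slope = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2 / n * sum(errors)
        # Adjust both parameters a small step in the error-reducing direction
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept

xs = [0, 1, 2, 3, 4, 5]
ys = [2 * x + 1 for x in xs]  # made-up points lying exactly on y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")  # slope=2.000, intercept=1.000
```

Starting from a flat line at zero, each step nudges the slope and intercept downhill on the error until they settle near 2 and 1.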

BUT! The model can only be a line. If your data has some other kind of relationship, linear regression will still just produce a line and, as a result, it will probably be quite bad at modeling your data.

Linear regression utterly fails when the trend in our data is parabolic.
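You can see this failure numerically. A minimal sketch, assuming 41 points sampled from y = x² on [-2, 2]: the best-fitting line (computed here with the closed-form least-squares formula) comes out flat, because a symmetric parabola has no overall upward or downward trend for a line to capture:

```python
def least_squares_line(xs, ys):
    """Closed-form ordinary least squares for a single input variable."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

xs = [k / 10 for k in range(-20, 21)]  # 41 points from -2.0 to 2.0
ys = [x ** 2 for x in xs]              # a parabola
slope, intercept = least_squares_line(xs, ys)
mse = sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
# The "best" line is essentially flat at y = 1.4: too high at x = 0
# (where y = 0) and far too low at x = -2 and x = 2 (where y = 4).
print(f"slope={abs(slope):.3f}, intercept={intercept:.3f}, MSE={mse:.3f}")
# slope=0.000, intercept=1.400, MSE=1.565
```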

Neural networks are a much more flexible family of ML models. They are “universal function approximators,” which means that, given enough neurons, they can approximate any continuous math function as closely as we like (at least over a bounded range of inputs). The only assumption they make about how to map input data to output data is that the mapping must be a math function.

When we use a neural network our hypothesis is that there is some math function that can map the inputs to the labels. It could be any function. The process of training a neural network is essentially an attempt to find the best math function for mapping our inputs to our labels.

A neural network can learn to approximate the parabolic function:

A simple neural network’s approximation of a parabola.
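The code behind these charts is linked at the top of this lesson; the sketch below is an independent, minimal stand-in rather than that code. It assumes a tiny network with one hidden layer of 16 ReLU units, trained with plain full-batch gradient descent on mean squared error, and fits the same kind of parabola a straight line cannot. The layer size, learning rate, epoch count, and random seed are illustrative assumptions:

```python
import random

def relu(z):
    return z if z > 0 else 0.0

def train_tiny_network(xs, ys, hidden=16, learning_rate=0.05, epochs=2000, seed=0):
    """One hidden layer of ReLU units, trained by full-batch gradient descent on MSE."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-1, 1) for _ in range(hidden)]      # input -> hidden weights
    b1 = [rng.uniform(-1, 1) for _ in range(hidden)]      # hidden-unit biases
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # hidden -> output weights
    b2 = 0.0                                              # output bias
    n = len(xs)

    def forward(x):
        h = [relu(w1[j] * x + b1[j]) for j in range(hidden)]
        return h, b2 + sum(w2[j] * h[j] for j in range(hidden))

    def mse():
        return sum((forward(x)[1] - y) ** 2 for x, y in zip(xs, ys)) / n

    initial_loss = mse()
    for _ in range(epochs):
        g_w1, g_b1, g_w2, g_b2 = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden, 0.0
        for x, y in zip(xs, ys):
            h, out = forward(x)
            d_out = 2 * (out - y) / n  # derivative of the MSE w.r.t. this output
            g_b2 += d_out
            for j in range(hidden):
                g_w2[j] += d_out * h[j]
                if h[j] > 0:  # ReLU only passes gradient back when the unit is active
                    g_w1[j] += d_out * w2[j] * x
                    g_b1[j] += d_out * w2[j]
        for j in range(hidden):  # step every parameter downhill
            w1[j] -= learning_rate * g_w1[j]
            b1[j] -= learning_rate * g_b1[j]
            w2[j] -= learning_rate * g_w2[j]
        b2 -= learning_rate * g_b2
    return (lambda x: forward(x)[1]), initial_loss, mse()

xs = [k / 10 for k in range(-10, 11)]  # 21 points from -1.0 to 1.0
ys = [x ** 2 for x in xs]              # the parabola a straight line can't fit
predict, loss_before, loss_after = train_tiny_network(xs, ys)
print(f"MSE before training: {loss_before:.4f}, after: {loss_after:.4f}")
print(f"prediction at x=0.5: {predict(0.5):.3f} (true value 0.25)")
```

Unlike the linear model, nothing here says “the answer is a line”: the hidden units’ weights and biases are all learned, which is what lets the network bend to fit the curve.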

It can also approximate the sine function, even with some random noise added to the training data:

A simple neural network’s approximation of a noisy sine function.

It can even learn this weird function I pulled out of thin air for the purpose of this demonstration:

A simple neural network approximating the function f(x) = x^3 + 100000 sin(x) - 3000x - x^2

Notice there is one area at about x=75 where the model is a worse approximation. It still follows the overall trend, but fails to capture the nuance of about 3 oscillations of the sine component.

This could be addressed through more training or by adding a bit more complexity to the underlying neural network*. But it demonstrates one way these models fail (by not capturing every aspect of the underlying problem during training), so I left it in.

*In future editions we’ll explore “neural network architecture” and what it means to have a “complex” vs “simple” neural network.

This is all nifty, but it doesn’t explain why ML methods are so popular. Plus, the computational costs of these training processes are substantial, especially for the large models redefining the state of the art: OpenAI CEO Sam Altman has said that training GPT-4 (just the price of performing the computation) cost more than $100 million.

So why is this cost worth paying?

The Benefits of ML

Here’s a motivating example I use with my students: If I asked you to describe what makes a picture of a cat cat-like, what would you say? BUT! You have to do it in a language that computers understand, i.e. “mathematics.”

Photo by Amber Kip (Unsplash)

One answer I often get is, more or less, we should look for pointy ears and whiskers. Okay, so in the language of geometry, what is a “pointy ear?”

Jokes aside, this approach isn’t totally impossible, but it is hard. First we’d do something called “edge detection,” which can reduce the photo to lines:

The result of Canny edge detection (using OpenCV) on the cat photo.

As you can see, edge detection can be a messy business. Shadows in the cat’s chest fur have resulted in a lot of noise. We don’t know which edges belong to the cat and which belong to something else in the image. The cat’s edges aren’t entirely continuous, including around its left ear…

Now, using this messy representation, we have to perform some non-trivial geometry to describe what exactly constitutes “pointy ears” or “whiskers” plus some tricky logic to find such shapes within the detected edges. Even then, plenty of other animals have pointy ears and/or whiskers.
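Under the hood, edge detectors look for places where brightness changes sharply. Canny detection (as in OpenCV’s `cv2.Canny`) smooths the image, computes brightness gradients, thins them with non-maximum suppression, and applies hysteresis thresholds. The sketch below keeps only the gradient step, a Sobel filter plus a single threshold, on a made-up 8x8 grayscale image, so it is a simplified stand-in rather than Canny itself:

```python
def sobel_edges(image, threshold):
    """Return a 0/1 grid marking pixels where brightness changes sharply."""
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for r in range(1, height - 1):  # skip the border for simplicity
        for c in range(1, width - 1):
            # Sobel kernels: horizontal (gx) and vertical (gy) brightness change
            gx = (image[r - 1][c + 1] + 2 * image[r][c + 1] + image[r + 1][c + 1]
                  - image[r - 1][c - 1] - 2 * image[r][c - 1] - image[r + 1][c - 1])
            gy = (image[r + 1][c - 1] + 2 * image[r + 1][c] + image[r + 1][c + 1]
                  - image[r - 1][c - 1] - 2 * image[r - 1][c] - image[r - 1][c + 1])
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                edges[r][c] = 1
    return edges

# Made-up 8x8 grayscale image: a bright square (255) on a dark background (0)
image = [[255 if 2 <= r <= 5 and 2 <= c <= 5 else 0 for c in range(8)]
         for r in range(8)]
edges = sobel_edges(image, threshold=200)
for row in edges:
    print("".join("#" if pixel else "." for pixel in row))
```

Even on a perfectly clean square the output is a thick band of marked pixels rather than crisp lines; a real photo adds shadows, textures, and background clutter on top of that.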

Another tactic: we could examine the distribution of colors in the photo. If there’s a lot of pink and green, then maybe it’s not a cat; if there’s more black, white, orange, and grey, maybe it is a cat. Unfortunately, plenty of other animals share colors with cats. Some of those even have pointy ears and whiskers. Or maybe the cat is far away, so its colors contribute only a little to the overall color composition.
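Written as code, the color heuristic might look like this hypothetical sketch (the palette, the set of “cat colors,” and the 60% threshold are all invented for illustration). Note how easily a plain grey animal passes:

```python
from collections import Counter

# Hypothetical hand-written rule: these colors "look like cat fur"
CAT_COLORS = {"black", "white", "orange", "grey"}
PALETTE = {  # rough RGB anchors, invented for illustration
    "black": (0, 0, 0), "white": (255, 255, 255), "orange": (230, 140, 40),
    "grey": (128, 128, 128), "pink": (240, 160, 180), "green": (40, 160, 60),
}

def nearest_color_name(rgb):
    """Bucket a pixel into whichever palette color is closest."""
    return min(PALETTE, key=lambda name: sum((a - b) ** 2 for a, b in zip(rgb, PALETTE[name])))

def looks_like_cat(pixels, threshold=0.6):
    """Guess 'cat' if enough pixels fall into cat-colored buckets."""
    counts = Counter(nearest_color_name(p) for p in pixels)
    cat_fraction = sum(counts[name] for name in CAT_COLORS) / len(pixels)
    return cat_fraction >= threshold

lawn = [(40, 160, 60)] * 90 + [(240, 160, 180)] * 10  # green grass, pink flowers
elephant = [(120, 120, 125)] * 100                    # a plain grey animal
print(looks_like_cat(lawn), looks_like_cat(elephant))  # False True
```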

These classical approaches are an enormous challenge with some serious flaws. With machine learning we don’t do any of that. Instead we build a complex template that:

Expects pixel data as input.
Is a binary classifier (i.e. produces “cat” or “not a cat” as output).
And is sufficiently complex that it can capture important features of what makes an image cat-like or not.

Then, instead of defining features that make a picture cat-like, we collect a bunch of pictures and label them cat or not-cat. We train the model on these labeled pictures, and it figures out which features matter during the training process.
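Real image classifiers are large neural networks trained on millions of pixels per image, but the core idea fits in a toy example. This sketch trains logistic regression (a much simpler model than a neural network) on hypothetical 4-pixel images; nobody tells it which pixels matter, and it works that out from the labels. The images, labels, learning rate, and epoch count are assumptions for illustration:

```python
import math

def train_logistic_regression(images, labels, learning_rate=0.5, epochs=1000):
    """Learn one weight per pixel (plus a bias) from labeled examples."""
    n, n_pixels = len(images), len(images[0])
    weights, bias = [0.0] * n_pixels, 0.0  # start knowing nothing
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_pixels, 0.0
        for pixels, label in zip(images, labels):
            z = bias + sum(w * p for w, p in zip(weights, pixels))
            prob = 1 / (1 + math.exp(-z))  # model's probability of label 1
            error = prob - label            # gradient of log loss with respect to z
            for i, p in enumerate(pixels):
                grad_w[i] += error * p / n
            grad_b += error / n
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias

def predict(weights, bias, pixels):
    return int(bias + sum(w * p for w, p in zip(weights, pixels)) > 0)

# Hypothetical 2x2 "images" flattened to 4 pixels (1 = bright, 0 = dark).
# Label 1 stands in for "cat"; here it happens to mean "top row is bright",
# but that rule is never written into the model.
images = [[1, 1, 0, 0], [1, 1, 1, 0], [1, 1, 0, 1],
          [0, 0, 1, 1], [0, 1, 1, 1], [1, 0, 1, 1]]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic_regression(images, labels)
print([round(w, 2) for w in weights])  # top-row pixels get positive weights, bottom-row negative
```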

And, just to be clear, modern ML models are extraordinarily good at this problem which we’d call “Image Classification” in AI circles. Here’s a bunch of relevant papers.

ML also makes it much easier to expand our classifier to recognize other things. Instead of trying to describe cars, ladybugs, and hamburgers in the language of mathematics we just collect and label images of those things. Which, of course, is easy thanks to the internet.

This is what Richard Sutton called “The Bitter Lesson” in 2019:

The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. The ultimate reason for this is Moore's law, or rather its generalization of continued exponentially falling cost per unit of computation. Most AI research has been conducted as if the computation available to the agent were constant (in which case leveraging human knowledge would be one of the only ways to improve performance) but, over a slightly longer time than a typical research project, massively more computation inevitably becomes available. Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation.

Richard Sutton

Because neural networks are universal function approximators, and math is unreasonably effective at modeling so many things, we can simply unleash the world’s remarkable and ever-growing computational power on most problems.

The dirty secret of the current ML revolution is that, while there have been some clever software breakthroughs, it’s mostly due to hardware advances and computational availability.

One final note: some classical methods also scale with computational power, in particular search-based methods:

One thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very great. The two methods that seem to scale arbitrarily in this way are search and learning.

Richard Sutton

This explains why some of the most incredible state of the art systems, such as AlphaGo, employ both search and machine learning.

BUT! ML can’t solve everything (at least not yet). In the next issue we’ll explore the strengths and weaknesses of modern ML, using some prominent successes and failures as motivating examples.

The News Quiz

Every week we challenge ourselves to tie the lesson to the news. Answers are at the end of this newsletter.

A wind prediction figure from NVIDIA’s FourCastNet paper.

Recently, ML researchers have designed models that can outperform classical “numerical weather prediction” (NWP) based systems. Read this short summary about weather prediction’s ML moment, then answer the following questions:

Rate the following statements as true, false, or it’s complicated.

NWP methods are essentially large physics simulations.
ML methods require less computational power overall than NWP methods.
ML methods are especially good at predicting weather patterns that did not appear in the training data.

Themes in the News

Successful ML Weather Models are a Big Deal

We highlighted these in the news quiz, but it’s worth saying a bit more. Weather prediction has long been an area where ML models ought to thrive: weather is a genuinely objective phenomenon, governed by the laws of physics, with incredible (and growing) amounts of historical data available.

But, until recently, ML based methods for weather prediction were considered by many meteorologists to be a “toy.” New systems such as FourCastNet, NowcastNet, and Pangu-Weather are changing that.

Once trained, these ML systems make predictions much faster than traditional simulation-based methods — in some cases 10,000 times faster. Quick updates are important in extreme weather scenarios such as the heavy rains that battered North India and New England this week.

Equally exciting is Pangu-Weather’s ability to track tropical storms without explicitly training on that phenomenon. Tropical storms are an emergent property of other, more fundamental weather patterns. By predicting these fundamental patterns, Pangu-Weather was able to predict weather phenomena that weren’t included in the training data. This generalization to unseen patterns is an area where ML models usually struggle. Such generalization is especially important in an era of climate change.

Such advances are an example of how former Google CEO Eric Schmidt thinks AI will transform science.

Privacy, Surveillance, and Regulation

The EU and US reached a deal regarding US spy agencies and their access to EU citizens’ data. A bill was proposed that would prevent those same US agencies from circumventing the warrant process by buying certain kinds of data from brokers, cellphone providers, and others.

Self-driving cars provide significant amounts of surveillance footage to police agencies, just one more reason that some regulators are taking a hard look at the industry.

All this data is increasingly used in Real-Time Crime Centers, which attempt to make police more responsive or more predictive, but can also exacerbate existing police biases and overreach.

Teb’s Tidbits

China is responding in kind to semiconductor export restrictions.
Google and Meta will stop showing Canadian news in protest of a law that requires them to pay Canadian news outlets.
The UK is trying to attract American crypto refugees.
Syracuse might be a big winner in the US’s attempt to bring microchip production back onshore.

Answers To The News Quiz

Rate the following statements as true, false, or it’s complicated.

NWP methods are essentially large physics simulations.
- True.
ML methods require less computational power overall than NWP methods.
- It’s complicated. ML models are much faster at making predictions once they are trained. Unfortunately, the authors of the respective papers have not published details about the training costs for these ML models. If the models don’t need to be retrained frequently, then this statement would likely be true. The more often they need to be retrained, the more likely the statement is to be false.
ML methods are especially good at predicting weather patterns that did not appear in the training data.
- False. This is the area where ML models typically perform worst (more on that in the next Lab Report), but it is also what makes the recent advances so exciting. Here are two quotes:
- “AI-powered forecasting models are trained on historical weather data that goes back decades, which means they are great at predicting events that are similar to the weather of the past. That’s a problem in an era of increasingly unpredictable conditions.” (From the previous summary).
- “Pangu-Weather was also able to accurately track the path of a tropical cyclone, despite not having been trained with data on tropical cyclones. This finding shows that machine-learning models are able to pick up on the physical processes of weather and generalize them to situations they haven’t seen before.” (From this deeper dive).
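The “it’s complicated” answer about computational power is really about amortization: training is a one-time (or occasional) cost spread over every forecast. A minimal sketch with made-up cost units, assuming for illustration that the 10,000x speedup mentioned earlier translates directly into 10,000x less compute per forecast for ML versus numerical weather prediction (NWP). The training cost is invented, since real figures weren’t published:

```python
def total_compute(training_cost, cost_per_forecast, n_forecasts, n_trainings=1):
    """Total compute spent: every (re)training run plus every forecast made."""
    return n_trainings * training_cost + n_forecasts * cost_per_forecast

# Made-up cost units, purely for illustration.
NWP_PER_FORECAST = 10_000  # simulation-based forecasts: no training, pricey to run
ML_PER_FORECAST = 1        # assumed 10,000x cheaper per forecast once trained
ML_TRAINING = 5_000_000    # hypothetical training cost

forecasts = 1_000
nwp = total_compute(0, NWP_PER_FORECAST, forecasts)
for trainings in (1, 4):
    ml = total_compute(ML_TRAINING, ML_PER_FORECAST, forecasts, trainings)
    print(f"{trainings} training run(s): ML {ml:,} vs NWP {nwp:,} -> ML cheaper: {ml < nwp}")
```

With these invented numbers, one training run makes ML cheaper overall, while four retrains flip the answer, which is exactly why the true answer depends on how often retraining is needed.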

Remember…

The Lab Report is free and doesn’t even advertise. Our curriculum is open source and published under a public domain license for anyone to use for any purpose. We’re also a very small team with no investors.

Help us keep providing these free services by scheduling one of our world class trainings or requesting a custom class for your team.
