Opening The Black Box

Actually, we know a lot about neural networks

The Lab Report

I’m Tyler Elliot Bettilyon (Teb) and this is the Lab Report: Our goal is to deepen your understanding of software and technology by explaining the concepts behind the news.

If you’re new to the Lab Report you can subscribe here.

If you like what you’re reading, you’ll love one of our classes. Sign up for an upcoming class, browse our course catalog for corporate trainings, or request a custom class consultation.

From The Lab

Note: We’re publishing a week early because next Sunday is Easter.

Welcome Newcomers: We added more than 1,000 new subscribers this month, a record for us. We hope you find The Lab Report valuable and stick around for many editions to come. Plus, check out our archive for great stories you missed!

And if you, dear reader, shared our newsletter with someone lately: Thank you very much.

Exciting news: We’ve released three classes for open enrollment. For the first time, you can take a class from Teb’s Lab without a corporate sponsor. We are currently offering three courses in April:

As a Lab Report subscriber, you can save 10% with the following discount code:

REPORT-READER 

How are we doing? If you have any feedback about The Lab Report, respond to this email! We love hearing from you and we read every single response.

Today’s Lesson

The code for today’s lesson includes code for making a couple of simple charts and a script that uses OpenAI’s API. Both can be found on GitHub.

Peeking Inside the “Black Boxes”

Lately, I’ve seen several articles with titles such as “Large language models can do jaw-dropping things. But nobody knows exactly why.”

This irks me because the intentional mystification of these tools positions AI experts and firms as a kind of Wizard of Oz, playing with powers beyond the comprehension of mere mortals. It confuses and stupefies the public. Then Sam Altman calls from behind the curtain: give me $7 trillion and I can save the world.

I’m glad people are working on AI safety. I’m happy people are thinking through worst-case scenarios. But the “AI” Musk referenced makes pictures. It’s not a hop, skip, and a jump away from murder.

Moreover, the mechanism for Gemini’s allegedly dangerous wokeness — which caused the generation of racially diverse Nazi images — is actually simple and benign. Google engineers asked Gemini to detect if a query for an image was about a human; if it was, they had Gemini rewrite the prompt to include randomized demographic information.

I wrote a short script to do the same thing using OpenAI’s API (code available here). It probably has some bugs, but it’s 80 lines of mostly boilerplate code and took me less than an hour to write and test. Here’s what my script generates for the prompts: “Buddhist monk,” “A sailor on a boat,” “the pope,” “several cats,” “a handsome dog,” and “a penguin.”

Images generated by DALL-E 3 using the demographic-expanding script I wrote.
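
If you want to see the mechanics, here’s a stripped-down sketch of the same filter-and-transform idea using OpenAI’s official Python package. It’s an illustration rather than a copy of my script (that’s in the GitHub repo linked above); the specific model names and helper functions are just reasonable choices, not necessarily what the real script uses:

```python
# Sketch: detect whether an image prompt describes a person and, if so,
# inject a randomized demographic before generating the image.
# Model names and helper names here are illustrative choices.
import random
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

DEMOGRAPHICS = ["South Asian", "Black", "East Asian", "Hispanic",
                "Middle Eastern", "Indigenous", "white"]

def mentions_a_person(prompt: str) -> bool:
    """Ask a chat model whether the image prompt depicts a human."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Does this image prompt describe a human? "
                       f"Answer only yes or no: {prompt!r}",
        }],
    )
    return "yes" in response.choices[0].message.content.lower()

def expand_prompt(prompt: str) -> str:
    """If the prompt is about a person, add a random demographic detail."""
    if mentions_a_person(prompt):
        return f"{prompt}, depicted as a {random.choice(DEMOGRAPHICS)} person"
    return prompt

def generate_image(prompt: str) -> str:
    """Generate one image with DALL-E 3 and return its URL."""
    result = client.images.generate(
        model="dall-e-3", prompt=expand_prompt(prompt), n=1)
    return result.data[0].url

if __name__ == "__main__":
    for p in ["Buddhist monk", "A sailor on a boat", "a penguin"]:
        print(p, "->", generate_image(p))
```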

Gemini didn’t decide to be woke, nor did it determine the mechanism of its wokeness. Google engineers didn’t even retrain the underlying model; they just strapped a simple filter-and-transform operation on top of the public interface. That’s a far cry from the paperclip problem Musk is pearl-clutching about.

The “black box” talk is razzle-dazzle that mainly serves entrenched interests. It fuels the wildest hyperbolic speculation about AI’s capabilities, consciousness, and future potential. It gives both the doomers and the over-the-top hype men unnecessary ammunition for their existential fantasies — utopian, dystopian, and otherwise.

It’s true that the fundamental theories backing neural networks lag behind our engineering capabilities. But we still know quite a lot about how and why these models work, even if we can’t always explain the exact reasoning behind each individual prediction. Researchers have peered into the black boxes and published many fascinating results.

In today’s lesson, we’ll explore a non-exhaustive list of things we do know about how and why neural networks — a class that includes LLMs like ChatGPT and image generators like Stable Diffusion — work so well.

The Fundamentals

Modern image generators and chatbots are further evidence of The Unreasonable Effectiveness of Mathematics and its incredible capacity to model and explain all kinds of phenomena. To understand why, we have to go back to basics.

Formally, neural networks are mathematical models designed to solve something called “optimization problems.” Like many, I first encountered this type of problem in a calculus class. It looked something like this:

You are a farmer who needs to fence in 200 square feet of land for chickens. Assuming the fence must be in the shape of a rectangle, what is the minimum amount of fence you can buy to build this fenced area?

Optimization problems always ask about minimizing or maximizing some value given some constraints. In this case, we’re “optimizing” for the amount of fencing by minimizing it.

In calculus, we’re taught to form an equation, take its derivative, set that derivative to 0, and then solve for our variable (in this case, fence length). That solution will tell us the “critical points” and one of those critical points will always be the minimum or maximum we’re looking for, provided a min or max exists. Here’s a step-by-step solution to the fence problem:

L and W are the length and width of our rectangle. Here are the equations for the area and the perimeter (which we want to minimize, denoted as ???):

200 = L * W
??? = 2L + 2W

We’re doing single variable calculus, so solve for W in the area equation and substitute it in the perimeter equation:

W = (200 / L)
??? = 2L + 2(200 / L)

Then simplify the equation, take the derivative, set it to zero, and solve for L:

??? = 2L + 400/L

???' = 2 - 400/L^2

0 = 2 - 400/L^2
400/L^2 = 2
400 = 2L^2
200 = L^2
L = ±sqrt(200) ~= ±14.14

Negative fence length doesn't make sense, so we take the positive root: L ~= 14.14. Plug that into our perimeter equation and we find out how much fencing we need:

2*14.14 + 2*(200 / 14.14) ~= 56.56
14.14 by (200 / 14.14) => 14.14 by 14.14

Turns out a square is the optimal shape, and we need about 56.56 feet of fencing.

Gut check: We can also plot the perimeter function and look for the local minima:

It’s hard to see in the zoomed-out view, but a critical point is at ~14.14 as seen on the zoomed-in view.
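
If you’d rather skip the calculus entirely, a quick brute-force check over a grid of candidate lengths lands in the same place. This is just an illustration, separate from the charting code in the lesson’s repo:

```python
# Brute-force gut check of the fence problem: P(L) = 2L + 400/L.
import numpy as np

lengths = np.linspace(1, 100, 100_000)       # candidate values for L
perimeters = 2 * lengths + 400 / lengths     # P(L) at each candidate

best = perimeters.argmin()
print(f"L ~= {lengths[best]:.2f}, perimeter ~= {perimeters[best]:.2f}")
# Prints roughly: L ~= 14.14, perimeter ~= 56.57
```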

Calculus is amazing. This strategy always works for problems we can define as differentiable functions. We don’t have time to teach you exactly why in today’s lesson, but Khan Academy’s differential calculus class is stellar if you don’t fully understand what we just did.

The key point for today is that calculus can always be used to find the critical points — and thus any maximums and minimums — of a differentiable function. This fact is the foundation of all neural network research and development.

Neural Networks are Applied Calculus

It turns out neural networks are — very literally — differentiable* math functions. The “architecture” of a neural network refers to the type and arrangement of its various mathematical sub-components. The “parameters” of a neural network are variables in that math function which — just like our value of L in the fence example — are “learned” by applying calculus techniques that are similar to the minimization we performed above.

*Don’t @ me with your quibbles about non-differentiable activation functions like ReLU. We take the derivative piecewise.

There are three twists:

First, taking the gradient* and solving for its zeros directly is computationally infeasible for huge formulas with many dimensions, which modern neural networks are and have. Instead, we use iterative methods like gradient descent to find just one approximate critical point, not all critical points precisely.

*The derivative is called the gradient when we have more than one variable.
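
Here’s what that looks like in miniature, using gradient descent on the fence problem’s perimeter function instead of solving for the zero of the derivative directly. A toy illustration, not how real training frameworks are written:

```python
# Gradient descent on P(L) = 2L + 400/L.
# P'(L) = 2 - 400/L^2; rather than setting it to zero and solving,
# we repeatedly step "downhill" in the direction that shrinks P.

def gradient(L: float) -> float:
    return 2 - 400 / L**2

L = 1.0                 # arbitrary starting guess
learning_rate = 0.1
for _ in range(1000):
    L -= learning_rate * gradient(L)

print(round(L, 2))      # converges to roughly 14.14
```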

Second, instead of an equation rooted in ground truth, researchers pick something called a loss function, which compares the output of our model to the labeled training data. This loss function returns 0 when the model’s output and the training data labels match perfectly. The loss grows in magnitude (whether the error is positive or negative) as the model’s outputs stray from the training data labels.

This second twist is a legitimate bit of alchemy. In the fence example, we had a set of equations rooted in geometric fact. We know the math functions that map a rectangle’s sides to its area. No one knows the math that maps English to German, or if such a function really even exists. Instead, models use a loss function with useful properties — including differentiability — to compare the model’s outputs to outputs we know are correct. We apply the calculus iteratively, on a sample-by-sample basis, to minimize the formula created by feeding the neural network’s final output to the loss function. We “minimize the loss.”
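
Mean squared error is the classic example of such a loss function: it’s differentiable, it’s zero when the model’s outputs match the labels exactly, and it grows as they diverge. A minimal sketch:

```python
# Mean squared error: zero on a perfect match, larger as predictions stray,
# and differentiable with respect to the predictions.
import numpy as np

def mse_loss(predictions: np.ndarray, labels: np.ndarray) -> float:
    return float(np.mean((predictions - labels) ** 2))

labels = np.array([1.0, 0.0, 2.0])
print(mse_loss(np.array([1.0, 0.0, 2.0]), labels))  # 0.0 (perfect match)
print(mse_loss(np.array([1.5, 0.5, 1.0]), labels))  # 0.5 (worse predictions)
```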

This means we’re minimizing some proxy for correctness, at best. In a capital-T Truth sense, it’s not at all clear that there is a math function that should theoretically do what ChatGPT or Stable Diffusion can do. And yet, their existence is proof that a math function that can do those things does exist. ChatGPT and Stable Diffusion are those math functions (with some non-trivial infrastructure and application engineering strapped on top).

Third and finally, we choose math functions for our neural networks that satisfy something called the “universal function approximation theorem.” This means that, given enough parameters, a neural network can approximate essentially any well-behaved math function just by changing the values of those parameters. This enormous flexibility is what allows neural networks to perform so well in so many different domains.

Taken all together, the fundamental premise of modern machine learning is: If some math function can reasonably map our input data to our output data, then we can train a neural network to discover that math function (or a very close approximation).
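
Here’s that premise in miniature: a one-hidden-layer network, trained with nothing but the chain rule and gradient descent, learning to approximate sin(x) from samples. This is a toy illustration in plain numpy, not how production frameworks implement training:

```python
# A tiny network "discovering" an approximation of sin(x) from data.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-np.pi, np.pi, size=(256, 1))
y = np.sin(x)                                # the "unknown" target function

# Parameters: 1 input -> 32 hidden units (tanh) -> 1 output.
W1 = rng.normal(0, 0.5, (1, 32)); b1 = np.zeros(32)
W2 = rng.normal(0, 0.5, (32, 1)); b2 = np.zeros(1)

learning_rate = 0.05
for step in range(5000):
    # Forward pass: compute the network's predictions and the loss.
    hidden = np.tanh(x @ W1 + b1)            # shape (256, 32)
    pred = hidden @ W2 + b2                  # shape (256, 1)
    loss = np.mean((pred - y) ** 2)

    # Backward pass: the chain rule, i.e. calculus again.
    d_pred = 2 * (pred - y) / len(x)
    d_W2 = hidden.T @ d_pred
    d_b2 = d_pred.sum(axis=0)
    d_hidden = d_pred @ W2.T * (1 - hidden ** 2)   # tanh'(z) = 1 - tanh(z)^2
    d_W1 = x.T @ d_hidden
    d_b1 = d_hidden.sum(axis=0)

    # Gradient descent: nudge every parameter downhill.
    W1 -= learning_rate * d_W1; b1 -= learning_rate * d_b1
    W2 -= learning_rate * d_W2; b2 -= learning_rate * d_b2

print(f"final loss: {loss:.4f}")  # far smaller than where training started
```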

After that, it’s more or less a game of getting mountains of data and representing it in a numeric format that retains most of its informational value. Embeddings are popular for language data. Pixel color and intensity values are popular for image data. New tactics will continue to be invented.

By building these neural networks, we're discovering that absurdly complex math functions are ridiculously flexible and can model all sorts of natural and fabricated phenomena, including language translation, question-answering, image generation, and much more.

Perhaps the biggest lesson of the last decade of neural network research is that mathematics is wildly more capable than many dared to believe.

But it still isn’t magic.

What We Don’t Know… And What We’ve Learned

What we don’t know, broadly speaking, is how to introspect on a model’s decisions once it’s been trained. We know the function our model has learned successfully maps inputs to outputs, and we understand how and why the training process works to produce such a function, but we usually don’t know precisely how and why the learned function itself works.

Part of this is due to scale: The learned functions are unbelievably large and convoluted. GPT-4 reportedly has roughly 1.7 trillion parameters. It’s not realistic to manually examine the parameters and come to a conclusion about each one’s impact on the output. It’s even less realistic to fully comprehend how the parameters interact to form complex patterns and how those patterns interact with the data being fed to the model.

But that doesn’t mean researchers aren’t trying.

Here are just a few of the incredible things we’ve learned about how and why neural networks work.

Convolutional Filters Detect Features

Convolutional layers are a key component of neural networks that work with visual data such as images and video.

The learned component of a convolutional layer is called a “kernel.” Researchers have learned to visualize the outputs of these kernels and have demonstrated that individual kernels learn to perform different kinds of feature extraction, such as detecting edges, shapes, and even higher-level features like the locations of eyeballs or fur.

Tools like CNN-Explainer can perform these visualizations and help practitioners understand what their neural networks are “seeing.”
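
You don’t need a trained network to see the basic mechanism. Here’s a small sketch applying a classic hand-written edge-detection kernel (a Sobel filter); convolutional layers learn kernels like this one, and far more abstract ones, from data:

```python
# Convolve a tiny image with a hand-written vertical-edge kernel.
import numpy as np
from scipy.signal import convolve2d

# A fake 6x6 "image": dark on the left half, bright on the right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# A Sobel kernel that responds to vertical edges.
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]])

edges = convolve2d(image, kernel, mode="valid")
print(edges)  # nonzero values mark the dark/bright boundary
```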

Recurrent Neurons Have Semantic Meaning

RNNs (recurrent neural networks) have largely been replaced by more computationally efficient Transformer architectures. However, before their abandonment, researchers demonstrated that individual neuron activations often mapped cleanly to high-level features of text, such as position relative to the start/end of a line, being inside quotations, a line of code being inside an if statement, and more!
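
The bookkeeping behind those findings is simple: run text through the network one step at a time and record what a single hidden unit does at each character. Here’s a rough sketch with an untrained PyTorch LSTM standing in for a real, trained language model:

```python
# Record one hidden unit's activation as an LSTM reads text character by
# character. Untrained toy model; the cited research inspected trained ones.
import torch

text = 'He said "hello there" and left.'
vocab = sorted(set(text))
char_to_id = {ch: i for i, ch in enumerate(vocab)}

embed = torch.nn.Embedding(len(vocab), 16)
lstm = torch.nn.LSTM(input_size=16, hidden_size=32, batch_first=True)

ids = torch.tensor([[char_to_id[ch] for ch in text]])   # shape (1, seq_len)
with torch.no_grad():
    activations, _ = lstm(embed(ids))                   # (1, seq_len, 32)

neuron = 7  # an arbitrary hidden unit to watch
for ch, value in zip(text, activations[0, :, neuron]):
    print(f"{ch!r}: {value.item():+.3f}")
```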

Attention Layers Find Grammatical Patterns

Attention layers have mostly replaced recurrent layers. Attention layers learn to associate words in a piece of text with each other. Introspecting on these layers often uncovers intuitive patterns — like nouns being mapped to their pronouns or adjectives being mapped to the noun they’re describing.

There’s still some disagreement about just how interpretable these weights are, perhaps best exemplified by these two dueling papers.
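
Inspecting attention yourself is surprisingly easy with Hugging Face’s transformers library, which can return the attention weights directly. A small sketch; the model here is just a convenient small one, not the models studied in those papers:

```python
# Peek at one attention head in a small pretrained transformer.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "distilbert-base-uncased"  # a convenient small model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)

inputs = tokenizer("The cat chased the ball because it was bouncing",
                   return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
attention = outputs.attentions[0][0, 0]   # first layer, first head: (seq, seq)

# For each token, print the token it attends to most strongly.
for i, token in enumerate(tokens):
    print(f"{token:>10} -> {tokens[int(attention[i].argmax())]}")
```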

The Latent Spaces of Image Generators are Partially Interpretable

Generative Adversarial Networks (GANs) — one of the popular technologies for image generation — use a large vector of randomized numbers as part of the input to the image generator process. Researchers have discovered that many cells in the vector can be cleanly mapped to high-level concepts about the image.

For example, in a GAN trained on human faces, individual cells in the latent space have been mapped to features of the generated face including their hair color, eye color, and even whether or not they’re wearing glasses. Labels and embeddings have also been used to intentionally “condition” the latent space, letting researchers intentionally give meaning to those values.

Other researchers used additional ML techniques to automatically identify which cells in the latent space have meaningful semantic values without intentional conditioning.
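
The underlying experiment is easy to sketch: fix a random latent vector, sweep one cell through a range of values, and look at what changes in the generated images. In the sketch below an untrained toy generator stands in for a real face GAN, so it shows the mechanics rather than producing faces:

```python
# Sweep a single latent cell and regenerate the image each time.
import torch

# Untrained stand-in for a trained GAN generator (latent vector -> image).
generator = torch.nn.Sequential(
    torch.nn.Linear(512, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, 3 * 64 * 64), torch.nn.Tanh(),
)

z = torch.randn(1, 512)   # one random point in the latent space
cell = 42                 # an arbitrary latent cell to sweep

with torch.no_grad():
    for value in torch.linspace(-3, 3, steps=7):
        z_edit = z.clone()
        z_edit[0, cell] = value
        image = generator(z_edit).reshape(3, 64, 64)
        print(f"z[{cell}] = {value.item():+.1f} -> image tensor {tuple(image.shape)}")

# With a real, trained generator, the seven images would vary along one
# attribute (hair color, glasses, ...) while the rest of the face stays put.
```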

Similar research into the latent space of diffusion models (currently the most popular technique for image generation) is also being done.

The Bottom Line

We don’t know everything about how and why state-of-the-art neural networks behave the way they do, but we know a lot more than nothing. Next time someone tells you, “We don’t even know how [new hot model] works!” I challenge you to do two things:

1) Gently push back on that narrative. We know plenty about these models, especially the fundamental calculus and statistics on which they’re based.

2) Encourage others to be curious rather than fearful or awestruck. We don’t know how these models work yet — but our ignorance won’t last forever.

Themes in the News

The Latest TikTok Ban Attempt

A bill that would require ByteDance to sell TikTok to an American company or see TikTok banned has passed unanimously (50-0) in its House committee, then passed 352-65 on the House floor. The Senate is currently evaluating the bill.

If it passes, ByteDance will have six months to sell TikTok. If they do not, the mechanism for the ban will be civil penalties enforced against app stores that continue to host or update TikTok.

The stated reasoning for the bill is that TikTok is both spyware and a propaganda machine for China, a “foreign adversary.” Lawmakers have not been forthcoming with the evidence that led to overwhelming and bipartisan votes in the House, but TikTok has admitted to surveilling Americans (and specifically journalists) using the app in the past.

The bill has been sent to committee in the Senate, which Washington insiders claim is often a way for Senate leaders to pump the brakes on a piece of legislation.

Personally, I think TikTok is absolutely spyware and undoubtedly a vector for propaganda… But I think the same is true of many American social media apps. Chinese ownership and being subject to the CCP’s authority are relevant differences between TikTok and Facebook. Still, I’d rather see comprehensive privacy legislation addressing the widespread corporate surveillance we face every day.

AI’s Copyright Issues Continue to Evolve

We wrote a comprehensive edition on ML and copyright last month, and there have been some relevant developments!

NVIDIA has now been hit with a class action copyright lawsuit, once again by authors whose books appear in a popular training dataset called “The Pile.” This lawsuit is similar to others previously filed against Microsoft, OpenAI, and others.

One such lawsuit, another class action led by comedian Sarah Silverman, had some of its claims dismissed. This dismissal is “without prejudice,” which means the plaintiff’s attorneys may address issues raised by the court and re-file them. For example, the judge’s ruling explains that the authors failed to cite any outputs “substantially similar — or similar at all — to their books.”

We know models do sometimes spit out identical and near-identical copies of training data; other lawsuits, such as The New York Times v. OpenAI and Getty v. Stability AI, include examples of exactly that in their legal filings.

I spoke with a copyright attorney who told me these types of mistakes are common in class action lawsuits. The class action attorneys know a lot about class action laws, but less about intellectual property laws. Most likely, the class action lawyers will now hire some copyright lawyers to help them fix the issues raised by the judge.

I particularly enjoyed these two opinion pieces on the matter.

“A blanket ruling about AI training is unlikely. Instead of saying “AI training is fair use,” judges might decide that it’s fair to train certain AI products but not others, depending on what features a product has or how often it quotes from its training data. We could also end up with different rules for commercial and noncommercial AI systems. Grimmelmann told me that judges might even consider tangential factors, such as whether a defendant has been developing its AI products responsibly or recklessly. In any case, judges face difficult decisions. As Bibas admitted, “Deciding whether the public’s interest is better served by protecting a creator or a copier is perilous, and an uncomfortable position for a court.””

Source: The Atlantic

And, in Ars Technica, Timothy B. Lee and James Grimmelmann (who was quoted in The Atlantic piece) look at three historic IP lawsuits: One that destroyed MP3.com, one that resulted in significant fines for Texaco, and one that didn’t hurt Google much at all. Through the lens of those cases, they conclude that “The AI community needs to take copyright lawsuits seriously.”

Meanwhile, a Chinese court has already fined a Chinese ML company for copyright infringement, though the fine was quite small — roughly $1,400.

Teb’s Tidbits

Remember…

The Lab Report is free and doesn’t even advertise. Our curricula are open source and published under a public domain license for anyone to use for any purpose. We’re also a very small team with no investors.

Help us keep providing these free services by scheduling one of our world class trainings, requesting a custom class for your team, or taking one of our open enrollment classes.
