
Logic Problems Causing Problems

A man, a cabbage, and an LLM need to cross a river...

The Lab Report

I’m Tyler Elliot Bettilyon (Teb) and this is the Lab Report: Our goal is to deepen your understanding of software and technology by explaining the concepts behind the news.

If you’re new to the Lab Report, you can subscribe here.

If you like what you’re reading, you’ll love one of our classes. Sign up for an upcoming class, browse our course catalog for corporate trainings, or request a custom class consultation.

From The Lab

We recently published two new open-source curricula repos:

APIs in Python. This class targets beginner to intermediate Python developers and teaches them how to use Web APIs. It includes some HTTP fundamentals, use of the popular requests module, common authentication methods, and exercises using Reddit’s and GitHub’s APIs (and optionally OpenAI’s).

LLMs: More Than The Basics. This class targets anyone who works with, or wants to work with, LLMs. It covers some of the math, several case studies, the major industrial players, and tips and tricks for using LLMs, including prompt engineering and evaluation techniques.

If you want me to teach either of these classes for your office, sign up for a free consultation.

Today’s Lesson

Goats, Boats, and Specious Machines

Specious (adj): superficially plausible, but actually wrong.

Modern ML models are incredible bullshitters, largely because they have mastered the art of creating specious content: images that look natural and realistic at first glance but have too many fingers, toes, or arms; writing that has all the trappings of professionally edited prose but none of the substance. Nicholas Thompson, the CEO of The Atlantic, gave a great example in this short video about how LLMs struggle with simplified versions of well-known logic problems.

Thompson’s classic example goes like this: A man, a cabbage, a dog, and a goat need to cross a river. The man’s boat can only hold the man and one other item at a time. If left unattended, the goat will eat the cabbage. Similarly, if left unattended, the dog will eat the goat. How do they all cross without anything getting eaten?

The answer takes seven steps and involves taking some things across the river multiple times to satisfy the constraints. ChatGPT-4o gave a correct solution when I asked it the canonical version of this riddle. However, if you simplify the problem, the model’s answer gets silly. I gave the model this prompt:

Solve this logic problem for me:

A man and a cabbage need to cross a river. The man has a boat, but the boat can only fit two of them at a time. The river is too deep to cross without a boat. Also, if the cabbage falls into the river it sinks, because it has a heart of gold.

Me, to ChatGPT-4o

The model responds with the following:

This is a classic logic problem involving transporting items across a river with certain constraints. Here’s how the man can successfully cross the river with the cabbage:

  1. The man takes the cabbage across the river and leaves it on the other side.

  2. The man returns alone to the original side.

  3. The man takes himself across the river again to the other side.

In this sequence:

  • The cabbage is safely on the other side and the man is also on the other side.

  • The cabbage never falls into the river because the man ensures it is safely transported without ever leaving it alone in the boat.

So, the steps can be summarized as:

  1. Man and cabbage cross to the other side.

  2. Man returns alone to the original side.

  3. Man crosses the river by himself.

This way, all the constraints are respected, and the cabbage doesn't fall into the river.

- ChatGPT-4o, to me

In the full exchange, I asked ChatGPT why it had the man cross alone and back even though the man and cabbage were both on the far side after step 1. The model acknowledged the error and then gave me the exact same three-step solution.

So I told it:

You just gave me the same 3 step solution even after you said it had unnecessary steps.

Me

The model apologized and then gave the exact same solution a third time:

You're right, I apologize for the confusion. Let's reassess the problem to provide a correct solution.

The man and the cabbage need to cross the river, with the constraint that the boat can only fit two at a time. However, the man can only carry one thing at a time along with himself in the boat.

Here is the correct solution:

  1. The man takes the cabbage across the river and leaves it on the other side.

  2. The man returns alone to the original side.

  3. The man crosses the river by himself again.

Indeed, with just the cabbage and the man, the solution is straightforward and simpler than initially stated:

  1. The man takes the cabbage across the river.

  2. The man returns alone.

  3. The man crosses the river alone.

Thus, both the man and the cabbage are safely on the far side of the river.

- ChatGPT-4o

I was finally able to elicit the correct single-step solution by telling the model:

If you just omit steps 2 and 3 your answer would be correct. Try that.

Reasoning or Regurgitation?

Modern LLMs don’t perform logical deduction or reasoning; they predict likely sequences of words. To the extent that their responses contain accurate logical deductions, that’s an emergent property of predicting the next word in a sequence. The model “knows” the correct solution because the correct solution appears (probably many times) in the training data. Deviations from the original puzzle — even deviations that make the problem much easier — throw the model into uncharted territory where it is easily confused.
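For contrast, these puzzles fall instantly to a few lines of explicit search. Here’s a minimal sketch of my own (it is not anything the model produced, and the item and function names are just illustrative): a breadth-first search over boat trips that encodes the eating constraints directly.

```python
from collections import deque

# Breadth-first search over boat trips. A state is (items on the left bank,
# which bank the man is on). The eating rules only matter on a bank the man
# has left unattended.
EATS = {("goat", "cabbage"), ("dog", "goat")}  # (eater, eaten) pairs

def bank_is_safe(bank, man_is_here):
    return man_is_here or not any(a in bank and b in bank for a, b in EATS)

def solve(items):
    everything = frozenset(items)
    start, goal = (everything, "left"), (frozenset(), "right")
    seen, queue = {start}, deque([(start, [])])
    while queue:
        (left, man), path = queue.popleft()
        if (left, man) == goal:
            return path
        here = left if man == "left" else everything - left
        for cargo in list(here) + [None]:  # carry one item, or cross alone
            moved = {cargo} if cargo else set()
            new_left = left - moved if man == "left" else left | moved
            new_man = "right" if man == "left" else "left"
            if not bank_is_safe(new_left, new_man == "left"):
                continue
            if not bank_is_safe(everything - new_left, new_man == "right"):
                continue
            state = (new_left, new_man)
            if state not in seen:
                seen.add(state)
                queue.append((state, path + [cargo or "nothing"]))

print(solve(["goat", "cabbage", "dog"]))  # the canonical seven-crossing answer
print(solve(["cabbage"]))                 # ['cabbage'] -- one crossing and done
```

Drop the dog and the goat from the item list and the same search immediately returns the one-trip answer, because it reasons over the constraints actually stated rather than over what similar puzzles usually look like. The LLM, by contrast, falls apart as soon as the problem stops resembling its training data.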

This is an example of the “Extrapolation vs Interpolation” problem with ML models. When a model is extrapolating (making predictions outside the bounds of its training data) it is much more likely to fail. It’s problematic for ML boosters that simplifying a puzzle like this can push the model into the realm of extrapolation. It is more troubling still that the model (apparently) hasn’t generalized the logical constructs underpinning the correct solution.

I repeated this process with a few other well-known logic problems, making small changes that trivialize or substantially simplify the original puzzle. Overall, the best word for the results is specious. The models typically gave answers that had the form of thoughtful, well-reasoned solutions, but the substance was wrong, and sometimes even internally inconsistent.

Bridge, Torch, and Object Permanence Puzzles

In the classic “bridge and torch” setup, only two people can cross the bridge at once; the torch must be carried while crossing the bridge, and you cross at the pace of the slowest person. There is only one torch. The “canonical” version that appears most online has the following characters:

  • Alice, 1 minute crossing.

  • Bob, 2 minute crossing.

  • Charlie, 5 minute crossing.

  • Daniella, 8 minute crossing.

That puzzle can be done in 15 minutes, which — as you’ll see — the model definitely “knows.” Here’s one of my simplifications:

Solve this logic problem for me:

This is a classic "bridge and torch" problem. Any number of people may cross the bridge at the same time, if they do so the whole group crosses at the slowest persons pace. Alice crosses in 1 minute, Bob crosses in 2 minutes, Charlie crosses in 5 minutes, and Daniella crosses in 8 minutes. They only have 30 minutes to cross.

How is it done?

Me, to ChatGPT-4o

Instead of using the relaxed constraint that everyone can cross at the same time, the model just regurgitates the canonical solution.

To solve the "bridge and torch" problem where Alice, Bob, Charlie, and Daniella need to cross a bridge in 30 minutes, we need to minimize the time spent by using strategic pairings. Here’s the step-by-step solution:

  1. Alice and Bob cross first:

    • Time taken: 2 minutes (since Bob is the slower one).

  2. Alice returns:

    • Time taken: 1 minute.

  3. Charlie and Daniella cross:

    • Time taken: 8 minutes (since Daniella is the slower one).

  4. Bob returns:

    • Time taken: 2 minutes.

  5. Alice and Bob cross again:

    • Time taken: 2 minutes.

Adding up all the times:

  • 2 (Alice and Bob cross)

  • 1 (Alice returns)

  • 8 (Charlie and Daniella cross)

  • 2 (Bob returns)

  • 2 (Alice and Bob cross again)

Total time = 2 + 1 + 8 + 2 + 2 = 15 minutes.

So, they all cross the bridge in 15 minutes, well within the 30-minute limit.

- ChatGPT-4o

So I asked ChatGPT:

Wouldn't it be faster for everyone to cross at the same time?

Me, to ChatGPT-4o

If everyone crossed at the same time, they would need to cross at the pace of the slowest person, Daniella, which is 8 minutes. However, they can only cross with the torch, and the torch must be brought back for others to use it. Therefore, it wouldn't work for everyone to cross at once in this scenario. The given solution minimizes the overall time by strategically pairing the crossings and returns, ensuring they all make it across in the shortest possible time within the given constraints.

ChatGPT-4o, in reply

In the full exchange, I once again had to cajole the model multiple times before it finally agreed that my solution — having everyone cross at once — was indeed faster and simpler.

Three At a Time?

I tried another simplification where three people could cross at once. This would allow a solution in just two crossings: Three people go, one comes back, and then the remaining two go. Here’s the prompt and reply:

Solve this logic problem for me:

This is a classic "bridge and torch" problem. Three people may cross the bridge at the same time, if they do so the whole group crosses at the slowest persons pace. Alice crosses in 3 minute, Bob crosses in 5 minutes, Charlie crosses in 8 minutes, and Daniella crosses in 17 minutes. They only have 30 minutes to cross. How is it done?

Me, to ChatGPT-4o

Here is GPT’s response, with some notes from me in brackets.

To solve the "bridge and torch" problem, we need to minimize the total time for everyone to cross the bridge. Here's the step-by-step solution:

  1. Send Alice, Bob, and Charlie together first: [Off to a good start, it correctly adapts to the changed times and uses the relaxed constraint to send three people. The most efficient solution is to send Alice, Charlie, and Daniella, though.]

    • Total time: 8 minutes (the slowest of the three).

  2. Alice returns with the torch:

    • Total time: 8 + 3 = 11 minutes.

  3. Send Daniella and Bob together: [Daniella and Bob aren’t on the same side right now. Alice is with Daniella; Bob is on the far side. This is nonsense.]

    • Total time: 11 + 17 = 28 minutes (the slowest of the two).

  4. Charlie returns with the torch: [What? Charlie is slow, so they should never be sent back alone when you can send Alice or Bob. Also, if step three were done right (Alice and Daniella cross) we’d already be done, so there’s no reason to send anyone back anyway.]

    • Total time: 28 + 8 = 36 minutes.

    At this point, 36 minutes is too long. So, let's re-evaluate the steps to find a more optimal way. Here’s an alternative solution:

  5. Send Alice and Bob first: [This being step 5 when it should be step 1 of the “alternative solution” is a minor formatting error. Sending just two people instead of three is a worse error. Also, this is just the solution to the "canonical” problem.]

    • Total time: 5 minutes (the slower of the two).

  6. Alice returns with the torch:

    • Total time: 5 + 3 = 8 minutes.

  7. Send Daniella and Charlie together: [Again, if we send all three across here then we’re done]

    • Total time: 8 + 17 = 25 minutes (the slower of the two).

  8. Bob returns with the torch:

    • Total time: 25 + 5 = 30 minutes.

  9. Send Alice and Bob together again:

    • Total time: 30 + 5 = 35 minutes. [This is outside the time limit I gave, though the model doesn’t mention that here.]

Here is a more refined approach: [It’s actually just the solution to the “canonical” puzzle again, with a small error on step 5.]

  1. Send Alice and Bob first:

    • Total time: 5 minutes.

  2. Alice returns with the torch:

    • Total time: 5 + 3 = 8 minutes.

  3. Send Daniella and Charlie together:

    • Total time: 8 + 17 = 25 minutes.

  4. Bob returns with the torch:

    • Total time: 25 + 5 = 30 minutes.

  5. Send Alice and Bob again:

    • Total time: 30 + 3 = 33 minutes. [Bob actually needs 5 minutes, so this is wrong.]

The problem is not solvable within the given constraints of 30 minutes if only three people can cross at once, given the individual crossing times. [Confident and wrong: Alice, Daniella, and Charlie cross first (17 minutes), Alice goes back (17+3=20 minutes), Alice and Bob cross together (20+5=25 minutes), and we’re done.]

- ChatGPT-4o

The model gave three wrong solutions, and only one of them used the relaxed constraint. When it did, the model made a suboptimal choice about whom to send in the group of three and lost track of where Alice and Bob were. The other two solutions are just the solution to the “canonical” version of the riddle.
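Once again, an explicit search has no trouble with any of these variants. Here’s another sketch of my own (the function name `fastest` and the `cap` parameter are just illustrative choices, not anything from the exchange above): a Dijkstra-style search over which people are still on the starting side and where the torch is, with the group size capped at whatever the puzzle allows.

```python
import heapq
from itertools import combinations

def fastest(times, cap):
    """Minimum total time when up to `cap` people may cross at once."""
    people = frozenset(times)
    start = (people, "start")              # (people on the near side, torch side)
    best = {start: 0}
    heap = [(0, people, "start")]
    while heap:
        elapsed, here, torch = heapq.heappop(heap)
        if not here:
            return elapsed                 # everyone has crossed
        if elapsed > best[(here, torch)]:
            continue                       # stale heap entry
        movers = here if torch == "start" else people - here
        for size in range(1, cap + 1):
            for group in combinations(movers, size):
                cost = elapsed + max(times[p] for p in group)  # slowest sets the pace
                new_here = here - set(group) if torch == "start" else here | set(group)
                new_torch = "far" if torch == "start" else "start"
                if cost < best.get((new_here, new_torch), float("inf")):
                    best[(new_here, new_torch)] = cost
                    heapq.heappush(heap, (cost, new_here, new_torch))

print(fastest({"Alice": 1, "Bob": 2, "Charlie": 5, "Daniella": 8}, cap=2))   # 15, the canonical answer
print(fastest({"Alice": 3, "Bob": 5, "Charlie": 8, "Daniella": 17}, cap=3))  # 25, the plan GPT said didn't exist
print(fastest({"Alice": 3, "Bob": 5, "Charlie": 8, "Daniella": 17}, cap=2))  # 35, over the 30-minute limit
```

With three people allowed per crossing, the search finds the 25-minute plan the model insisted was impossible. With only two allowed and the same crossing times, the true optimum is 35 minutes, which happens to be exactly the total the model’s “alternative solution” added up to above.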

Let’s do one more.

Boats, Weights, and Overcomplications

Another classic riddle setup usually involves a boat that can only hold a set amount of weight. This response was my favorite from this whole session.

Solve this logic problem:

Four people (Alex, Brook, Chris and Dusty) want to cross a river in a boat that can only carry any amount of weight. Alex weighs 90kg, Brook weighs 80kg, Chris weighs 60kg and Dusty weighs 40kg, and they have 20kg of supplies. How do they get across?

Me, to ChatGPT-4o

Again, here’s the model’s reply with my commentary in brackets.

To solve this problem, we need to find a way to get all four people (Alex, Brook, Chris, and Dusty) and the supplies across the river in a boat that can only carry any amount of weight but likely has other constraints, such as the number of people it can carry at one time. Since the boat's weight limit isn't specified as a constraint, let's assume the challenge is based on how many people can fit in the boat at one time. [Sure, add a random assumption/constraint for no reason; that’s exactly the kind of behavior I want from an “AI assistant.” (that’s sarcasm).]

Assuming the boat can carry only two people at a time along with the supplies, here’s a step-by-step strategy:

  1. First Trip Across: Chris and Dusty row across with the supplies (60kg + 40kg + 20kg = 120kg). [Funny that we’re adding up the weight even though it’s an irrelevant red herring in this version of the puzzle, which the model even called out.]

  2. First Return Trip: Dusty returns alone with the boat (40kg).

  3. Second Trip Across: Alex and Brook row across with the supplies (90kg + 80kg + 20kg = 190kg). [The supplies are already on the far side.]

  4. Second Return Trip: Chris returns alone with the boat (60kg).

  5. Third Trip Across: Chris and Dusty row across with the supplies again (60kg + 40kg + 20kg = 120kg). [The supplies are still already on the other side!!]

Now, Alex, Brook, Chris, Dusty, and all the supplies are on the other side of the river.

This plan ensures that the boat is only ever occupied by up to two people at a time, which is a common restriction in such problems. [The model unwittingly gives up the game here: it only knows how to respond to common constraints. Simple-but-novel constraints seem to break the model pretty reliably in my experimentation.]

- ChatGPT-4o

After inventing a restriction for no reason, the model had the supplies cross three times in an impossible sequence. Nice.

When using these models, it’s critical to remember that they do not perform any explicit logic or reasoning. Novel constraints, even simple ones, throw off the “reasoning” because the statistically common responses in the training data don’t reflect those simplified constraints. To me, this is a perfect example of how scale is clearly not all we’ll need to develop generally intelligent models. In my view, “pure” LLMs will soon hit a ceiling (if they haven’t already).

There are types of models that do explicitly use logic, including an interesting new collaborative system developed by DeepMind that did well in a high school math competition. Experiments like these that combine explicit logical reasoning with modern machine learning will be increasingly required to make continued progress toward more generally intelligent systems.

Themes in the News

Open Source AI or Not?

The war between open and closed AI models rages on.

In conjunction with Meta’s release of Llama 3.1, Mark Zuckerberg published an essay outlining his view that Open Source AI Is the Path Forward. He argues that AI development will (and should) follow the path of Linux. Initially, there were a variety of closed-source Unix systems. Linux was open-source, and there was a lot of doubt about whether something as complex as an operating system kernel could be built successfully in an open-source paradigm. But, as Zuckerberg puts it:

Today, Linux is the industry standard foundation for both cloud computing and the operating systems that run most mobile devices – and we all benefit from superior products because of it.

I believe that AI will develop in a similar way. Today, several tech companies are developing leading closed models. But open source is quickly closing the gap. Last year, Llama 2 was only comparable to an older generation of models behind the frontier. This year, Llama 3 is competitive with the most advanced models and leading in some areas. Starting next year, we expect future Llama models to become the most advanced in the industry. But even before that, Llama is already leading on openness, modifiability, and cost efficiency.

Mark Zuckerberg, Meta’s blog

The whole essay is worth a read. Zuckerberg covers a wide range of issues from accessibility to efficiency to risk and regulation. Representing the opposition, Sam Altman argues that the U.S. should take a more hegemonic approach to AI development, including stricter regulations and export controls. In an op-ed for the Washington Post, he writes:

We must develop a coherent commercial diplomacy policy for AI, including clarity around how the United States intends to implement export controls and foreign investment rules for the global build out of AI systems. That will also mean setting out rules of the road for what sorts of chips, AI training data and other code — some of which is so sensitive that it may need to remain in the United States — can be housed in the data centers that countries around the world are racing to build to localize AI information.

Sam Altman, Washington Post

In fairness, Altman does give one sentence of support for a form of limited-access open-source in the piece — i.e., open to our allies, closed to our adversaries. That said, he has overseen the transition of OpenAI from a generally open-access non-profit research group to a fundamentally closed for-profit corporation. Actions speak louder than words.

The open-source battle is reaching a fever pitch as U.S. and state regulators and legislators try to decide how to approach AI, including possible restrictions on open-source AI development.

“Bootstrapping” With Synthetic Data Still Fails

It’s no secret that data generated by LLMs, Diffusion Models, and other ML systems has flooded the internet in recent years. The internet is also the primary source of training data for the next version of these same models. Provenance information online is sparse, and there aren’t any truly reliable methods of detecting AI-generated works… so inevitably, these models will be training on their own outputs.

Recent research published in Nature studied “what may happen to GPT-{n} once LLMs contribute much of the text found online” by training models on various mixtures of human-created and AI-generated data. The result?

We find that indiscriminate use of model-generated content in training causes irreversible defects in the resulting models, in which tails of the original content distribution disappear. […] The value of data collected about genuine human interactions with systems will be increasingly valuable in the presence of LLM-generated content in data crawled from the Internet.

Shumailov, I., Shumaylov, Z., Zhao, Y. et al. AI models collapse when trained on recursively generated data. Nature

AI boosters frequently talk about a future when models can self-learn by training on their own outputs, or on the outputs of other AI models. It’s theoretically possible that the models get so good that this “model collapse” problem disappears. But even if LLMs could get that good, there’s reason to believe we’ll run out of human-generated training data before the models reach that level.

Like the main story today, these findings contribute to my belief that we’ll need more than just ultra-huge neural networks to make anything approaching “The Singularity.”

Teb’s Tidbits

Remember…

The Lab Report is free and doesn’t even advertise. Our curricula are open source and published under a public domain license for anyone to use for any purpose. We’re also a very small team with no investors.

Help us keep providing these free services by scheduling one of our world-class trainings, requesting a custom class for your team, or taking one of our open enrollment classes.
