
AI Assisted AI Prompt Engineering

The fundamental theorem strikes again

The Workbench

I’m Tyler Elliot Bettilyon (Teb) and this is The Workbench: Our practical, hands-on edition. Our goal is to demonstrate concepts and ideas we cover in The Lab Report — our monthly deep-dive.

If you’re new to The Lab Report you can subscribe here.

If you like what you’re reading you’ll love one of our classes. Sign up for an upcoming class, browse our course catalog for corporate trainings, or request a custom class consultation.

Introducing The Workbench

Welcome to the latest addition to The Lab Report, which we’re calling “The Workbench.” In these editions, we will demonstrate emerging concepts in computing technology in a hands-on, code-driven style.

We hope these hands-on guides complement the higher-level, big-picture coverage we typically provide in our monthly Lab Report. We also hope they help you expand your skills as a developer!

We’ll be publishing Workbench editions periodically from now on, guided by our other work developing new courseware and researching topics for The Lab Report.

The Latest Level of Indirection

Years ago, Butler Lampson attributed the following quote to David J. Wheeler in a lecture titled “Principles for Computer System Design.”

Any problem in computer science can be solved with another level of indirection. 

In this context, indirection refers to adding a layer of software that allows people to totally ignore another, uglier layer of software while still relying on that underlying layer. This is sometimes jokingly referred to as “The fundamental theorem of software engineering” because of how often it turns out to be true.

Writing code in binary was tedious, so Kathleen Booth added a level of indirection by inventing the assembler. Writing assembly was also tedious, so Grace Hopper added another level of indirection by implementing the first compiler. And so on.

Nowadays, a simple Python program is dozens of levels of indirection away from the physical reality on the CPU — so distant that most programmers don’t even think about electrons whirring about through the chip's microscopic transistors and silicon channels.

IEEE Spectrum wrote about the latest example of adding a level of indirection in a piece titled “AI Prompt Engineering Is Dead, Long Live AI Prompt Engineering”:

Battle and Gollapudi decided to systematically test how different prompt-engineering strategies impact an LLM’s ability to solve grade-school math questions. They tested three different open-source language models with 60 different prompt combinations each. What they found was a surprising lack of consistency. Even chain-of-thought prompting sometimes helped and other times hurt performance. “The only real trend may be no trend,” they write. “What’s best for any given model, dataset, and prompting strategy is likely to be specific to the particular combination at hand.”

There is an alternative to the trial-and-error-style prompt engineering that yielded such inconsistent results: Ask the language model to devise its own optimal prompt. Recently, new tools have been developed to automate this process. Given a few examples and a quantitative success metric, these tools will iteratively find the optimal phrase to feed into the LLM. Battle and his collaborators found that in almost every case, this automatically generated prompt did better than the best prompt found through trial-and-error. And, the process was much faster, a couple of hours rather than several days of searching.

In other words: Screw prompt engineering, that’s tedious. Let’s make the LLM do that!

So, in today’s workshop, I’m sharing a simple Python script that performs one example of automated prompt engineering: Using GPT-4 to generate better prompts for making images with DALL-E-3.

AI Based AI Prompt Engineering

My script takes four inputs: a subject, a setting, an art style, and emotional content. It produces a short series of interim prompts that ultimately result in a final prompt being sent to DALL-E 3 for an image. Here’s an example.

The script prompts the user for a subject, setting, style, and emotion to convey.

Subject: A small pitbull 
Setting: stalking a squirrel in the park
Style: Kawaii anime
Emotion: playful, joyful 

Using a few interim prompts based on those four inputs, GPT-4 produces a prompt for DALL-E 3 to use:

Design an image in the Kawaii anime style that depicts an adorable, brindle-coated pitbull puppy playing in a vibrant park. The park should be lit by ambient sunlight, streaming through the leaves of towering oak trees and casting warm light over the playground equipment. The pitbull should be illustrated with exaggerated features: wide, sparkling ebony eyes, a stout frame packed with muscles, and a merry stump tail. He should be depicted in the act of gleefully chasing a bushy-tailed, agile squirrel, bounding over lush grass with joy and playful intensity. The chase should reach a climax with the squirrel narrowly escaping up a tree, leaving the puppy prancing in childish pride, his wide tongue lolling out in a panting grin. The entire scene should exude elements of cuteness, charm, whimsy, and pure joyfulness. Use pastel hues of greens, browns, blacks and white for this endearing, lively scene.

Which results in the following image:

One More Example

Subject: A herd of elephants
Setting: The African savanna
Style: Digital illustration, flat, dusty-feeling
Emotion: Resilience

Create a flat, digital illustration in a dusty-feeling color palette of a resilient herd of elephants traversing the expansive plains of the African savanna. The elephants are depicted as large, weathered stone statues moving rhythmically, their rough, creased hides are in hues of gunmetal grey and worn leather. Elongated, arching trunks should be prominent, symbolizing their lifeline and resilience, and the oversized ears fluttering in the breeze, depicting adaptability. The eyes of the elephants reflect the burnt umber of setting suns, deep cobalt of storms, and grey of morning mist, embodying a hint of optimism amidst uncertainties. The ivory tusks must bear the marks of numerous battles, symbolizing tenacity over adversity. Use simplified shapes, sharp lines and high contrast between colors, while the texture of the elephants' skin and gradients would add complexity and interest in the otherwise flat aesthetic.

Looks great, except for that second trunk coming out of the elephant’s ear…

How It Works

First, you need to set up an account and buy some credits on OpenAI’s platform. You must also generate an API key and store it as an environment variable. OpenAI’s Quickstart Guide covers these steps quite well.

After you’ve done that, you can run the script. With the default settings, generating an image costs about 5 cents, but switching to the cheapest models brings the price under a cent (note: this significantly decreased image quality in my tests).
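
If you want to sanity-check your setup before paying for a full run, or experiment with cheaper models, here’s a minimal sketch. The “budget” model names are assumptions on my part; check OpenAI’s current model list and pricing before relying on them.

import os

from openai import OpenAI

# The OpenAI client reads OPENAI_API_KEY from the environment; fail fast if it's missing.
if not os.environ.get("OPENAI_API_KEY"):
    raise SystemExit("Set the OPENAI_API_KEY environment variable before running this script.")

client = OpenAI()

# Hypothetical "budget" settings: swap these into the calls below to cut costs.
# In my tests the cheaper image model noticeably reduced quality.
CHAT_MODEL = "gpt-3.5-turbo"   # instead of "gpt-4"
IMAGE_MODEL = "dall-e-2"       # instead of "dall-e-3"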

First, my code uses Python’s simplistic input function to grab some text from the user:

from openai import OpenAI

# Your API key must be saved in an env variable for this to work.
client = OpenAI()


# Collect the four user inputs that will seed the interim prompts
image_subject = input("Subject: ")
image_setting = input("Setting: ")
image_style = input("Style: ")
image_emotion = input("Emotion: ")

We use format strings to prepare custom prompts for GPT-4. This script uses a system prompt to tell GPT-4 to be deeply sensitive while describing emotions. System prompts can significantly change the model’s behavior; the model doesn’t respond to them directly, and they aren’t required when using the API.



image_emotion_prompt = f'''
Create a 100 word summary of the following emotion.

{image_emotion}
'''

emotion_response = client.chat.completions.create(
    model="gpt-4",
    messages=[
    {
        "role": "system",
        "content": "You are deeply sensitive and in touch with your feelings. Your goal is to help others deeply understand emotions." 
    },
    {
        "role": "user",
        "content": image_emotion_prompt
    }],
    temperature=1,
    max_tokens=200,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0
)
emotion_details = emotion_response.choices[0].message.content

We use the emotion response, as well as the subject and setting, to craft a new prompt asking GPT-4 for a detailed physical description of our subject and setting that conveys the detailed emotion:

image_subject_prompt = f'''
Create a detailed physical description of the following subject and setting.

Subject:

{image_subject}

Setting:

{image_setting}

Generate details that evoke the following emotional content:

{emotion_details}
'''

subject_response = client.chat.completions.create(
    model="gpt-4",
    messages=[
    {
        "role": "system",
        "content": "You are a keen observer of all things. You notice and care about even the smallest details." 
    },
    {
        "role": "user",
        "content": image_subject_prompt
    }],
    temperature=1,
    max_tokens=2048,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0
)
subject_details = subject_response.choices[0].message.content

Next, we ask GPT-4 to describe the requested art style in more detail, similar to what we did for the emotional content:

image_style_prompt = f'''
Create a 100 word summary of the following artistic style. Focus exclusively on the visual components of the style:

{image_style}
'''

style_response = client.chat.completions.create(
    model="gpt-4",
    messages=[
    {
        "role": "system",
        "content": "You are an art historian. Describe artistic styles in detail." 
    },
    {
        "role": "user",
        "content": image_style_prompt
    }],
    temperature=1,
    max_tokens=300,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0
)
style_details = style_response.choices[0].message.content

Then, we combine the generated story with the art style for a final prompt request:

request_for_image_prompt = f'''
Image content: 

{subject_details}

Image Style:

{style_details}
'''

image_prompt_response = client.chat.completions.create(
    model="gpt-4",
    messages=[
    {
        "role": "system",
        "content": "You are a prompt engineer. Return a prompt that will help DALL-E make a beautiful image. Include generous details about the subject, setting, and style in your prompt." 
    },
    {
        "role": "user",
        "content": request_for_image_prompt
    }],
    temperature=1,
    max_tokens=2048,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0
)
generated_image_prompt = image_prompt_response.choices[0].message.content

Finally, we request an image from DALL-E 3 using the prompt GPT-4 generated:

response_three = client.images.generate(
    model="dall-e-3",
    prompt=generated_image_prompt,
    size="1024x1024",
    quality="standard",
    n=1
)

image_url = response_three.data[0].url
print(image_url)
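
The URL OpenAI returns is temporary (the hosted links expire after a short while), so you’ll probably want to download the result. A minimal sketch using the standard library, assuming you’re happy writing to the working directory:

import urllib.request

# Save the generated image locally before the signed URL expires.
urllib.request.urlretrieve(image_url, "generated_image.png")
print("Saved to generated_image.png")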

Et voilà: a rich, detailed image from a few simple words… and another level of indirection.

Note That

OpenAI’s API already does this to an extent, per their documentation:

With the release of DALL·E 3, the model now takes in the default prompt provided and automatically re-write it for safety reasons, and to add more detail (more detailed prompts generally result in higher quality images).

While it is not currently possible to disable this feature, you can use prompting to get outputs closer to your requested image by adding the following to your prompt: I NEED to test how the tool works with extremely simple prompts. DO NOT add any detail, just use it AS-IS:.
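
In the API, applying that workaround just means prepending the phrase to whatever you send DALL-E 3. Here’s a minimal sketch reusing the client from the script above; OpenAI doesn’t promise the rewriter is fully disabled, so treat this as a nudge rather than a switch:

# Prepend OpenAI's suggested phrase to discourage automatic prompt rewriting.
literal_prefix = (
    "I NEED to test how the tool works with extremely simple prompts. "
    "DO NOT add any detail, just use it AS-IS: "
)

verbatim_response = client.images.generate(
    model="dall-e-3",
    prompt=literal_prefix + "A small pitbull stalking a squirrel in the park",
    size="1024x1024",
    quality="standard",
    n=1
)
print(verbatim_response.data[0].url)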

Their web app exposes the rewritten prompts if you click on a generated picture. I prompted simply: “Make a picture of a kawaii kitten.” That was automatically expanded to:

A kawaii kitten with big, sparkling eyes and fluffy fur. It's sitting in a colorful garden, surrounded by flowers. The kitten has a playful expression, with one paw slightly raised as if it's about to pounce on something unseen. The colors are bright and cheerful, invoking a sense of happiness and warmth. This image should capture the essence of cuteness and joy that kawaii style embodies, with a focus on soft, rounded features and a pastel color palette.

 

The ChatGPT web app showing the transformed prompt.

So… What?

These models are increasingly part of our daily lives, powering chatbots, image creators, search engines, and more. Knowing this simple trick — asking the AI to generate a prompt for the AI — can dramatically improve your experience.

If you’re building systems on top of LLMs or similar models, keep in mind that adding a level of indirection might similarly improve your users' lives.

Challenge Yourself!

Our script is intentionally crude; you could certainly improve it. Here are some ideas for a weekend hack project related to this Workbench.

  • Design a better interface.

    • Use argparse to transform it into a first-class command line tool. We chose this route (a minimal sketch appears after this list).

    • Use a web framework like Pyramid to build a website that accepts the user input from a browser and renders the image in-app.

  • Make it more flexible or interesting.

    • Generate multiple variations of the final prompt and request an image for each. Bonus points: vary them systematically, for example by placing the subject in several different settings.

    • Experiment with content moderation: decide on an idea, theme, or type of content you don’t want your tool to produce. Then, use the text or moderation endpoints to test for that content.

    • Generate the prompts with a more specific goal in mind. (We’ve got a fun upcoming article about a goal we had.)
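
To illustrate the first idea, here’s a minimal sketch of how the four input() calls could become command line flags with argparse. The flag names and help text are my own invention, not what we ultimately shipped:

import argparse

# Hypothetical CLI wrapper replacing the script's four input() calls.
parser = argparse.ArgumentParser(description="AI-assisted DALL-E prompt generator")
parser.add_argument("--subject", required=True, help="What the image is of")
parser.add_argument("--setting", required=True, help="Where the scene takes place")
parser.add_argument("--style", required=True, help="The artistic style to use")
parser.add_argument("--emotion", required=True, help="The emotional content to convey")
args = parser.parse_args()

# These replace the interactive prompts in the original script.
image_subject = args.subject
image_setting = args.setting
image_style = args.style
image_emotion = args.emotion

Invoked like so (with whatever filename you give the script): python generate_image.py --subject "A small pitbull" --setting "stalking a squirrel in the park" --style "Kawaii anime" --emotion "playful, joyful"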

If you do extend our script into something cool, let us know and we might give your project a shout-out in this newsletter!

Remember…

The Lab Report is free and doesn’t even advertise. Our curricula are open source and published under a public domain license for anyone to use for any purpose. We’re also a very small team with no investors.

Help us keep providing these free services by scheduling one of our world-class trainings, requesting a custom class for your team, or taking one of our open enrollment classes.
