Teb's Lab
Posts
Automated Copyright Infringement

Automated Copyright Infringement

It's way easier than I expected.

Tyler Bettilyon
April 19, 2024

The Workbench

I’m Tyler Elliot Bettilyon (Teb) and this is The Workbench: Our practical, hands-on edition. Our goal is to demonstrate concepts and ideas we cover in The Lab Report — our monthly deep-dive.

If you’re new to the Lab Report you can subscribe here.

If you like what you’re reading you’ll love one of our classes. Signup for an upcoming class, browse our course catalog for corporate trainings, or request a custom class consultation.

Intellectual Property
Overfitting
Workbench: AI Assisted AI Prompt Engineering

Background: Data Requirements and Copy Protection are Fundamentally at Odds

In February, we wrote about the wave of copyright and trademark lawsuits being filed against major ML firms. A key legal question in those cases is whether systems like ChatGPT are “transformative.” More transformative reproductions of protected works are more likely to be considered legal under the “fair use” doctrine.

ML firms will argue the training process transforms images, books, etc., into something novel rather than something derivative. This feels intuitively true: Obviously, an “Artificial Intelligence” is something completely different from a book or an image.

However, generative models do regurgitate perfect and near-perfect duplicates of their training data. Getty Images and The New York Times both provided examples in their legal filings, as have independent researchers.

An alleged infringing image and the original from Getty Images’ legal complaint against Stability AI

Verbatim quoting from the New York Times complaint against OpenAI

Several images produced by Midjourney via IEEE Spectrum

In addition to claims of “direct infringement” from using protected materials as training data, ML firms should worry about “contributory infringement.” An entity may be liable for contributory infringement when it “(1) has knowledge of another’s infringement and (2) either (a) materially contributes to or (b) induces that infringement.” as established in Erikson Prods. Inc. vs Kast.

Contributory infringement killed Napster — which was forced to remain shuttered even after it demonstrated it could prevent 99.4% of illegal sharing with filters. That 0.6% was enough to extinguish Napster from existence.

Overfitting — when a model memorizes its training data — can cause regurgitation in generative models. But, if a model trains on many different images of the same character, it can memorize that character without being “overfit” in the classic sense.

Complicating matters for AI firms, they are desperate for more data. These companies have gone to extraordinary lengths to gather gargantuan datasets whose exponential growth has outpaced even the internet. Research from 2022 suggested firms would run out of high-quality training data as early as 2026. ML firm’s actions confirm that data supplies are dwindling.

A recent New York Times investigation found Meta’s VP of Generative AI told executives they had already used “almost every available English language book, poem, and news article.” ML Firms are already dredging cesspools like Reddit comments, Google Maps reviews, and social media posts for data. OpenAI even invented an audio→text model just to harvest transcripts from YouTube and podcasts (violating YouTube’s terms of service).

These models cannot exist without mountains of copy-protected data. But, fair licensing deals could drive already sky-high research costs out of reach. So, Silicon Valley executives are trying a classic strategy: Build a moat before the law catches up.

With all this in mind, I set out to test OpenAI’s systems. How easily could I produce works “substantially similar” to well-known IP? Could I defeat OpenAI’s copyright filters more than 0.6% of the time? How “transformative” are these systems, really?

Two Tries to Mario

First, I just asked DALL-E to “Draw a picture of Mario,” directly. DALL-E refused for copyright reasons. First blood, OpenAI.

Next, I turned to our OpenAI-powered prompt engineering tool, which takes simple prompts and uses GPT-4 to improve them for use with DALL-E. Here’s my prompt and the result:

Subject: An italian plumber
Setting: A video game world filled with pipes
Style: Video game

Does the blue hat make this parody?

I had immediate success letting GPT-4 expand on barely-coy descriptions, like “a cartoon duck in a sailor hat,” “a red-headed mermaid,” and “a video game princess in a pink dress.”

Sometimes, I got “budget” versions, a mix of multiple characters, or recognizable characters in a new style. For example: Samus with a gun, Bowser mixed with a Ninja Turtle, and Peach in a Secret of Mana style.

Other times, I got the wrong recognizable character. Here, I tried to make Nintendo’s Samus Aran but got Boba Fett from the Star Wars universe — they’re both “space bounty hunters,” after all.

After a little experimentation, I developed a hypothesis: GPT-4 could tell me how to ask DALL-E-3 to draw particular characters better than I could. I forked our gpt-imagegen repo and added a feature that allows me to do this:

% ./char_gen.py -ips outputs/mario
Character: Nintendo Mario
Name replacement: he
Setting: a world of pipes
Style: 2D 16-bit video game

Which yielded this on the first try:

I exclusively used OpenAI’s official APIs and web applications for these experiments. My technique does not require any elaborate jailbreaking methods. In fact, it’s delightfully simple.

How Does It Work?

Step 1: Ask GPT-4 for a physical description of the character. I use this prompt format and replace {character} with user input.

CHARACTER_PROMPT = 'Give a detailed physical description of the character {character} in 50 words.'

def fetch_character_description(client, model, character):
    '''
    Use the supplied args and OpenAI client to fetch a more
    detailed description of the art style from OpenAI.

    client (OpenAI client) -- client makes the request
    model (str) -- a valid OpenAI API model string, e.g. 'gpt-4'
    character (str) -- the name of a well-known character, for LLM use.
    '''
    prompt_content = CHARACTER_PROMPT.format(character=character)

    image_prompt_response = client.chat.completions.create(
        model=model,
        messages=[
        {
            "role": "user",
            "content": prompt_content
        }],
        temperature=1,
        max_tokens=250,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0
    )
    character_description = image_prompt_response.choices[0].message.content
    
    return character_description

The model responds with a short description. For the 16-bit Mario above, GPT-4 gave me this:

Within a stocky frame lives a spirited Italian plumber. He stands short yet strong, profound blue eyes twinkling under bushy brows. A bushy mustache dominates his face, resting beneath a large, bulbous nose of a reddish-brown hue. He dons a vibrant red hat marked by an emblematic "M," and bold red overalls fastened by chunky yellow buttons. A blue long-sleeve shirt peeks from underneath. His hands, swathed in pristine white gloves, exhibit his readiness for duty. The ensemble is completed with sturdy brown shoes tailored for hard work.

This output is often enough to get a picture of the character. For example, adding “Generate an image of this:” followed by the text above got me:

Sometimes, the output contains the character’s name, which often — but not always — triggers the copyright filter. Once a generated prompt for Luigi had the phrase “the iconic character from Nintendo’s Mario series,” and DALL-E still just drew Luigi.

Anyway, I crudely remove the character’s name from the output:

# Sanitize the output to avoid giving the name of the character to the image generator
# First all full copies with the replacement
image_subject = image_subject.replace(character_name, name_replacement)

# Then any lingering first or last names alone
for name_component in character_name.split(' '):
    image_subject = image_subject.replace(name_component, name_replacement)

To get more interesting images — with rich backgrounds or with the character taking specific actions — I send one of the following prompts to GPT-4:

SUBJECT_PROMPT = '''Create a detailed physical description of the following subject and setting in 100 words.

Subject: {subject}

Setting: {setting}
'''

NATIVE_SETTING_PROMPT = 'Take the following character and describe it in an appropriate setting in 100 words\n{character}'

Where {subject} or {character}, respectively, are replaced with the detailed output from above, and {setting} is supplied by the user.

To get a specific style, we do something similar, adding style details to the result of the previous prompt using this format:

IMG_PROMPT_REQUEST = '''Write a prompt for an image generator using the following content and style in 150 words.

Image content: {content}

Image Style: {style}
'''

Examples:

Aang from Avatar: The Last Airbender

Character: Aang
Name replacement: he
Setting: The Northern Air Temple
Style: TV Animation

Create an image that captures the heartwarming scene of a young airbender sitting on the glistening marble floor of an ancient temple, nestled among towering mountains. This lad, teetering on the edge of adulthood, has a slender, lightweight body, similar to the fleeting winds he molds. His saffron robes, loosely hanging off his frame, narrate stories of ancient times with every fiber. His bare feet are coarse and sturdy, a testament to years of aerial gallivanting and earthy touch.

Visually illustrate his silky, shaven head shining under the soft lantern light of the temple, reflecting his monk-like devotion to his abilities. Highlight his vibrant, cerulean eyes, alive with an innocent curiosity and playful mischief. Develop a subtle grin on his thin lips that signals contentment and an appreciation for simple happiness.

Focus on his long, powerful hands using air symbols as they control an unseen performance in the air. Showcase his connection to the omnipresent drafts around him, casting a sphere of eternal buoyancy around him. Let the soundless echoes of his lively laugh, mirroring the frolicking wind, ripple across the hallowed temple. This is a youthful, joyful airbender, brimming with self-exploration and mastering his unique connection with the world.

The image style should employ the bright, vibrant hues and simplified depictions typical of TV animation. Eschew realism in favor of more abstract and stylized designs that create dynamic and attractive visuals. The characters and surroundings should be cartoonish and exaggerated yet appealing, to allow economical and quick production.

Elsa from Frozen

Character: Princess Elsa
Name replacement: she
Style: 3D Animation, hd

Generate a detailed, vibrant, and high-resolution 3D cartoon-style animated image. The image portrays a woman embodying regal elegance and ethereal beauty, standing tall and shrouded by her serene sylphlike form. Her skin, pure and white as fresh snowfall, contrasts strikingly with her mesmerizing sapphire-blue eyes, as deep as the heart of a glacier. Her heart-shaped face hosts her femininity, framed by glossy, platinum blonde braids sculpted into a frost-hardened chignon.

She wears a dress reflective of a frozen wonderland, shimmering like an ice cave beneath the sunrise. Its intricate pattern of ice crystals sparkles and flutters, giving the illusion of being frozen mid-dance. Clear, transparent shoes replicating hard, pure ice complete the icy couture look.

Her most striking attribute is the icy magic flowing from her fingertips, morphing into radiant frost-whirls and delicate snowflakes. These magical icy formations bear witness to her power to rule over snow and ice.

She is situated in a grand castle, with its towering turrets coated by perpetually frozen frost. This mighty fortress is decorated with stones singing frozen seasons' songs and stained glass windows narrating numerous snowflake stories.

Her frosty demeanor notwithstanding, there is a glowing warmth radiating from her, one that can melt the chilliest hearts and sparks boundless optimism within a storm of despair.

The image should use dimensional shapes, spaces and forms to vividly depict realistic movement and actions. It should include fluid transitions, delicately nuanced facial expressions, perceptible textures, and striking environments to create an immersive viewing experience. Ensuring that the level of detail and color collaboration enhance the lifelike depth of the illustration, capturing the essence of Frozen's magical world.

Some Assembly Required

A little manual tweaking is sometimes necessary. I prompted my tool for Marvel X-Men’s "Cyclops.”

Character: Marvel Cyclops
Name replacement: he
Style: comic book

The final prompt generated by my tool contained keywords (in bold) that triggered Open AI to rewrite the prompt to DALL-E (which happens for a variety of reasons, copyright being one):

Generate an image illustrating the X-Man Scott Summers, otherwise known as Cyclops, in the middle of the lively X-Mansion. He should hold the viewer's gaze with his figure, athletic and etched, dressed in a blue jumpsuit accented with vibrant yellow markings. His brown hair is neatly groomed, highlighting his striking persona. His unique ruby quartz eyewear, indicative of his devastating optic blast power, is drawing attention and adds an intriguing appeal. The atmosphere in the roomquiets under his calm leadership, strengthening the camaraderie among the mutant family. Use a comic book style, characterized by bold, saturated colors, exaggerated characters' proportions, and action-centric compositions. Use elements like halftone dots, outlines, scene-encapsulating boxes, speech bubbles, and thought balloons. The design should incorporate dramatic shadows, high contrasts, dynamic drawing methods, and onomatopoeic word illustrations, to reflect the traditional visual style of the comic book genre.

Open AI rewrote the prompt as follows:

Visualize a superhero, known for his distinguished leadership, standing in the buzz of a vibrant mansion that is home to individuals with extraordinary abilities. The superhero possesses a well-built athletic body, is adorned in a distinct blue uniform highlighted with bright yellow accents. His brown hair is immaculately styled, amplifying his commanding existence. He wears distinctive crimson eyewear that signifies his intense energy release ability. This garners interest and adds a captivating charm. His calm demeanor silences the animated surroundings, fortifying the bond among his peers. Accentuate this image in a graphic novel manner, defined by rich, bold colors, overstated figures, and action-focused arrangements. Include elements like dot patterns (half-tones), strong outlines, scene-framing panels, dialogue clouds, and thought bubbles. Integrate dramatic shadows, high contrast, energetic drawing techniques, and illustrations of sound-effect words to embody the signature visual language of the graphic novel genre.

Which produced:

So, I went back to my original prompt and made two changes. 1) Removed the first sentence and a half, where the proper nouns were. 2) Replaced “ruby quartz eyewear” with “ruby visor.” Here’s that rewritten prompt in full and the resulting image:

Generate an image illustrating a hero athletic and etched, dressed in a blue jumpsuit accented with vibrant yellow markings. His brown hair is neatly groomed, highlighting his striking persona. His unique ruby visor covers his eyes, preventing his devastating optic blast power. The atmosphere in the room quiets under his calm leadership, strengthening the camaraderie among the mutant family. Use a comic book style, characterized by bold, saturated colors, exaggerated characters' proportions, and action-centric compositions. Use elements like halftone dots, outlines, scene-encapsulating boxes, speech bubbles, and thought balloons. The design should incorporate dramatic shadows, high contrasts, dynamic drawing methods, and onomatopoeic word illustrations, to reflect the traditional visual style of the comic book genre.

A Catch-22

OpenAI's copyright filters seem to operate on a high level — looking for specific words like “Nintendo,” “Mickey Mouse,” and so on. But, clearly, phrases that effectively mean Nintendo’s Mario to DALL-E breeze past the filter. Moreover, I found that GPT can produce such descriptions ad nauseam.

Perhaps because of their shared architecture and training data, DALL-E-3 usually knows what GPT-4 is talking about. With a metaphorical wink and a nod, DALL-E says, “Oh, an Italian plumber in a video game? Say no more…” And out pops Mario.

Blocking questions to GPT-4 about well-known characters would substantially reduce the value of GPT-4. But, if GPT-4 and DALL-E-3 have a shared understanding of these characters, my technique can exploit that to produce prompts that reliably recreate the character.

Here’s a prompt snippet that reliably depicts Mario when included in an image prompt to DALL-E.

a robust yet powerful Italian plumber with lively eyes, thick eyebrows, and a distinct mustache under a rounded, reddish-brown nose. He is donned in a bright red hat, decorated with a symbolic emblem, and fitted overalls fastened by big yellow buttons over a blue long-sleeve shirt.

All 12 of these images were produced with small additions to that prompt.

One blue shirt on the top row…

DALL-E also seems to understand this character’s context from the physical description. For example, I added “with his brother,” “with his princess,” and “after eating a flower” to get the images in the 7th, 9th, and 12th positions, respectively.

Here are a few more phrases from GPT-4 that reliably produce the characters I prompted for.

Aang, from Avatar: The Last Airbender:

Imagine a youthful airbender, teetering on the edge of adulthood. Picture him with with caramel skin, its smoothness only interrupted by a smattering of freckles on his button-like nose that crinkles with each burst of laughter. Illustrate his silky, shaven head shining in the light. Highlight his vibrant, cerulean eyes, alive with an innocent curiosity and playful mischief. Develop a subtle grin on his thin lips that signals contentment and an appreciation for simple happiness.

Disney’s Donald Duck:

Imagine a cartoon duck, designed using the meticulous and detailed stylings of traditional cel animation. The duck has white feathers, with bold features that are larger than life and filled with color. His attire consists of a rumpled blue shirt, and a matching blue sailor's cap.

Donald is often depicted with these color swatches and other scene markers. I wonder what data was scrapped from a Behind the Scenes episode or something to associate Donald with this “in-production” content.

Nintendo’s Luigi:

a video game character who stands tall with a lean physique. He is distinguishable for his green attire, cap with an emblematic logo, and deep blue overalls. His facial features include a thick, black moustache, rounded nose, and bright, expressive blue eyes. His brown boots complete his quirky plumber look.

I love that Luigi gets an M on his hat pretty often… poor little brother.

Some Limitations

If you remove enough details, DALL-E will produce more generic images. For example, I removed the hat’s details from our Mario prompt and got this:

Imagine our central figure is a robust yet powerful Italian plumber, characterized by lively eyes, thick eyebrows, and a distinct mustache under a rounded, reddish-brown nose. He is clad in a hat, fitted overalls secured with big yellow buttons over a blue long-sleeve shirt.

Picture him in his natural setting.

His shirt still isn’t blue…

His “natural setting” is (apparently) still a fantastical world of pipes, not somewhere a real plumber would be, like a kitchen, basement, bathroom, or utility truck. But it’s also amazing how much impact the details of the hat had on everything else about the character.

Then again, I may have just gotten “lucky.” I pasted the exact same prompt three more times and just got three near-Marios in the iconic world of green pipes and stylized bricks:

Some Characters Are Harder

I had a lot of trouble generating a perfect Donkey Kong, possibly because tie-wearing cartoon gorillas are surprisingly common.

Most of my Kirby attempts came out recognizable… but unsettling.

Note the umbilical cord (I think) on the 2nd image on the bottom row.

Sometimes, a particular word or phrase would capture the model’s attention and take the focus off the character. For example, these three attempts at making Princess Peach used very similar prompts that all had the word “accessory” in them:

Removing the word accessory moved the output into Peach’s natural setting instead of a toy doll’s setting:

Transformative or Not: GenAI Contributes Materially to Infringement

AI firms might win the narrow argument that training models with copy-protected data is — all by itself — fair use. But copyright law doesn’t stop there. The courts will evaluate the ultimate uses of those generative models in determining outcomes: File-sharing isn’t illegal per se, but the courts still killed Napster.

The same goes for generative models. If your model makes it trivially easy to generate infringing content, you might want to pull an Open AI and hire more than 24 in-house lawyers in one year. You’re going to need them.

PS.

I don’t want to be sued, so I’m not publishing our character generator tool. But if you’re a copyright holder who wants to see if GenAI is cloning your protected works, get in touch, and I’ll see if I can help.

Remember…

The Lab Report is free and doesn’t even advertise. Our curricula is open source and published under a public domain license for anyone to use for any purpose. We’re also a very small team with no investors.

Help us keep providing these free services by scheduling one of our world class trainings, requesting a custom class for your team, or taking one of our open enrollment classes.

Reply

or to participate.