AI Image Generators Are Spitting Out Copyrighted Characters, Raising Possibility of Catastrophic Lawsuit

Companies like OpenAI and Midjourney have opened Pandora’s box, exposing themselves to considerable legal trouble by training their AI models on the vastness of the internet while largely turning a blind eye to copyright.

As professor and author Gary Marcus and film industry concept artist Reid Southen, who has worked on several major films for the likes of Marvel and Warner Brothers, argue in a recent piece for IEEE Spectrum, tools like DALL-E 3 and Midjourney could land both companies in a “copyright minefield.”

It’s a heated debate that’s reaching fever pitch. The news comes after the New York Times sued Microsoft and OpenAI, alleging that the companies were responsible for “billions of dollars” in damages by training ChatGPT and other large language models on its content without express permission. Well-known authors, including George RR Martin of “Game of Thrones” fame and John Grisham, recently made similar arguments in a separate copyright infringement case.

And at least one prominent group of artists has filed a class action suit against Midjourney and its competitor Stability AI, alleging that their image-generating AIs have been misusing their copyrighted work.

Lending weight to those complaints, it’s true that AI image generators like Midjourney and OpenAI’s DALL-E 3 can also easily be used to produce potentially copyright-infringing images, as Marcus and Southen show in a series of experiments.

“After a bit of experimentation (and in a discovery that led us to collaborate), Southen found that it was in fact easy to generate many plagiaristic outputs, with brief prompts related to commercial films,” the piece reads.

The evidence is pretty damning: an original image showing a series of well-known Marvel superheroes can easily be reproduced, albeit slightly modified, using detailed prompts devised by Southen.

“In light of these results, it seems all but certain that Midjourney V6 has been trained on copyrighted materials (whether or not they have been licensed, we do not know) and that their tools could be used to create outputs that infringe,” they write.

Strikingly, the pair didn’t even need to directly invoke the name of a popular movie to come up with uncanny images of Nintendo’s Mario or a believable screencap of the Disney-owned Star Wars franchise’s Darth Vader.

Even just entering the word “screencap” came up with images that “closely resemble film frames” from the Star Wars, Marvel and Frozen franchises.

At the end of the day, however, it’s still up to courts and lawmakers to decide whether these reproductions amount to copyright infringement. We also don’t know what content these AI models were trained on.

But given the evidence, there’s certainly plenty of copyrighted material being scraped to create the models, likely without express permission.

“If any of the source material is not licensed, it seems to us (as nonlawyers) that this potentially opens Midjourney to extensive litigation by film studios, video-game publishers, actors, and so on,” Marcus and Southen write.

Companies like Midjourney are also directly profiting from users who generate these images with their models, charging a monthly subscription fee.

In short, until we have a clear legal precedent, it’s unclear whether the courts will side with AI companies or with the producers and artists they’re ripping off, but a catastrophic ruling against the AI companies seems plausible.

Addressing these issues may also prove difficult. An effective solution will likely have to go far beyond creating simple user-facing filters. Guardrails implemented by AI companies have also already proven trivially easy to circumvent.

It’s a thorny problem made even thornier by the “black box” nature of these AI models. While Google Images lists a source, AI image generators are unlikely to offer the same kind of reassurance.

Will the likes of OpenAI and Midjourney suffer the same fate as Napster, the peer-to-peer file sharing service that imploded in spectacular fashion in the early 2000s after losing several lawsuits? Some experts have already drawn the comparison.

Given the sheer potential for copyright infringement, Marcus and Southen argue that it’ll likely get even messier as time goes on.

“We believe that the potential for litigation may be vast, and that the foundations of the entire enterprise may be built on ethically shaky ground,” they conclude.
