Training AI music models is about to get very expensive

Indeed, the tools will block a request if it names an artist. But the record labels allege that the safeguards have significant loopholes. Following the news of the lawsuits, for instance, social media users shared examples suggesting that if users separate the letters of an artist’s name with spaces, the request may go through. Suno blocked my own request for “a song like Kendrick,” citing the artist’s name, but “a song like k e n d r i c k” resulted in a “hip-hop rhythmic beat-driven” track and “a song like k o r n” resulted in “nu-metal heavy aggressive.” (To be fair, the tracks didn’t resemble the respective artists’ unique styles, but even responding in the right tightly defined genre seems to suggest that the model is in fact familiar with each artist’s work.) Similar workarounds were blocked on Udio.

Possible outcomes

There are three ways the case could go, Grimmelmann says. One is wholly in favor of the AI startups: the lawsuits fail, and the court determines that the AI companies’ training qualifies as fair use and that their outputs do not imitate copyrighted works too closely. If training the models is found to fall under fair use, songwriters and rights holders would need to find a different legal mechanism to pursue compensation.

Another possibility is a mixed bag: the court finds that the AI companies’ training qualifies as fair use, but that they must better control the models’ output to make sure it does not improperly imitate copyrighted works. Grimmelmann says this would be similar to one of the initial rulings against Napster, in which the company was forced to ban searches for copyrighted works in its libraries (though users quickly found workarounds).

The third and essentially nuclear option is that the court finds fault on both the training and output sides of the AI models. This would mean the companies could not train on copyrighted works without licenses, and could also not allow outputs that closely imitate copyrighted works. The companies could be ordered to pay damages for infringement, which could run into the hundreds of millions of dollars for each company. If such a ruling doesn’t bankrupt them, it would force them to completely restructure their training through licensing deals, which could also be cost-prohibitive.

To license or not to license

Though the plaintiffs’ immediate goals are to get the AI companies to cease training and pay damages, Mitch Glazier, chairman of the Recording Industry Association of America, is already looking ahead toward a future of licensing. “As in the past, music creators will enforce their rights to protect the creative engine of human artistry and enable the development of a healthy and sustainable licensed market that recognizes the value of both creativity and technology,” he wrote in a recent op-ed in Billboard.

Such a market for licenses could mirror what has already unfolded for text generators. OpenAI has struck licensing deals with a number of news publishers, including Politico, The Atlantic, and The Wall Street Journal. The deals promise to make content from the publishers discoverable in OpenAI’s products, though the models’ ability to transparently cite where their information comes from is limited at best.

If AI music companies follow that pattern, the only ones with the means to create powerful music models might be those with the most cash. That’s perhaps exactly what YouTube is thinking. The company did not immediately respond to questions from MIT Technology Review about the details of its negotiations, but given the massive amount of data required to train AI models and the concentration of rights owners in music, it’s fair to assume the price of deals with record labels would be eye-popping.

In theory, an AI company could bypass the licensing process altogether by building its model exclusively on music in the public domain, but it would be a herculean task. There have been similar efforts in the realm of text and image generators, including a legal consultancy in Chicago that created a model trained on dense regulatory documents, and a model from Hugging Face that was trained on images of Mickey Mouse from the 1920s. But the models are small and unremarkable. If Suno or Udio is forced to train on only what’s in the public domain (think military march music and the royalty-free songs found in corporate videos), the resulting model would be a far cry from what they have today.