New Law Would Force AI Companies to Reveal Copyrighted Training Data

If companies don’t disclose, they get fined by the government.

Training Day

The war over the use of copyrighted material in AI has opened up a new front, Billboard reports, after Representative Adam Schiff (D-CA) introduced a bill this week that would force companies to reveal whether they used copyrighted content in their training data sets for machine learning.

The proposed Generative AI Copyright Disclosure Act, if passed, would require businesses to disclose a list of all the copyrighted content they used in a training data set for any new models at least 30 days before they are made public. For current models already being used by consumers, businesses would still be required to disclose copyrighted content if they alter training data “in a significant manner.”

Companies would have to file the disclosure with the Register of Copyrights or face a potential monetary penalty for non-compliance.

“We must balance the immense potential of AI with the crucial need for ethical guidelines and protections,” said Schiff in a statement. “My Generative AI Copyright Disclosure Act is a pivotal step in this direction.”

Fair Cut

In the same statement, the proposed legislation was applauded by an array of representatives from various creative sectors such as the Recording Industry Association of America and the Hollywood professionals in SAG-AFTRA.

The fight over the use of copyrighted material has already sparked lawsuits against companies like OpenAI and its partner Microsoft, including one from The New York Times, which sued the two in December.

AI companies and their proponents have claimed that using copyrighted material in training data sets to generate new content like AI images and songs is an example of fair use, meaning they shouldn’t have to pay artists, writers, and other owners of the copyrighted material they use.

But that doesn’t sit well with many people in the creative industry.

While the bill doesn’t outright ban the use of copyrighted material in AI training data sets, a forced public disclosure may spur more lawsuits from copyright holders, as well as anger and anxiety from a public that is skeptical of AI’s expanding use.

If the loud booing at a pro-AI sizzle reel at a recent SXSW event is any indication, things are truly heating up in the court of public opinion, and that may chart the future course of the AI industry.

