AI companies would be required to disclose copyrighted training data under new bill

Two lawmakers filed a bill requiring creators of foundation models to disclose sources of training data so copyright holders know their information was taken. The AI Foundation Model Transparency Act — filed by Reps. Anna Eshoo (D-CA) and Don Beyer (D-VA) — would direct the Federal Trade Commission (FTC) to work with the National Institute of Standards and Technology (NIST) to establish rules for reporting training data transparency.

Companies that make foundation models would be required to report the sources of their training data and how that data is retained during the inference process; describe the limitations or risks of the model; explain how the model aligns with NIST’s planned AI Risk Management Framework and any other federal standards that might be established; and provide information on the computational power used to train and run the model. The bill also says AI developers must report efforts to “red team” the model to prevent it from providing “inaccurate or harmful information” around medical or health-related questions, biological synthesis, cybersecurity, elections, policing, financial loan decisions, education, employment decisions, public services, and vulnerable populations such as children.

“With the increase in public access to artificial intelligence, there has been an increase in lawsuits and public concerns about copyright infringement,” the bill states. “Public use of foundation models has led to countless instances of the public being presented with inaccurate, imprecise, or biased information.”

The bill still needs to be assigned to a committee and discussed, and it’s unclear if that will happen before the busy election campaign season starts.

Eshoo and Beyer’s bill complements the Biden administration’s AI executive order, which helps establish reporting standards for AI models. The executive order, however, is not law, so if the AI Foundation Model Transparency Act passes, it will make transparency requirements for training data a federal rule.