Three authors have filed a lawsuit against AI startup Anthropic, alleging the firm used their copyrighted works without permission to train its Claude language models.
Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson filed a complaint in a California court, accusing Anthropic of having “pirated” their written material to develop its AI systems. The authors claim Anthropic downloaded pirated versions of their books from illegal websites to use as training data.
The lawsuit alleges Anthropic “built a multibillion-dollar business by stealing hundreds of thousands of copyrighted books.” It states the company “ignored copyright protections” and engaged in “large scale theft of copyrighted works” to train its Claude models.
Anthropic has not commented substantively on the allegations, saying only that it is “aware” of the legal action. The case joins similar lawsuits against other AI companies, such as Microsoft and OpenAI, over the use of copyrighted material to develop large language models. It highlights growing tensions between content creators and AI firms over intellectual property rights.
According to the complaint, Anthropic used a dataset called ‘The Pile’ to train Claude. This dataset allegedly included a collection of pirated ebooks called ‘Books3,’ which contained nearly 200,000 books downloaded from an unauthorised source.
The authors argue that Anthropic knew it was using copyrighted works without permission. They claim the company made a “deliberate decision to cut corners and rely on stolen materials to train their models” rather than obtaining proper licences.
The lawsuit states that Anthropic’s actions have harmed authors by depriving them of book sales and licensing revenues. It alleges the company’s AI models now compete with human-written content, threatening writers’ livelihoods.
For context, Anthropic positions its Claude models as rivals to OpenAI’s ChatGPT and other prominent AI chatbots. The company has raised billions in funding and is valued at over $18 billion.
Critics argue that AI firms should compensate authors and publishers for using their works as training data. Some companies like Google have begun licensing deals with news organisations and other content providers.
However, AI developers contend that using copyrighted material for machine learning falls under copyright law’s ‘fair use’ provisions. They argue that their models do not reproduce exact copies of training texts.
The debate touches on complex legal and ethical questions about how copyright applies to AI development. Courts may need to determine whether AI training constitutes copyright infringement or transformative fair use.
For authors, the lawsuit represents an effort to assert control over how their works are used in AI development. They argue that companies profiting from AI should compensate creators whose material made the technology possible.
The case could have significant implications for the AI industry if courts rule that firms must obtain licences for all copyrighted material used in training. Such a requirement would likely increase the cost and complexity of AI development.
Anthropic has focused on developing “safe and ethical” AI systems. The company’s CEO has described it as “focused on public benefit.” However, the authors’ lawsuit challenges this image, accusing Anthropic of building its business through copyright infringement.
The complaint seeks statutory damages for alleged wilful copyright infringement and an injunction to prevent Anthropic from further using the authors’ works without permission.
As AI capabilities grow, debates over intellectual property are likely to intensify. Content creators argue that their work should be protected and compensated, while AI companies push for access to broad datasets to improve their models.
The outcome of cases like this one against Anthropic could help shape the legal and regulatory landscape for AI development. It may influence how companies approach training data collection and whether widespread licensing becomes the norm.
For now, the lawsuit adds to the mounting legal challenges facing major AI firms over their use of copyrighted material. As courts grapple with these issues, their rulings could have far-reaching effects on the future of AI and content creation.
The case is filed as Andrea Bartz et al. v. Anthropic PBC, US District Court for the Northern District of California, No. 3:24-cv-05417.