Publishers Sue Meta and Mark Zuckerberg Over AI Training Data

A class-action lawsuit filed Tuesday alleges Meta knowingly used pirated books and journal articles to train its Llama AI models, bypassing legal licensing markets with the personal authorization of CEO Mark Zuckerberg.

The plaintiffs include major publishers Hachette, Macmillan, McGraw Hill, Elsevier, and Cengage, alongside author Scott Turow and his company S.C.R.I.B.E.

The Allegations

The complaint claims that Meta copied copyrighted works from pirate websites such as LibGen and Anna's Archive to train its Llama language models. The publishers allege that Meta knowingly bypassed legal licensing markets to gain an unfair advantage in AI development, and that it did so with Zuckerberg's personal authorization.

According to the lawsuit, Meta considered negotiating licensing deals with publishers in early 2023. However, in April 2023, the issue was escalated to Zuckerberg, who allegedly gave verbal instructions to stop all licensing efforts.

Specific Works Cited

The lawsuit identifies several copyrighted works that were allegedly used without permission, including:

Scott Turow's "Presumed Innocent"
Douglas Preston's "Impact"
Peter Brown's "The Wild Robot"
N.K. Jemisin's "The Fifth Season"
Lemony Snicket's "Who Could That Be at This Hour?"
Various academic journal articles

Proposed Class and Remedies

The proposed class is expansive, including all owners of registered copyrights for any book with an ISBN or journal article with a DOI/ISSN.

The plaintiffs seek:

Statutory damages
A permanent injunction ordering Meta to stop using their works
An order to destroy all infringing copies

Reactions

Scott Turow said that AI has been "created with stolen words" and criticized Meta for breaking the law even though it is a wealthy corporation.

Authors Guild CEO Mary Rasenberger described the alleged infringement as "the most flagrant copyright breach in history."

Meta, however, disputes the claims. Nkechi Nneji, Meta's public affairs director, said the company believes training AI on copyrighted material can qualify as fair use and stated that Meta will "fight this lawsuit aggressively."

Broader Context

This lawsuit is part of an ongoing wave of copyright infringement cases against AI companies:

In June 2025, a federal judge dismissed a copyright lawsuit against Meta due to insufficient evidence of harm.
In a separate case involving Anthropic, a judge initially supported fair use but later ruled that using millions of pirated books without consent was not permissible.
In September 2025, following that ruling, Anthropic settled the lawsuit for $1.5 billion.

This case highlights the growing legal battle between creators and AI developers over the use of copyrighted material in training data.