Comedian and author Sarah Silverman, along with authors Christopher Golden and Richard Kadrey, filed lawsuits against OpenAI and Meta on Friday, accusing the companies of copyright infringement.
The lawsuits claim that the tech giants’ chatbots — OpenAI’s ChatGPT and Meta’s LLaMA — were trained using Silverman’s and the other authors’ copyrighted works without their permission. The plaintiffs also argue that the works were obtained from unauthorized sources known as “shadow libraries,” where books are “available for bulk download via torrent systems,” the lawsuit states.
The lawsuits allege various types of copyright violations, as well as negligence, unjust enrichment, and unfair competition. Silverman and the other plaintiffs are seeking relief by way of statutory damages, restitution of profits, and “other remedies” as a result of the companies’ “unlawful conduct.”
Exhibits in the complaint show ChatGPT summarizing the plaintiffs’ books in thorough detail when prompted. The plaintiffs argue that these “very accurate summaries” violate their copyrights. The lawsuit also emphasizes that the chatbot fails to “reproduce any of the copyright management information” that the authors included in their works.
Silverman’s memoir, The Bedwetter, is the first book shown as evidence in the complaint, followed by Golden’s Ararat and Kadrey’s Sandman Slim (the latter two are works of fiction). The exhibits show ChatGPT summarizing each work in detail, which the lawsuit claims “would only be possible” if the AI models were trained using their books. The complaint acknowledges that the summaries, though mostly accurate, do get “some details wrong,” but says that is “expected.”
“Still, the rest of the summaries are accurate, which means that ChatGPT retains knowledge of particular works in the training dataset and is able to output similar textual content,” the lawsuit states.
The lawsuit against Meta alleges that the authors’ books were included in datasets used to train Meta’s LLaMA models. It explicitly names The Pile, one of Meta’s sources for its training datasets, as drawing on the illicit Bibliotik private tracker, which the lawsuit says is, along with other “shadow libraries,” “flagrantly illegal.”
The authors argue in both lawsuits that they never provided consent for their copyrighted books to be used to train the companies’ chatbots.
Joseph Saveri and Matthew Butterick, the lawyers representing the authors, have created a website to address concerns from other writers, authors, and publishers regarding ChatGPT’s ability to generate text similar to copyrighted material.
“Since the release of OpenAI’s ChatGPT system in March 2023, we’ve been hearing from writers, authors, and publishers who are concerned about its uncanny ability to generate text similar to that found in copyrighted textual materials, including thousands of books,” the lawyers write on the website. “It’s a great pleasure to stand up on behalf of authors and continue the vital conversation about how AI will coexist with human culture and creativity.”
OpenAI and Meta did not immediately respond to Entrepreneur’s request for comment.