US news organizations just won a major discovery fight with OpenAI—and they’re not stopping there.
A federal judge has ordered OpenAI to hand over 20 million ChatGPT conversation logs in the New York Times–led copyright lawsuit. Now those same publishers want the court to go a step further: investigate “mass deletions” and force OpenAI to dig up potentially millions of chats users thought were gone for good.
Judge says 20 million logs must be handed over
On Monday, US District Judge Sidney Stein rejected OpenAI’s attempt to narrow what it has to share.
OpenAI had argued that Magistrate Judge Ona Wang didn't sufficiently account for the privacy of ChatGPT users who aren't part of the case when she ordered the company to produce 20 million logs. The company proposed a different plan: it would run the search terms itself, identify possibly infringing outputs, and give news publishers only the hits.
Stein wasn’t convinced.
He ruled that Wang had already weighed privacy properly:
- The sample was cut down from “tens of billions” of logs to 20 million.
- OpenAI has stripped personally identifying information from the chats.
Stein also agreed with Wang that news publishers need access to the full 20 million-log sample, not just obvious infringement. As Wang previously wrote, even “output logs that do not contain reproductions of News Plaintiffs’ works may still be relevant to OpenAI’s fair use defense.”
OpenAI further complained that Wang hadn’t explicitly explained why she rejected the search-term proposal. Stein brushed that aside, writing that her explanation for ordering production of the entire de-identified sample was enough and “not clearly erroneous or contrary to law.”
OpenAI told Ars Technica it is still reviewing whether any avenues remain to fight the order. But for a company that publicly vowed to do everything it could to avoid exposing ordinary users' conversations, this looks close to the end of the road.
In a blog post last updated in mid-December, OpenAI stressed that the data to be shared has “undergone a de-identification process intended to remove or mask PII and other private information.” According to OpenAI, news plaintiffs will be able to search the logs but will not be allowed to copy or print any data not directly relevant to the case.
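For a sense of what such a process can look like in practice, here is a deliberately generic sketch of pattern-based PII masking. The patterns, placeholder labels, and function name are illustrative assumptions; OpenAI has not published the specifics of its pipeline.

```python
import re

# Generic illustration of pattern-based masking. These patterns and
# placeholder labels are assumptions; OpenAI has not described the
# details of its de-identification process.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\b\+?\d[\d\s().-]{7,}\d\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def mask_pii(text: str) -> str:
    """Replace anything matching a known PII pattern with a typed placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(mask_pii("Reach me at jane.doe@example.com or 555-867-5309."))
# -> "Reach me at [EMAIL] or [PHONE]."
```

Pattern-based masking like this catches well-formed identifiers but can miss personal details embedded in ordinary prose, which may be why OpenAI describes the process as "intended to" remove or mask PII rather than guaranteed to.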
The sanctions fight: “mass deletion” of chats?
Behind the scenes, the discovery fight is getting uglier.
News organizations—led by The New York Times and joined by other publishers—say the logs will show more than simple copyright violations. They expect to find:
- Chatbot responses that allegedly reproduce their articles without permission.
- Outputs that dilute or “water down” news trademarks.
- Responses that strip copyright management information (CMI), allegedly obscuring the source and making it easier to reuse content without a license.
They also accuse OpenAI and co-defendant Microsoft of dragging their feet.
Microsoft has agreed to turn over 8.1 million Copilot logs but hasn't said when. Publishers have asked the court to order Microsoft to produce those logs "immediately" in a searchable, remotely accessible format, either by January 9 or within one day of the court's ruling on their motion. Microsoft declined Ars' request for comment.
The more explosive allegation targets OpenAI’s data retention.
According to court filings, it took 11 months for news organizations to learn that “OpenAI was destroying relevant output log data” by failing to suspend routine deletion once litigation began. The allegedly destroyed data includes a “quite substantial” fraction of ChatGPT Free, Pro, and Plus output logs.
The filings claim:
- OpenAI deleted roughly one-third of all user conversation data in the month after The New York Times filed suit.
- The company’s only explanation was that the “number of ChatGPT conversations was uncharacteristically low (shortly before New Year’s Day 2024).” News orgs called that an “irrelevant non-sequitur.”
- There were “two spikes in mass deletion” that OpenAI attributed to “technical issues.”
Publishers say this fits a “playbook” to blunt copyright claims: OpenAI allegedly failed to “take any steps to suspend its routine destruction practices” once it knew it was being sued.
At the same time, OpenAI allegedly preserved logs that might help its defense. Citing testimony from Mike Trinh, OpenAI’s associate general counsel, the filing says OpenAI kept data from accounts mentioned in publishers’ complaints but did not take similar care to preserve other chats that might show third parties eliciting news content.
“In other words,” the filing concludes, “OpenAI preserved evidence of the News Plaintiffs eliciting their own works from OpenAI’s products but deleted evidence of third-party users doing so.”
How much data was actually deleted remains unclear. News organizations say OpenAI has refused to share “the most basic information” about its deletion practices. By contrast, they argue, Microsoft “apparently had no trouble” preserving Copilot logs.
Could “deleted” chats come back?
News publishers now want the court to consider sanctions against OpenAI and keep tight controls on what happens to user data from here.
They’re asking the judge to:
- Leave in place a preservation order blocking OpenAI from permanently deleting users’ temporary and deleted chats.
- Force OpenAI to explain “the full scope of destroyed output log data for all of its products at issue.”
- Determine whether those deleted chats—including the alleged “mass deletions”—can be restored, so they can be examined as evidence.
That last demand is the one that may unsettle ChatGPT users.
If the court agrees, OpenAI could be pushed to attempt recovery of logs that users assumed were gone. Depending on how OpenAI implements “deletion” under the hood—logical flags, delayed purges, backups—there could be a lot of data suddenly back in play.
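To make that distinction concrete, here is a minimal sketch of the soft-delete pattern many services use, where "delete" sets a flag and a later purge job does the permanent removal. All names, the retention window, and the recovery path are hypothetical assumptions for illustration; nothing here reflects OpenAI's actual systems.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical sketch of a soft-delete store. The names, the 30-day
# retention window, and the recovery path are assumptions; they do
# not describe OpenAI's actual architecture.

RETENTION_WINDOW = timedelta(days=30)  # assumed delay before hard purge

@dataclass
class ChatLog:
    conversation_id: str
    content: str
    deleted_at: Optional[datetime] = None  # soft-delete flag, not erasure

class LogStore:
    def __init__(self) -> None:
        self._rows: dict[str, ChatLog] = {}

    def add(self, log: ChatLog) -> None:
        self._rows[log.conversation_id] = log

    def soft_delete(self, conversation_id: str) -> None:
        # "Delete" only stamps the row; the bytes stay on disk until
        # a purge job comes along.
        self._rows[conversation_id].deleted_at = datetime.now(timezone.utc)

    def purge_expired(self) -> int:
        # Hard delete: permanently drop rows whose retention window has
        # elapsed. Until this runs, "deleted" chats remain recoverable.
        now = datetime.now(timezone.utc)
        expired = [cid for cid, row in self._rows.items()
                   if row.deleted_at and now - row.deleted_at > RETENTION_WINDOW]
        for cid in expired:
            del self._rows[cid]
        return len(expired)

    def recover(self, conversation_id: str) -> Optional[ChatLog]:
        # A litigation hold could simply clear the flag on any row the
        # purge job has not yet reached.
        row = self._rows.get(conversation_id)
        if row is not None:
            row.deleted_at = None
        return row

store = LogStore()
store.add(ChatLog("c1", "hello"))
store.soft_delete("c1")
print(store.recover("c1").content)  # "hello": flagged as deleted, never purged
```

Under a model like this, a preservation order is easy to honor for anything the purge job hasn't reached: pause the job and clear the flags. Once rows are truly purged, recovery depends on whatever backups or replicas still hold copies.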
The legal fight is, on paper, about copyright and fair use. But it’s quickly turning into a stress test for how AI companies handle user data when the stakes are high. The outcome will shape not just how much copyrighted material models were trained on, but how “delete” really works when your conversations are sitting on an AI company’s servers—and a federal judge wants to see them.