Meta, BitTorrent and the AI Training Wars: Why One Lawsuit Suddenly Got Dangerous

April 1, 2026

Meta’s generative AI ambitions are colliding with one of the oldest fault lines on the Internet: who pays when mass copying happens in the background. A small procedural win for book authors in a US court—plus a fresh Supreme Court ruling on piracy—has turned a fairly technical fight over BitTorrent into a potential turning point for how AI models are trained. If you care about how your books, articles or code end up in AI systems, this case is suddenly a lot more important than it looked a month ago.

In this piece, we unpack what actually happened, why Meta is betting on a new legal shield, and what it all means for the next phase of AI regulation—especially from a European perspective.

The news in brief

According to reporting by Ars Technica’s Ashley Belanger, Meta faces two US lawsuits over its use of BitTorrent to acquire training data for its AI models:

  • Entrepreneur Media v. Meta – a copyright suit arguing that Meta is liable for contributory infringement because it allegedly seeded roughly 80 TB of pirated works via BitTorrent when assembling training data.
  • Kadrey v. Meta – a class action brought by book authors, which originally focused on a direct copyright claim (unauthorised distribution) tied to Meta’s torrenting.

The authors’ case looked weaker because proving direct distribution via BitTorrent means showing that complete works actually changed hands. As Ars Technica notes, the Entrepreneur Media suit instead leans on contributory infringement, which only requires showing that Meta knowingly facilitated infringing transfers.

US District Judge Vince Chhabria has now allowed the authors to add this easier contributory claim to their class action—despite sharply criticising their lawyers for lateness and grandstanding. At the same time, Meta has told the court it plans to rely on a recent US Supreme Court decision involving ISP Cox Communications, which narrowed when companies can be held liable for users’ piracy, to try to kill the contributory claims.

Why this matters

This is not just another copyright skirmish. Two strategic questions are on the table:

  1. Can large AI companies outsource legal risk to the infrastructure they use?
  2. Will courts treat training-data acquisition as something special, or just as another flavour of mass copying?

The authors’ new contributory claim is dangerous for Meta precisely because it attacks behaviour that looks very different from a passive ISP. Meta allegedly actively seeded huge volumes of obviously copyright-protected material using a protocol whose entire point is to maximise sharing. That is worlds apart from merely providing generic Internet access.

If a jury ever hears that story in simple language—"Meta joined a piracy-style network, pumped in tens of terabytes of unlicensed books and articles, and used the swarm to collect more"—it is not hard to see where sympathy will lie.

Meta’s counter is to wrap itself in the cloak of the Supreme Court’s Cox decision, which, as summarised by Ars Technica, draws a sharp line: merely offering a service that some people use to infringe is not enough for contributory liability; you generally need proof of affirmative inducement of specific infringements. That standard is friendly to ISPs and platforms that are one step removed from the copying.

The risk for rightsholders is obvious: if Meta can successfully argue that even this kind of industrial-scale torrenting is just “using a neutral technology” without inducement, the bar for suing any AI company over training data gets dramatically higher.

On the other hand, if the contributory theory survives, plaintiffs gain a powerful template. You would no longer need to reconstruct exactly which book file went where. You would only need to show that a lab knowingly used a mechanism whose ordinary operation predictably redistributed unlicensed works at scale.
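Why does "just downloading" via BitTorrent look like distribution? A deliberately minimal swarm simulation (hypothetical peer names; no trackers, choking, or hash verification, unlike real BitTorrent) illustrates the mechanic the plaintiffs are pointing at:

```python
import random

def simulate_swarm(num_pieces=8, rounds=100, seed=0):
    """Toy three-peer swarm: one seeder, two leechers.

    Each round, every incomplete peer requests one missing piece from a
    randomly chosen peer that already holds it. The point: a leecher
    serves pieces it has already received, so under the protocol's
    ordinary operation, downloading also redistributes the content.
    """
    rng = random.Random(seed)
    full = set(range(num_pieces))
    have = {"seeder": set(full), "leecher_a": set(), "leecher_b": set()}
    uploaded = {name: 0 for name in have}  # pieces each peer served to others

    for _ in range(rounds):
        for name in have:
            missing = full - have[name]
            if not missing:
                continue  # complete peers stop requesting (but keep serving)
            piece = rng.choice(sorted(missing))
            # Any other peer holding the piece is a valid source,
            # including the other leecher, not just the original seeder.
            sources = [p for p in have if p != name and piece in have[p]]
            src = rng.choice(sources)  # the seeder always qualifies, so non-empty
            have[name].add(piece)
            uploaded[src] += 1

    return have, uploaded

have, uploaded = simulate_swarm()
print(uploaded)  # the leechers typically record uploads too
```

Run this with different seeds and the leechers almost always end up as uploaders: simply participating makes a peer part of the distribution chain, which is exactly the factual hook the contributory theory relies on.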

In other words: this case is becoming the testbed for how far copyright law will stretch to police the plumbing of AI training.

The bigger picture

Zooming out, Meta’s BitTorrent headache intersects with at least three wider trends in AI and copyright:

  1. The shift from scraping to bulk acquisition. Early generative models were mostly fed via web crawling of public pages—legally murky but technically diffuse. Torrenting 80 TB of curated content is different. It looks less like "reading the open web" and more like directly tapping into shadow libraries and pirate archives.

  2. The rise of training-data lawsuits as leverage. Since 2023, we’ve seen a wave of copyright cases against OpenAI, Microsoft, Stability AI and others over training on books, news and code. Most hinge on whether non-consensual copying for training is itself infringement, and whether model outputs are "substantially similar" to originals. Those are slow, expert-heavy questions. By contrast, torrenting is familiar territory for judges and juries—it carries decades of legal and cultural baggage from the Napster and BitTorrent eras.

  3. Courts re‑drawing the map of secondary liability. The Cox ruling, plus past cases like MGM v. Grokster, are part of a long US trend: judges want to protect general-purpose infrastructure while still punishing actors who invite infringement. Meta is trying to stand on the infrastructure side of that line, even though, factually, it looks more like a very sophisticated end-user.

This matters beyond Meta. Any AI lab that has relied on LAION-style datasets, grey-market ebook collections or academic mirror sites is now on notice. Plaintiffs are learning to follow the data plumbing, not just the outputs.

Interestingly, Judge Chhabria’s order also highlights a class-action governance problem. He more or less accuses the authors’ lawyers of being more interested in scoring rhetorical points against Meta than in carefully building the case. That is a warning shot: courts will entertain bold legal theories about AI, but they expect tight, well-timed pleadings, not Twitter-ready broadsides.

Meta, meanwhile, is playing the long game familiar from platform liability fights: use every procedural and doctrinal tool—safe harbours, narrow inducement standards, arguments about lack of specific knowledge—to delay discovery and force plaintiffs into expensive, uncertain battles.

The European angle

From a European perspective, this fight is doubly interesting because the legal starting point is very different.

The EU’s text and data mining (TDM) exceptions already allow mining for scientific research and, under conditions, for commercial purposes; crucially, rightsholders can opt out of the general (commercial) exception. On top of that, the EU AI Act requires providers of general-purpose AI models to publish a sufficiently detailed summary of the content used for training and to comply with EU copyright law, including those opt-outs.

If a Meta‑style torrenting scheme touched servers in the EU and involved works from European authors who had opted out of TDM, it would sit on much shakier ground than in the US. Even if the Cox precedent helps Meta at home, it offers no shield against EU rules that treat training as a regulated use in itself.

For European publishers and collecting societies, the US cases are still strategically important. A strong contributory-liability theory in the US would:

  • strengthen their hand in global licensing talks with AI labs;
  • increase the pressure for industry-wide training licences, similar to how streaming triggered blanket music licences; and
  • provide a legal template for EU courts that are starting to see their own AI‑related copyright disputes.

Conversely, if Meta rides the Cox ruling to victory, it will embolden AI companies to argue in Europe that they, too, are just neutral users of generic infrastructure—even if EU law says otherwise. The political battle will then shift even more to Brussels: should the AI Act’s transparency and copyright provisions be tightened in the next revision cycle?

For smaller European markets, including those with less‑resourced languages, there is a further worry: if courts send the message that large-scale, unlicensed ingestion of books is low-risk, the incentive to ever negotiate proper licences for Slovenian, Croatian, Slovak or Baltic content will almost vanish.

Looking ahead

Several things are worth watching over the next 12–18 months:

  1. Meta’s supplemental brief on Cox. How aggressively will the company push the analogy between its conduct and that of an ISP? The more it relies on Cox, the more it invites judges to draw fine distinctions between "providing a network" and "actively seeding." That is a debate AI labs might not actually want to have in public.

  2. Summary judgment on contributory infringement. Judge Chhabria has made clear that Meta faces no discovery in the authors’ class action until plaintiffs get past summary judgment on the distribution and contributory claims. If the contributory claim survives, the leverage instantly flips: Meta would face intrusive discovery about who approved the torrenting and what they understood about BitTorrent’s mechanics.

  3. Privilege fights. Plaintiffs are already signalling that, if they win on liability, they will argue that Meta cannot hide internal torrenting discussions behind privilege. A ruling that strips confidentiality from internal AI‑training deliberations would be a nightmare not just for Meta but for the entire industry.

  4. Settlement dynamics. The longer this drags on, the more attractive a structured settlement becomes—especially if other AI copyright cases start producing mixed, expensive outcomes. A deal that combines money, partial licences and some form of opt‑out mechanism would fit neatly with where Europe is already heading.

My bet: the Cox ruling will help Meta narrow the scope of contributory liability, but it will not give it the clean dismissal it hopes for. Judges are unlikely to equate a company that deliberately seeds terabytes on BitTorrent with a passive ISP. Even a partial win for plaintiffs would send a strong signal: AI labs cannot treat grey‑market data pipelines as a free lunch forever.

The bottom line

Meta’s attempt to hide a very active role in torrenting behind a legal standard designed for passive ISPs is a high‑stakes gamble. Even if the Supreme Court’s Cox decision trims the edges of contributory liability, it does not magically convert deliberate seeding into neutral infrastructure. For AI builders, the message is clear: the era of "grab everything you can from wherever you can" is closing. The open question is whether the industry will pivot to transparent, licensed training—or wait for courts and regulators, in the US and Europe, to force the issue.
