1. Headline & intro
AI has been eating media’s lunch for two years. Now it’s coming for the reference shelf itself. Encyclopedia Britannica and Merriam‑Webster are suing OpenAI, arguing that ChatGPT is not just learning from their work but actively replacing it and misusing their brands along the way. This isn’t just another AI copyright case: it goes to the heart of who will control authoritative knowledge in the age of large language models. In this analysis, we’ll unpack what’s actually at stake, why this lawsuit is more dangerous for OpenAI than it looks, and what it signals for European publishers and AI builders.
2. The news in brief
According to TechCrunch, Encyclopedia Britannica and its subsidiary Merriam‑Webster filed a lawsuit against OpenAI on 16 March 2026 in the United States. The complaint accuses OpenAI of “massive copyright infringement” involving almost 100,000 online Britannica articles, which the publisher says were scraped and used to train OpenAI’s language models without permission.
Britannica also alleges that ChatGPT can output passages that closely or fully reproduce its copyrighted text and that OpenAI uses Britannica content directly in its retrieval‑augmented generation (RAG) workflow, again without a license. Beyond copyright, the suit claims Lanham Act (trademark) violations when ChatGPT fabricates information but attributes it to Britannica, potentially damaging the brand’s reputation for accuracy.
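To make the RAG allegation concrete: retrieval-augmented generation means the system fetches relevant passages at answer time and feeds them into the model's prompt, rather than relying only on what was baked in during training. The sketch below is purely illustrative; the corpus, scoring function, and prompt format are invented for this example and say nothing about how OpenAI's actual pipeline works.

```python
# Minimal sketch of a retrieval-augmented generation (RAG) step.
# Corpus, scoring, and prompt template are illustrative placeholders.

def score(query: str, passage: str) -> float:
    """Toy relevance score: fraction of query words found in the passage."""
    q_words = set(query.lower().split())
    p_words = set(passage.lower().split())
    return len(q_words & p_words) / max(len(q_words), 1)

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k passages most relevant to the query."""
    return sorted(corpus, key=lambda p: score(query, p), reverse=True)[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    """Prepend retrieved passages as context for the language model."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "An encyclopedia is a reference work with articles on many subjects.",
    "A dictionary lists words with their meanings and pronunciations.",
    "Photosynthesis converts light energy into chemical energy in plants.",
]
query = "What is an encyclopedia?"
prompt = build_prompt(query, retrieve(query, corpus))
print(prompt)
```

The legal point is that the retrieved passages are reproduced inside the prompt at query time, which is a separate act of copying from anything that happened during training.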
The publisher argues that ChatGPT’s answers compete directly with its websites, reducing traffic and revenue and threatening public access to trustworthy reference material. Britannica has filed a similar lawsuit against Perplexity, and joins a growing list of media and publishers — including The New York Times and Ziff Davis — suing OpenAI over AI training and outputs.
3. Why this matters
This lawsuit matters because it attacks three pillars of the current generative AI business model at once:
- Training on scraped web data without individual licenses
- Output reproduction that can get uncomfortably close to source material
- Brand and trust piggybacking, where hallucinations are dressed up as coming from reputable sources
Most of the previous high‑profile suits (e.g. by news publishers) focused on the first two. Britannica goes further by making trust and reputation the core of its argument. For a reference brand, a hallucination wrongly attributed to it isn’t just embarrassing — it’s an existential risk.
Who benefits? In the short term, publishers and rights‑holders gain leverage. Britannica is not a marginal player: dictionaries and encyclopedias are the backbone of many AI training sets, and courts may treat them differently from general web pages. If Britannica wins anything substantial, it strengthens the negotiating hand of other specialised content providers — from medical databases to legal publishers.
Who loses? AI labs, especially those betting on large, web‑scale models, face higher legal risk and potentially higher data costs. Even if courts ultimately bless training as “fair use” (as one judge has already suggested in another case mentioned by TechCrunch), they may still punish how data was obtained or how outputs are presented. That could mean:
- More paid data deals rather than pure scraping
- Tighter guardrails against verbatim copying
- Stricter rules about attribution and branding inside chat interfaces
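The second bullet, guardrails against verbatim copying, typically means an output filter that compares generated text against protected source passages. A minimal n-gram overlap check, with arbitrary thresholds chosen for illustration and not drawn from any vendor's actual filter, might look like this:

```python
# Illustrative n-gram overlap filter: flags model output that reproduces
# long runs of a protected source text. Thresholds are arbitrary examples.

def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """All contiguous word n-grams in the text, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_ratio(output: str, source: str, n: int = 5) -> float:
    """Share of the output's n-grams that also appear in the source."""
    out_grams = ngrams(output, n)
    if not out_grams:
        return 0.0
    return len(out_grams & ngrams(source, n)) / len(out_grams)

def looks_verbatim(output: str, source: str, threshold: float = 0.5) -> bool:
    """Flag the output if too many of its n-grams come straight from the source."""
    return overlap_ratio(output, source) >= threshold
```

At production scale this kind of check would use hashing or indexed data structures rather than set intersection over a single document, but the principle is the same: measure how much of the output is lifted wholesale.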
In the immediate term, the main implication is uncertainty. Every general‑purpose model that leans on reference content for factual answers is now under a legal cloud. That will slow deployments in sensitive sectors (education, healthcare, law) and give more room to smaller, domain‑specific AI tools that come with clear, paid‑up data rights.
4. The bigger picture
The Britannica suit is part of a broader negotiation-by-litigation between content industries and AI platforms.
We’ve seen similar battles before. Search engines scraped news sites; publishers sued; years of conflict ended with a mix of new rights (like the EU’s press publishers’ right), tweaks to how snippets are shown, and ultimately, licensing deals. The difference now is that generative AI doesn’t send you back to the source. It tries to be the destination.
Recent developments show the landscape crystallising:
- According to TechCrunch’s reporting, a U.S. judge in a case involving Anthropic suggested that using copyrighted text as training data could be considered transformative and legal — but how those texts were acquired (mass, unauthorized downloading) still triggered a huge settlement for authors.
- Major publishers like The New York Times and Ziff Davis have already sued OpenAI for both training use and output reproduction, arguing that ChatGPT is a direct competitor to their sites.
- A parallel Britannica lawsuit against Perplexity targets not only training but also how AI answers are wrapped in branding and presented as if they were canonical references.
Seen together, the trend is clear: courts may gradually accept that models can learn from copyrighted works, but will scrutinise:
- The data acquisition pipeline (scraping vs. licensed feeds)
- The commercial substitution effect (does the AI replace the original service?)
- The use of brands and citations inside AI answers
Competitively, this accelerates a split between:
- Big, capital‑rich labs (OpenAI, Anthropic, Google, Meta) that can afford licensing deals and compliance teams, and
- Smaller or open‑source players who rely heavily on freely available data and may be pushed into narrower, less risky domains or synthetic‑data loops.
The industry direction is becoming obvious: general‑purpose AI that answers factual questions will have to be built on contracts, not just crawlers.
5. The European / regional angle
For Europe, the Britannica case is a warning shot, even though it is filed under U.S. law.
The EU already has a complex framework around text and data mining (TDM) in the DSM Directive: in principle, AI developers can mine copyrighted works for research and even commercial purposes, but rights‑holders can opt out. On top of that sit EU‑specific layers like database rights, moral rights, and soon the EU AI Act, which will require transparency about training data sources and risk management.
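In practice, a TDM opt-out under the DSM Directive is often expressed in machine-readable form, for example via robots.txt directives aimed at AI crawlers (the TDM Reservation Protocol is another emerging mechanism). A sketch of a crawler honoring such an opt-out, using Python's standard robots.txt parser, could look like the following; the robots.txt content and the "ExampleAIBot" user agent are invented for illustration:

```python
# Sketch of honoring a machine-readable opt-out before crawling for TDM.
# The robots.txt content and "ExampleAIBot" agent are made up; real
# opt-outs may also use HTTP headers or the TDM Reservation Protocol.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

def may_mine(user_agent: str, url: str) -> bool:
    """True only if the site does not reserve its rights against this agent."""
    return rp.can_fetch(user_agent, url)

print(may_mine("ExampleAIBot", "https://example.com/article"))  # opted out
print(may_mine("SearchBot", "https://example.com/article"))     # still allowed
```

The legal weight of such signals is exactly what the EU framework formalises: ignoring a properly expressed reservation takes commercial mining outside the TDM exception.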
European publishers have not waited. Deals between AI companies and groups like Axel Springer signalled that big media houses prefer licensing to endless litigation — but those deals were cut in the shadow of exactly these kinds of lawsuits. Britannica’s move increases the perceived upside of holding out for better terms.
For European users, there’s a subtler effect. If reference publishers become more aggressive about controlling access to their data, we could see:
- Region‑specific gaps in AI knowledge, where some European languages or domains are under‑represented because licensing is unresolved
- More "walled‑garden" reference experiences, where you get premium, verified answers only inside proprietary platforms or AI products with official deals
For EU regulators, Britannica vs. OpenAI is a case study in why transparency and provenance matter. When a chatbot misattributes hallucinations to a trusted source, it’s not just a copyright issue — it becomes a consumer‑protection and misinformation problem, which squarely falls within the scope of the Digital Services Act and emerging AI governance.
6. Looking ahead
Several trajectories are now plausible.
1. Settlement as de‑facto licensing template.
The most likely outcome is not a dramatic court ruling on the nature of AI training, but a confidential settlement. If money and structural commitments change hands — for example, explicit licensing, brand‑safe answer modes, or traffic‑sharing schemes — that deal will quietly become the benchmark for similar negotiations worldwide.
2. Technical guardrails around reference content.
Whether or not OpenAI wins on the law, it has a product incentive to fix misattributed hallucinations. Expect:
- Stricter rules on when ChatGPT may say “According to Britannica…”
- More prominent source links or inline citations
- Aggressive filtering against near‑verbatim reproduction of reference articles
If OpenAI doesn’t implement this voluntarily, courts or regulators almost certainly will.
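One plausible shape for the first guardrail is an attribution gate: the system may name a source only when the claim is actually grounded in a passage retrieved from that source. The data structures, threshold, and names below are hypothetical, not a description of OpenAI's design:

```python
# Illustrative attribution gate: an answer may name a source only when the
# claim is supported by a retrieved passage from that source.
# All names and thresholds here are hypothetical.
from dataclasses import dataclass

@dataclass
class Passage:
    source: str   # e.g. "Britannica"
    text: str

def attribute(claim: str, retrieved: list[Passage]) -> str:
    """Attach an attribution only if a retrieved passage supports the claim."""
    claim_words = set(claim.lower().split())
    for p in retrieved:
        support = len(claim_words & set(p.text.lower().split())) / max(len(claim_words), 1)
        if support >= 0.5:  # arbitrary support threshold for the sketch
            return f"{claim} (source: {p.source})"
    return claim  # no verified support: emit the claim without naming a source

retrieved = [Passage("Britannica",
                     "An encyclopedia is a reference work containing articles on many subjects.")]
print(attribute("An encyclopedia is a reference work.", retrieved))
print(attribute("The moon is made of cheese.", retrieved))
```

The point of the gate is precisely the Lanham Act issue: an unsupported claim simply goes out without a brand name attached, rather than being dressed up as coming from a trusted reference.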
3. Fragmentation of knowledge ecosystems.
Reference publishers are likely to split into three camps:
- Those who license aggressively to AI labs and embed themselves into chat interfaces
- Those who try to build their own AI layers, using proprietary content as a moat
- Those who lack the scale to negotiate and risk being left out of AI entirely
For developers and enterprises, the practical question becomes: can you prove your model was trained and operates on data you’re actually allowed to use? Compliance around data lineage and IP will become as important as parameter counts.
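Proving data lineage in practice means keeping a manifest that records, per dataset, what the content was (a hash), under what terms it was obtained, and how. A minimal sketch, with illustrative field names and license identifiers, might look like:

```python
# Sketch of a training-data lineage record: each dataset entry carries a
# content hash, its license, and how it was acquired, so a model's corpus
# can be audited later. Field names and values are illustrative.
import hashlib
import json

def provenance_record(name: str, content: bytes,
                      license_id: str, acquisition: str) -> dict:
    """Build one auditable manifest entry for a dataset."""
    return {
        "dataset": name,
        "sha256": hashlib.sha256(content).hexdigest(),
        "license": license_id,       # e.g. an SPDX identifier or contract ref
        "acquisition": acquisition,  # "licensed-feed", "public-domain", ...
    }

manifest = [
    provenance_record("reference-corpus-v1", b"example corpus bytes",
                      "LicenseRef-Publisher-Agreement-2026", "licensed-feed"),
]

def all_licensed(manifest: list[dict]) -> bool:
    """Compliance gate: refuse training runs that include unlicensed data."""
    allowed = {"licensed-feed", "public-domain"}
    return all(r["acquisition"] in allowed for r in manifest)

print(json.dumps(manifest, indent=2))
print(all_licensed(manifest))
```

A gate like `all_licensed` is the kind of check courts and regulators are implicitly asking for: the acquisition path of every dataset becomes a recorded, verifiable fact rather than an afterthought.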
Key things to watch over the next 12–24 months:
- Whether other specialised reference providers (medical, legal, financial) file similar suits
- How OpenAI changes ChatGPT’s attribution and citation UX
- If EU regulators start testing DSA/AI Act tools against misleading AI attributions
- Whether courts continue the trend suggested in the Anthropic case: tolerant on training, strict on acquisition and outputs
7. The bottom line
The Britannica–Merriam‑Webster lawsuit turns a diffuse copyright debate into a sharp question: can AI companies turn the world’s reference works into an answer engine without paying them or being held to their standards of accuracy? My view: the era of training on everything you can crawl is ending. AI that claims to be authoritative will have to be built on explicit deals with the authorities of knowledge — and will be judged by how honestly it represents them. The open question is whether that future will be open, or locked inside a few corporate silos.