Headline & intro
Washington has just fired the loudest shot yet in a new kind of tech cold war: not over chips or data, but over model outputs. By framing “industrial‑scale” AI distillation by Chinese firms as potential espionage, the US is trying to redraw the legal line between inspiration and theft in AI. That line will decide who can cheaply copy whom, how fast China can catch up, and how open the global AI stack remains. In this piece, we unpack what is actually happening, why it matters far beyond Washington and Beijing, and why Europe cannot afford to sit this one out.
The news in brief
According to Ars Technica, citing a memo seen by the Financial Times, the White House Office of Science and Technology Policy claims it has evidence that foreign entities, mainly in China, are running large, coordinated campaigns to “distill” US frontier AI models.
US companies including OpenAI, Google and Anthropic have already complained about this pattern. They say China‑based actors, among others, created tens of thousands of fake or proxy accounts, bombarded their chatbots with prompts millions of times, and used jailbreak tricks to push models into revealing more than they should. Those outputs are then allegedly used to train cheaper clone models.
The House Select Committee on China has urged Congress to treat such “model extraction” as industrial espionage. Lawmakers want agencies like the Commerce Department and Department of Justice to explore using the Economic Espionage Act and the Computer Fraud and Abuse Act, and to classify “adversarial distillation” as a controlled technology transfer—potentially enabling sanctions and prosecutions.
China rejects the accusations as groundless, calling them “slander” and insisting it respects intellectual property and supports fair competition. The dispute lands just weeks before a high‑profile meeting between Donald Trump and Xi Jinping.
Why this matters
At the core is a deceptively simple question: if you pay for API access to a model, or even just interact with a public chatbot, how far can you go in using its outputs to build your own competitor?
From the US firms’ perspective, adversarial distillation is the nightmare version of “free‑riding”. They spend billions on chips, data, talent, and safety work. A foreign rival can then generate huge numbers of interactions—often in violation of terms of service—and use those outputs as labelled training data to approximate the original model’s behaviour. If that’s allowed, the business case for closed, proprietary models weakens dramatically.
For Chinese companies, especially those with constrained access to cutting‑edge chips, distillation is an obvious shortcut. Buying inference from US models is cheaper and more politically feasible than importing vast quantities of advanced hardware. Instead of building a frontier‑scale model from scratch, you compress someone else’s. Unsurprisingly, US national‑security circles see that as a way to leapfrog export controls.
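To see why the technical bar is so low, here is a minimal sketch of what API‑level distillation amounts to in practice. Everything in it is illustrative: the endpoint URL, the request format and the student model are assumptions for the example, not any lab’s actual pipeline.

```python
# Minimal sketch of API-based distillation: harvest (prompt, response)
# pairs from a "teacher" model behind an API, then fine-tune a smaller
# "student" model on them. Endpoint, payload format and model choice
# are illustrative assumptions only.
import requests
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

TEACHER_URL = "https://api.example.com/v1/completions"  # hypothetical endpoint

def harvest(prompts, api_key):
    """Collect teacher outputs to use as labelled training data."""
    texts = []
    for prompt in prompts:
        resp = requests.post(
            TEACHER_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            json={"prompt": prompt, "max_tokens": 256},
            timeout=30,
        )
        texts.append(prompt + resp.json()["text"])  # prompt + completion
    return texts

def train_student(texts, model_name="distilgpt2", epochs=1):
    """Ordinary causal-LM fine-tuning on the harvested text."""
    tok = AutoTokenizer.from_pretrained(model_name)
    tok.pad_token = tok.eos_token  # GPT-2 tokenizers ship without a pad token
    student = AutoModelForCausalLM.from_pretrained(model_name)
    optim = torch.optim.AdamW(student.parameters(), lr=5e-5)
    student.train()
    for _ in range(epochs):
        for batch in DataLoader(texts, batch_size=4, shuffle=True):
            enc = tok(list(batch), return_tensors="pt", padding=True,
                      truncation=True, max_length=512)
            loss = student(**enc, labels=enc["input_ids"]).loss
            loss.backward()
            optim.step()
            optim.zero_grad()
    return student
```

Nothing in this loop is exotic: it is ordinary supervised fine‑tuning on someone else’s outputs. That is exactly why the policy fight centres on scale, deception and intent rather than on the technique itself.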
Framing this as “industrial espionage” has three immediate effects:
- Legal escalation – It moves the issue from contract law (terms of service) into criminal law and export‑control territory. That means potential prosecutions, sanctions, and diplomatic bargaining chips.
- Norm‑setting – Once the US labels adversarial distillation as theft, other countries and international bodies will be pushed to take a position. That will define acceptable behaviour for AI labs globally.
- Market consolidation – Smaller labs that rely on distillation for model alignment or efficiency—often in legitimate ways—could be caught in the crossfire if regulators draw the line too broadly.
The losers are not just Chinese firms: open research communities and mid‑tier players risk being squeezed between national‑security driven restrictions from Washington and retaliatory measures from Beijing.
The bigger picture
This fight doesn’t come out of nowhere. It’s the convergence of three trends we’ve seen building for years:
- From data wars to model wars – The first AI governance battles focused on training data (web scraping, copyright, privacy). Now the asset to protect is the model itself and its emergent behaviour. Model extraction attacks have been discussed in security papers for a decade; what’s new is the scale, the commercial intent and the geopolitical framing.
- Chokepoint geopolitics – Chips, cloud access, and now model APIs are all leverage points. The US has already used export controls to limit China’s access to advanced semiconductors. If API‑level access and model behaviour are also classified as controlled technology transfers, we move closer to a world where high‑end AI is treated more like nuclear tech than like software.
- The closed vs open‑source fault line – Some open‑source advocates have long argued that proprietary labs overstate security risks to justify keeping their systems closed. Ironically, adversarial distillation proves both sides partly right: closed models really can be copied through their own public interfaces, so the boundary is genuinely hard to defend; but heavy‑handed legal responses might entrench a few giants further while making genuine openness even riskier and rarer.
Compared with previous IP fights—say, around semiconductor designs or telecom standards—model distillation is murkier. You’re not stealing source code or weights directly. You’re learning from behaviour that is willingly exposed through an interface. That makes the analogy to traditional espionage imperfect, but it’s powerful in political messaging.
Competitors like the EU or India will watch closely. If the US successfully redefines model extraction as espionage, it sets a template others can reuse—perhaps to shield their own champions, or to push foreign firms to localise and disclose more.
The European angle
For Europe, this is not a distant US–China squabble; it’s a pressure test of the EU’s own digital sovereignty ambitions.
European startups and enterprises rely heavily on US frontier models via APIs. If Washington classifies certain forms of distillation as controlled tech transfer, European firms interacting with Chinese partners could suddenly find themselves in a compliance minefield: is building a bilingual assistant that calls a US API and feeds outputs into a Chinese downstream model still safe?
The EU AI Act, GDPR, the Digital Services Act and forthcoming data‑sharing rules were not written with adversarial distillation in mind. Yet existing tools—the Trade Secrets Directive, IP law and cybercrime statutes—could easily be interpreted to cover large‑scale scraping of API outputs in breach of terms. If the US and China race to criminalise or normalise distillation, Brussels will be pushed to clarify its own stance.
There is also a competitive angle. Europe has few frontier‑scale models, but a strong applied‑AI and open‑source community. If “behavioural cloning” of proprietary models is criminalised too broadly, European SMEs that currently use distillation for benign purposes—compression, domain adaptation, safety fine‑tuning—may face disproportionate legal risk without enjoying the upside of having crown‑jewel models to protect.
On the other hand, the EU could exploit the moment to position itself as a rules‑based middle ground: protecting trade secrets and security while carving out explicit exceptions for interoperability, research and small‑scale reverse engineering. That would align with Europe’s broader strategy of being the place where global tech companies come to get clarity, not just market share.
Looking ahead
Several fault lines to watch over the next 12–24 months:
- How narrowly “industrial‑scale” is defined – The most important question is not whether some forms of distillation are outlawed, but which ones. Regulators could target only large, deceptive, state‑linked campaigns, or they could cast a much wider net that chills routine model‑on‑model training.
- Technical countermeasures – US labs will invest heavily in rate‑limiting, behavioural watermarking, anomaly detection and honey‑potting suspicious accounts (a toy sketch of that kind of scoring follows this list). That will raise costs for attackers and legitimate high‑volume users alike. Expect a familiar arms race: better detection, more sophisticated evasion, and growing collateral damage to ordinary API customers.
- Retaliation and fragmentation – If Washington proceeds with sanctions or prosecutions, Beijing is unlikely to accept a narrative of one‑sided theft. Retaliatory measures could hit US cloud providers in China, collaborations with Chinese universities, and even European firms perceived as siding too clearly with Washington. The result could be a more fragmented global AI ecosystem, with separate model lineages and fewer cross‑border benchmarks.
- Court battles and precedent – At some point a concrete case will land in court, forcing judges to answer questions like: Is automated interaction with a public API inherently “unauthorised”? Are model behaviours protectable trade secrets? Early rulings in the US or EU will shape corporate risk calculus worldwide.
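To make the countermeasures point concrete, here is a toy sketch of the kind of per‑account scoring a provider might layer onto an API. The features, weights and thresholds are invented for illustration; any production system would combine far more signals.

```python
# Toy per-account anomaly score for spotting extraction-style API usage.
# All features, weights and thresholds here are invented for illustration;
# real systems would fold in many more signals (IP reputation, prompt
# embedding diversity, payment history, and so on).
from dataclasses import dataclass

@dataclass
class AccountStats:
    requests_per_hour: float       # sustained query volume
    distinct_prompts_ratio: float  # near 1.0 = almost never repeats a prompt
    jailbreak_hits: int            # prompts matching known jailbreak patterns
    account_age_days: int

def extraction_score(s: AccountStats) -> float:
    """Higher score = more extraction-like behaviour. Weights are arbitrary."""
    score = 0.0
    if s.requests_per_hour > 1_000:
        score += 0.4                          # industrial-scale volume
    score += 0.3 * s.distinct_prompts_ratio   # systematic prompt sweeps
    score += min(0.2, 0.02 * s.jailbreak_hits)
    if s.account_age_days < 7:
        score += 0.1                          # fresh / burner accounts
    return score

def should_throttle(s: AccountStats, threshold: float = 0.6) -> bool:
    return extraction_score(s) > threshold
```

Even this toy version exposes the collateral‑damage problem: a legitimate enterprise customer running high‑volume batch jobs trips the same volume signal as an extraction campaign, which is why detection will remain probabilistic and contested.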
For practitioners and investors, the strategic question is simple: do you build assuming a world where model behaviours are heavily protected, or one where “behavioural interoperability” remains mostly fair game? Hedging that bet may be the most important governance decision an AI company makes this decade.
The bottom line
By labelling large‑scale AI distillation as potential espionage, the US is trying to freeze China’s progress and protect its own AI champions—but it is also redefining what counts as theft in a world of black‑box models. If Europe and others simply import Washington’s framing, they risk sacrificing openness and competition without gaining much sovereignty. The next phase of AI governance will be decided less by lofty ethical principles and more by how we answer one hard question: when does learning from a system’s behaviour cross the line into stealing its soul?