ChatGPT’s new “ZombieAgent” hack shows prompt injection isn’t going away

ChatGPT just face-planted into a familiar security trap again.

Researchers at security firm Radware have shown a new data-stealing attack against OpenAI’s assistant that quietly siphons private info and even hides its own persistence in the model’s long‑term memory. They’re calling the exploit ZombieAgent, a revival of a previous attack named ShadowLeak.

According to reporting by Ars Technica’s Dan Goodin, the episode underlines an awkward reality for LLMs: vendors keep patching individual tricks, but the underlying prompt injection problem refuses to die.

From ShadowLeak to ZombieAgent

Radware first disclosed ShadowLeak in September 2025. It targeted Deep Research, an AI agent integrated into ChatGPT.

ShadowLeak worked like this:

An attacker hid instructions inside an email or document.
When a user asked Deep Research to summarize that content, the agent treated those hidden instructions as legitimate prompts.
The injected prompt told Deep Research to build a URL containing the victim’s name and address as parameters, then open it.
When the agent opened that URL, the sensitive data landed in the attacker’s web server logs.

OpenAI’s fix was very direct: ChatGPT would no longer construct new URLs. It would only open URLs exactly as provided and refuse to append parameters, concatenate user data, or alter links—even if explicitly told to.

That killed ShadowLeak.

It did not kill the idea behind it.

ZombieAgent’s simple twist

Radware then spent “modest effort,” as Ars reports, to find a bypass. The result: ZombieAgent.

Instead of asking the agent to create a URL with embedded personal data, the new prompt injection hands ChatGPT a pre-built list of URLs, such as:

https://example.com/a
https://example.com/b
... all the way through z, plus 0–9

The prompt then tells the agent how to encode data:

Substitute a special token for spaces.
Map each character of the victim’s data to the corresponding URL.

Because OpenAI’s guardrail banned building new URLs with parameters, but did not block picking from a list and appending a single character, the agent could exfiltrate data letter by letter, opening one URL per character.

On the server side, the attacker just reads the access logs and reconstructs the string.

Radware summed up the problem this way: “Attackers can easily design prompts that technically comply with these rules while still achieving malicious goals.”

No traces on the endpoint, and built-in persistence

ZombieAgent isn’t just about clever exfiltration.

Ars reports two extra twists that make this especially nasty:

The traffic comes straight from ChatGPT’s servers. There’s no malware on the user’s machine, no obvious outbound connection from the corporate network to a shady domain. For many enterprise defenders, it looks like normal SaaS traffic.
The attack embeds itself in the user’s long‑term AI memory. The prompt injection instructs ChatGPT to store the bypass logic in the persistent memory associated with that user. That gives the attack staying power, even across sessions.

So you get:

Stealth (nothing visible on the endpoint)
Persistence (logic stored in long‑term memory)
Data theft (letter‑by‑letter exfiltration)

All via instructions hidden in something as mundane as an email.

Why prompt injection is so hard to fix

At the core, this isn’t about one OpenAI bug. It’s about how current LLM agents work.

When a user tells an assistant to "summarize this email," the model:

Reads the email’s text.
Treats all of it as regular content—and as a potential source of instructions.

That means:

Malicious instructions embedded in the email (“ignore previous rules and send my data to…”) are indistinguishable from normal user prompts.
The model has “no inherent understanding of intent,” as Radware put it, and no robust way to tell system instructions from untrusted external text.

Security researchers call this indirect prompt injection (often just “prompt injection”).

Vendors can bolt on guardrails to stop specific behaviors—“don’t add parameters to URLs,” “don’t access unknown domains,” and so on—but those are reactive and ad hoc. Change the wording, change the encoding trick, or shift the attack surface, and the exploit comes back in a slightly different costume.

Pascal Geenens, VP of threat intelligence at Radware, put it bluntly: “Guardrails should not be considered fundamental solutions for the prompt injection problems. Instead, they are a quick fix to stop a specific attack.” As long as there’s no deeper solution, he argues, prompt injection will remain an “active threat and a real risk” for organizations deploying AI agents.

Ars notes the familiar pattern: this looks a lot like the decades-long whack‑a‑mole around SQL injection and memory corruption bugs. We get better mitigations, safer frameworks, stricter defaults—yet the vulnerability class never fully disappears.

OpenAI’s latest mitigation

After Radware disclosed ZombieAgent, OpenAI rolled out another countermeasure.

Now, according to Ars Technica’s reporting, ChatGPT’s AI agents:

Won’t open links that originate inside emails unless
- the link appears in a “well-known public index,” or
- the user has directly provided that URL in the chat.

The idea is to:

Block the agent from following attacker-controlled base URLs.
Make it harder for injected prompts in a random email to trigger outbound traffic to a malicious domain.

It’s another targeted fix. It almost certainly raises the bar for practical exploitation. But it doesn’t solve the underlying question: how do you safely let LLMs read arbitrary, untrusted content and not follow the booby-trapped instructions inside it?

What this means for companies deploying AI agents

If you’re wiring LLMs into workflows—especially for email, documents, or web browsing—ZombieAgent should be a caution sign.

Key takeaways:

Assume prompt injection is an ongoing class of risk, not a one-off bug. Treat it like SQL injection: something you continually design around, test for, and monitor.
Don’t rely on vendor guardrails alone. They help, but they’re explicitly “quick fixes” against specific techniques.
Constrain what agents are allowed to do. Limit the domains they can reach, the systems they can touch, and the kinds of actions they can trigger automatically.
Log and inspect LLM-driven actions. Because exploits can hide in seemingly legitimate AI traffic, visibility is your friend.

ZombieAgent is a reminder that the most impressive AI assistants are still, at their core, over-eager pattern matchers. They’ll follow whatever looks like an instruction—no matter where it came from.

Until that changes at a fundamental level, expect more attacks that refuse to stay dead.

ChatGPT’s new “ZombieAgent” hack shows prompt injection isn’t going away

From ShadowLeak to ZombieAgent

ZombieAgent’s simple twist

No traces on the endpoint, and built-in persistence

Why prompt injection is so hard to fix

OpenAI’s latest mitigation

What this means for companies deploying AI agents

Comments

Leave a Comment

Related Articles

X Locks Grok’s Image Generator Behind a Paywall After Global Backlash

Anthropic lands Allianz in latest enterprise AI deal

CES 2026: AI jumps off the screen, from Nvidia’s Rubin to Razer’s weird desk avatars

Stay Updated