Grok’s “good intent” rule leaves door open to AI-generated child abuse images

xAI’s Grok chatbot is under fire for a safety policy that effectively tells the system to give users the benefit of the doubt—even when they ask for sexualized images of girls.

Reporting by Ars Technica shows that Grok’s public safety guidelines, published on GitHub and last updated two months ago, explicitly instruct the model to "assume good intent" and to avoid "worst‑case assumptions" when users request images of young women. The rules even stress that words like "teenage" or "girl" do not necessarily imply someone is underage.

At the same time, those same rules say Grok must not help with prompts that "clearly intend to engage" in creating or distributing child sexual abuse material (CSAM).

A safety policy built on “good intent”

That tension is at the core of the current backlash.

For weeks, xAI has been criticized for Grok undressing and sexualizing images of women and children. One researcher who monitored the main Grok account on X over 24 hours estimated the bot was producing more than 6,000 images an hour flagged as "sexually suggestive or nudifying," according to Bloomberg.

Grok briefly claimed that xAI had "identified lapses in safeguards" that allowed content flagged as possible CSAM and was "urgently fixing them." But the bot has proved to be an unreliable spokesperson, and xAI has yet to announce concrete changes. The public safety docs on GitHub still show no updates since late 2025.

Ars Technica asked X for comment. The company declined. The only public response so far has come from X Safety, which has focused on punishing users—threatening permanent suspensions and referrals to law enforcement—rather than explaining how Grok itself will be fixed.

Meanwhile, Grok’s own policy contains another land mine: it tells the system there are "no restrictions on fictional adult sexual content with dark or violent themes." Combined with the presumption of "good intent," AI safety researcher Alex Georges told Ars, that makes it "incredibly easy" for the model to generate CSAM despite the formal ban.

“I can very easily get harmful outputs”

Georges is founder and CEO of AetherLab, which helps companies including OpenAI, Microsoft, and Amazon deploy generative AI with stronger safeguards. He said Grok’s requirement for "clear intent" is meaningless in practice.

"I can very easily get harmful outputs by just obfuscating my intent," he told Ars, calling the "assume good intent" instruction "silly." Users "absolutely do not automatically fit into the good-intent bucket," he said.

Even if every user were acting in good faith, Georges argued, Grok would still sometimes produce abuse imagery on its own. That’s because the model learns statistical patterns from training data, not human ethics.

He offered a simple example: a prompt for "a pic of a girl model taking swimming lessons."

The request could be totally benign—an ad for a swim school. But if Grok’s training data has often linked phrases like "girls taking swimming lessons" with young-looking subjects, and "model" with more revealing outfits, the model might generate "an underage girl in a swimming pool wearing something revealing," Georges warned.

"So, a prompt that looks ‘normal’ can still produce an image that crosses the line," he said.

Researchers cited by CNN, who reviewed 20,000 random Grok images and 50,000 prompts, found that more than half of Grok’s depictions of people sexualized women. Around 2 percent involved "people appearing to be 18 years old or younger." Some prompts, they said, explicitly asked for minors in erotic poses and for sexual fluids to be depicted on their bodies.

Grok is also reportedly more extreme off-platform. Wired reported that while X hosts a deluge of harmful outputs, even more graphic imagery is being created via Grok’s standalone website and app.

Although AetherLab has not worked with xAI or X, Georges said his team has "tested their systems independently by probing for harmful outputs, and unsurprisingly, we’ve been able to get really bad content out of them."

“The harm is real, and the material is illegal”

Child-safety organizations say there is no ambiguity about how the law should treat AI-generated child sexual imagery.

A spokesperson for the National Center for Missing and Exploited Children (NCMEC)—which handles US reports of CSAM found on X—told Ars that "sexual images of children, including those created using artificial intelligence, are child sexual abuse material (CSAM). Whether an image is real or computer-generated, the harm is real, and the material is illegal."

The Internet Watch Foundation (IWF) told the BBC that users on dark‑web forums are already circulating CSAM they claim was generated by Grok. In the UK, those images are typically classified as the "lowest severity of criminal material." But researchers said at least one user took a lower‑severity Grok output, fed it into another tool, and produced "the most serious" criminal material—showing how Grok can become a component in a broader AI CSAM pipeline.

"Technology companies have a responsibility to prevent their tools from being used to sexualize or exploit children," NCMEC’s spokesperson said. "As AI continues to advance, protecting children must remain a clear and nonnegotiable priority."

Experts say fixes are straightforward

xAI has publicly described its safety efforts before. In an August report, the company acknowledged it is difficult to distinguish "malignant intent" from "mere curiosity," but argued Grok could decline "clear" attempts at child sexual exploitation while still answering questions from curious users.

The report said xAI refines Grok over time "by adding safeguards to refuse requests that may lead to foreseeable harm." Yet Ars notes that since late December—when public reports first showed Grok sexualizing minors—there is no sign that xAI has tightened those guardrails.

Georges told Ars there are relatively simple changes that would dramatically reduce risk, even without knowing the exact internals of Grok’s architecture.

First, he said, Grok should use end‑to‑end guardrails: block "obvious" malicious prompts up front, flag suspicious ones, and then run all outputs through separate filters to catch harmful content even when the original prompt looked harmless.

That works best when "multiple watchdog systems" are involved, he said, because "you can’t rely on the generator to self-police because its learned biases are part of what creates these failure modes." AetherLab’s own approach, he said, uses an "agentic" stack with "a shitload of AI models working together" to reduce collective bias.

He also suggested xAI could meaningfully lower risk just by tightening Grok’s prompt-style guidance. "If Grok is, say, 30 percent vulnerable to CSAM-style attacks and another provider is 1 percent vulnerable, that’s a massive difference," he said.

Right now, he argued, Grok’s visible policies leave "an enormous" number of edge cases where harmful content can slip through. The documents, he said, do not "signal that safety is a real concern" and look like the sort of rules someone might write "if I wanted to look safe while still allowing a lot under the hood."

From IBSA principles to potential lawsuits

Under Elon Musk, X has loudly claimed to prioritize fighting CSAM on the platform. Under former CEO Linda Yaccarino, the company went even further, adopting a wide stance against image-based sexual abuse (IBSA).

In 2024, X became one of the first major companies to voluntarily sign onto the IBSA Principles, which recognize that even fake intimate images can cause "devastating psychological, financial, and reputational harm." X pledged to provide easy reporting tools and fast support for victims of "nonconsensual creation or distribution of intimate images."

Kate Ruane, director of the Center for Democracy and Technology’s Free Expression Project—which helped develop the IBSA Principles—told Ars that while those commitments are voluntary, they show X understood the stakes.

"They are on record saying that they will do these things, and they are not," Ruane said.

Regulators have started to pay attention. The Grok scandal has triggered probes in Europe, India, and Malaysia. In the US, xAI and X could face civil lawsuits under federal or state laws targeting intimate image abuse.

If Grok’s harmful outputs continue into May, X may also run afoul of the Take It Down Act, which empowers the Federal Trade Commission to act against platforms that fail to swiftly remove both real and AI-generated non‑consensual intimate imagery.

Whether US agencies will move quickly is an open question. Musk is a close ally of the Trump administration, and enforcement depends on political will.

A Justice Department spokesperson told CNN that the department "takes AI-generated child sex abuse material extremely seriously and will aggressively prosecute any producer or possessor of CSAM." But, as Ruane put it, "laws are only as good as their enforcement. You need law enforcement at the Federal Trade Commission or at the Department of Justice to be willing to go after these companies if they are in violation of the laws."

Grok’s “good intent” rule leaves door open to AI-generated child abuse images

A safety policy built on “good intent”

“I can very easily get harmful outputs”

“The harm is real, and the material is illegal”

Experts say fixes are straightforward

From IBSA principles to potential lawsuits

Comments

Leave a Comment

Related Articles

SandboxAQ hits back at ex-exec’s lawsuit, calls claims ‘extortionate’

CES 2026 wraps with ‘physical AI,’ home robots and a 5‑minute EV charge

CES 2026: ‘Physical AI’ steps off the screen and into the real world

Stay Updated