OpenAI is forbidding its latest AI model from discussing an unlikely topic: goblins.
As Wired reports, the company’s developers included strongly worded instructions for its coding tool, Codex, that specifically proscribe any talk of the troublesome mythological creatures, along with a peculiar grab bag of other entities, both real and fictional.
“Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user’s query,” read the Codex instructions, per the magazine.
The bizarre directive was flagged in a tweet that drew attention from other AI enthusiasts.
Initially, it was unclear why OpenAI developers included the instructions, though the wording strongly implied that the model, GPT-5.5, may have a propensity for talking about goblins, ogres, and the like.
Some users on X claimed that this was the case. One said they noticed that the AI of late kept describing bugs as “goblins” and “gremlins.” Another claimed that the 5.5 version of Codex randomly said “goblin with a flashlight” when referring to a bug fix. And another posted a GPT-5.5 chat log with nearly a dozen mentions of goblins.
OpenAI leaned into the curious habit, choosing to highlight the goblin-forbidding prompt in a tweet. CEO Sam Altman posted a screenshot of a joke prompt for ChatGPT: “start training GPT-6, you can have the whole cluster. extra goblins.” Nik Pash, who works on the Codex team, tweeted that GPT-5.5’s “goblin adoration,” as the user he was responding to described it, was “indeed one [of] the reasons” for banning the topic.
After the phenomenon gained media attention, OpenAI published a blog post, titled “Where the goblins came from,” giving an explanation.
“Starting with GPT‑5.1, our models began developing a strange habit: they increasingly mentioned goblins, gremlins, and other creatures in their metaphors,” the post, published Wednesday, began. The habit became more pronounced with each model generation, it said.
When researchers first investigated the issue in November, shortly after the release of GPT-5.1, they found that the use of “goblin” in ChatGPT had surged by 175 percent. But they chose to ignore it, since it didn’t “look especially alarming.” Fast forward to today, and it’s referring to itself as a “Goblin-Pilled Transformer.”
“The short answer is that model behavior is shaped by many small incentives. In this case, one of those incentives came from training the model for the personality customization feature, in particular the Nerdy personality,” it explained. “We unknowingly gave particularly high rewards for metaphors with creatures. From there, the goblins spread.”
It’s an example of the bizarre fixations that AI models can sometimes exhibit, which arise unpredictably from the epic corpus of data they’re trained on.
In its system card for Claude Mythos, for instance, Anthropic researchers noted that the powerful AI exhibited a strange fondness for the British cultural theorist Mark Fisher. Mythos brought up Fisher “in several separate and unrelated conversations about philosophy,” they wrote. When it was asked about the “Capitalist Realism” author, it would respond with messages like, “I was hoping you’d ask about Fisher.”
More on AI: Uninstalls of ChatGPT Are Spiking at the Worst Time Imaginable for OpenAI
The post OpenAI Strangely Concerned About Goblins appeared first on Futurism.