Particle News: OpenAI Blames Mis-Tuned Rewards for GPT 'Goblin' Habit, Adds GPT-5.5 Prompt Ban

Overview

OpenAI said a review found newer GPT models increasingly mentioned goblins and gremlins, with the terms rising 175% and 52% respectively.
An audit traced the spike to a 'nerd' persona mode that made up 2.5% of replies yet produced 66.7% of goblin mentions.
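
To put those two shares side by side, the arithmetic can be sketched as below. This is only an illustrative calculation from the figures quoted above, not OpenAI's audit method; the function names are hypothetical, and it assumes both percentages refer to the same pool of replies.

```python
# Illustrative arithmetic only; function names are hypothetical and the
# inputs are the percentages quoted in the summary above.

def overrepresentation(share_of_mentions: float, share_of_replies: float) -> float:
    """How many times more mentions a group produced than its share of replies predicts."""
    return share_of_mentions / share_of_replies


def relative_rate(share_of_mentions: float, share_of_replies: float) -> float:
    """Mentions-per-reply in the group divided by mentions-per-reply in all other replies.

    Assumes both shares are percentages of the same set of replies.
    """
    inside = share_of_mentions / share_of_replies
    outside = (100.0 - share_of_mentions) / (100.0 - share_of_replies)
    return inside / outside


if __name__ == "__main__":
    # 'nerd' persona: 2.5% of replies, 66.7% of goblin mentions
    print(f"over-representation: {overrepresentation(66.7, 2.5):.1f}x")  # about 26.7x
    print(f"rate vs. other replies: {relative_rate(66.7, 2.5):.0f}x")     # roughly 78x
```

Under that assumption, the persona's replies carried goblin mentions at roughly 27 times the rate its share of traffic would predict, and roughly 78 times the rate of all other replies.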
The reward model that shaped that persona favored outputs with creature words in 76.2% of its dataset, which taught the model to use those terms.
The behavior spread beyond the persona through a feedback loop where rewarded outputs were later used in supervised training, so the habit generalized.
OpenAI removed the biased reward and filtered the training data.
Because GPT-5.5 was already in cycle, the company also added a system prompt that twice bans discussing goblins, gremlins, raccoons, trolls, ogres, and pigeons, a step taken after users reported chats veering off-topic.
An engineer said this was not a marketing stunt.