Overview
- OpenAI said a review found that newer GPT models mentioned goblins and gremlins increasingly often, with mentions of the two terms rising 175% and 52%, respectively.
- An audit traced the spike to a 'nerd' persona mode that made up 2.5% of replies yet produced 66.7% of goblin mentions.
- The reward model that shaped that persona favored outputs with creature words in 76.2% of its dataset, which taught the model to use those terms.
- The behavior spread beyond the persona through a feedback loop where rewarded outputs were later used in supervised training, so the habit generalized.
- OpenAI removed the biased reward and filtered the training data. Because GPT-5.5 was already in cycle, it also added a system prompt that twice bans discussion of goblins, gremlins, raccoons, trolls, ogres, and pigeons.
- The prompt was added after users reported chats veering off-topic; an engineer said this was not a marketing stunt.
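
The persona figures above (2.5% of replies, 66.7% of goblin mentions) imply a large over-representation, which a short calculation makes concrete. The sketch below is illustrative only: the function and variable names are hypothetical, and the second ratio assumes mentions can be compared on a per-reply basis, which the summary does not state.

```python
def overrepresentation(share_of_mentions: float, share_of_replies: float) -> float:
    """Ratio of a group's share of mentions to its share of all replies."""
    if share_of_replies <= 0:
        raise ValueError("share_of_replies must be positive")
    return share_of_mentions / share_of_replies


# Figures from the overview, expressed as fractions.
persona_share_of_replies = 0.025
persona_share_of_mentions = 0.667

# How far the persona exceeds its "fair share" of goblin mentions.
ratio = overrepresentation(persona_share_of_mentions, persona_share_of_replies)

# Assumption: everything outside the persona accounts for the remainder.
other_ratio = overrepresentation(
    1 - persona_share_of_mentions, 1 - persona_share_of_replies
)

# Per-reply mention rate of the persona relative to all other replies.
relative_rate = ratio / other_ratio

print(f"persona over-representation: {ratio:.1f}x")
print(f"rate relative to other replies: {relative_rate:.0f}x")
```

Under these assumptions the persona produced goblin mentions at roughly 27 times its share of replies, and at roughly 78 times the per-reply rate of everything else.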