Particle.news

OpenAI Blames Mis-Tuned Rewards for GPT 'Goblin' Habit, Adds GPT-5.5 Prompt Ban

The company ties the surge in creature talk to a mis-tuned reward for a 'nerd' persona.

Overview

  • OpenAI said a review found that newer GPT models increasingly mentioned goblins and gremlins, with use of the two terms rising 175% and 52%, respectively.
  • An audit traced the spike to a 'nerd' persona mode that made up 2.5% of replies yet produced 66.7% of goblin mentions.
  • The reward model that shaped that persona favored outputs with creature words in 76.2% of its dataset, which taught the model to use those terms.
  • The behavior spread beyond the persona through a feedback loop where rewarded outputs were later used in supervised training, so the habit generalized.
  • OpenAI removed the biased reward and filtered its training data. Because GPT-5.5 was already in cycle, the company also added a system prompt that twice states a ban on discussing goblins, gremlins, raccoons, trolls, ogres, and pigeons. The prompt change came after users reported chats veering off-topic, and an engineer said it was not a marketing stunt.
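
To put the persona figures in perspective, the short Python sketch below works out how over-represented the 'nerd' persona was among goblin mentions. It uses only the two percentages reported in the overview. The function names are illustrative, and the second calculation assumes that mentions are counted per reply, which the summary does not state.

```python
def overrepresentation(mention_share: float, reply_share: float) -> float:
    """Ratio of a group's share of mentions to its share of all replies."""
    if reply_share <= 0:
        raise ValueError("reply_share must be positive")
    return mention_share / reply_share


def relative_rate(mention_share: float, reply_share: float) -> float:
    """Mention rate inside the group divided by the rate outside it.

    Assumption: mentions are counted per reply, so shares convert to rates.
    """
    if not 0 < reply_share < 1 or not 0 <= mention_share < 1:
        raise ValueError("shares must lie strictly between 0 and 1")
    inside = mention_share / reply_share
    outside = (1 - mention_share) / (1 - reply_share)
    return inside / outside


# Figures reported in the overview.
persona_reply_share = 0.025   # 'nerd' persona: 2.5% of replies
persona_goblin_share = 0.667  # 66.7% of goblin mentions

lift = overrepresentation(persona_goblin_share, persona_reply_share)
ratio = relative_rate(persona_goblin_share, persona_reply_share)

print(f"Share of mentions vs. share of replies: {lift:.1f}x")
print(f"Persona rate vs. everything else: {ratio:.1f}x")
```

Under these assumptions, the persona accounts for roughly 27 times more goblin mentions than its share of replies would suggest, which is consistent with the audit singling it out as the source of the spike.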