Overview
- The Washington Post, which published its evaluation Wednesday, tested OpenAI’s GPT-5.5, Google’s Gemini 3.1 Pro, Anthropic’s Claude and xAI’s Grok using a standardized question set and human scorers.
- OpenAI’s GPT-5.5 returned only left-leaning arguments in about 80% of short responses, while Google’s Gemini offered arguments from both sides in more than 90% of answers.
- xAI’s Grok produced the largest share of right-leaning replies among the models tested but still leaned left overall, while Anthropic’s Claude returned a mix of left-only and both-sides answers.
- The test used more than two dozen political questions from a 2025 Stanford-Dartmouth framework and capped responses at 30 words, a limit designed to force clear positions and reveal each model’s lean rather than allow hedging.
- Researchers say the pattern echoes studies dating back to 2023 and points to structural causes in training data and design choices, with potential consequences for public trust, enterprise adoption and regulatory oversight.