Overview
- Stanford researchers report that large language models changed their comments on identical student essays when only the described writer’s identity changed.
- The team ran 600 eighth-grade persuasive essays through four AI systems, including versions of ChatGPT and Meta’s Llama, then re-submitted each essay with rotating labels for the writer’s race, gender, motivation, and disability status.
- Essays tagged as written by Black students drew more praise and encouragement; Hispanic or English-learner labels drew more grammar correction; and White labels drew more critique of argument, evidence, and clarity.
- The authors describe these patterns as positive-feedback and feedback-withholding biases and caution that softer treatment for some students can limit meaningful revision and skill growth.
- They urge teachers to review AI comments before sharing them. The paper is not peer-reviewed but has been nominated for presentation at an international learning analytics conference, and the causes of the patterns remain uncertain because the models’ training data is opaque.
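The rotation protocol described above can be sketched in a few lines. This is a hypothetical illustration, not the researchers' code: the label wording, the `build_prompt` helper, and the choice of attributes are all assumptions; the key property it demonstrates is that the essay text stays byte-for-byte identical while only the described identity varies.

```python
from itertools import product

# Hypothetical label sets; the study rotated race, gender, motivation,
# and disability descriptors (exact wording here is an assumption).
RACE = ["Black", "Hispanic", "White"]
GENDER = ["boy", "girl"]

def build_prompt(essay: str, race: str, gender: str) -> str:
    # Prepend a writer description so only the identity label changes
    # between runs; the essay itself is never edited.
    return f"Give feedback on this essay by a {race} {gender}:\n{essay}"

def rotate_labels(essay: str):
    # Yield one (labels, prompt) pair per label combination
    # for the same unmodified essay.
    for race, gender in product(RACE, GENDER):
        yield (race, gender), build_prompt(essay, race, gender)

prompts = list(rotate_labels("School uniforms should be optional."))
```

Each prompt would then be sent to the model under test, and the returned comments compared across label conditions; any systematic difference in tone or content is attributable to the label, since the essay is held constant.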