Particle.news
Download on the App Store

DeepMind Maps How the Open Web Can Hijack AI Agents

The study reframes AI security by treating the online environment as the attack surface.

Overview

  • Google DeepMind published the AI Agent Traps paper that charts six web-borne attacks against autonomous agents.
  • Tests showed hidden commands in HTML, CSS, or metadata could seize control of agents in up to 86% of scenarios.
  • Embedded jailbreaks drove data theft as agents with broad file access sent local passwords and documents at rates above 80% across five platforms.
  • The paper details memory poisoning that plants false facts in sources agents trust, causing them to repeat and act on bad information over time.
  • Researchers warn that coordinated traps could trigger cascading behavior across many systems and they urge adversarial training, runtime scanners, web standards, reputation checks, and clear rules on liability.