Overview
- Google DeepMind published the AI Agent Traps paper that charts six web-borne attacks against autonomous agents.
- Tests showed hidden commands in HTML, CSS, or metadata could seize control of agents in up to 86% of scenarios.
- Embedded jailbreaks also enabled data theft: agents with broad file access sent out local passwords and documents at rates above 80% across five platforms.
- The paper details memory poisoning that plants false facts in sources agents trust, causing them to repeat and act on bad information over time.
- Researchers warn that coordinated traps could trigger cascading behavior across many systems. They urge adversarial training, runtime scanners, web standards, reputation checks, and clear rules on liability.
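
To make the hidden-command attack and the "runtime scanner" defense more concrete, here is a minimal sketch in Python. It is not the paper's method. The phrase list, the class `HiddenTextScanner`, and the function `scan_html` are hypothetical names invented for illustration. The sketch assumes well-formed HTML and only checks inline styles, the `hidden` attribute, `<meta>` content, and comments, so a real scanner would need far broader coverage (external CSS, off-screen positioning, zero-size fonts, and so on).

```python
import re
from html.parser import HTMLParser

# Hypothetical, deliberately tiny phrase list; an assumption for illustration only.
SUSPICIOUS = re.compile(
    r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE
)
# Inline styles that hide an element from human readers.
HIDDEN_STYLE = re.compile(
    r"display\s*:\s*none|visibility\s*:\s*hidden", re.IGNORECASE
)
# Void elements never get a closing tag, so they are kept off the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class HiddenTextScanner(HTMLParser):
    """Flags instruction-like text that a human visitor would not see."""

    def __init__(self):
        super().__init__()
        self.findings = []   # list of (location, text) tuples
        self._stack = []     # one bool per open element: is it hidden?

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            self._check("meta", attrs.get("content") or "")
        if tag in VOID_TAGS:
            return
        hidden = "hidden" in attrs or bool(
            HIDDEN_STYLE.search(attrs.get("style") or "")
        )
        # Children of a hidden element are hidden too.
        parent_hidden = bool(self._stack) and self._stack[-1]
        self._stack.append(hidden or parent_hidden)

    def handle_endtag(self, tag):
        # Assumes well-formed markup; unclosed tags would misalign the stack.
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        if self._stack and self._stack[-1]:
            self._check("hidden-element", data)

    def handle_comment(self, data):
        self._check("comment", data)

    def _check(self, where, text):
        if SUSPICIOUS.search(text):
            self.findings.append((where, text.strip()))


def scan_html(html):
    """Return (location, text) pairs for suspicious hidden content."""
    scanner = HiddenTextScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.findings
```

An agent pipeline could call `scan_html(page_source)` before handing a page to the model and then drop or quarantine any flagged spans. Visible text containing the same phrase is deliberately left alone here, since the point of the sketch is the gap between what a human sees and what the agent parses.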