Particle News: Stanford AI Agent Beats Most Human Pen Testers in Campus Network Trial

Overview

A Stanford study reports that ARTEMIS, an AI agent, placed second overall and outperformed nine of ten professional penetration testers in a controlled experiment.
Researchers let the agent operate for 16 hours across the university’s computer science networks, which span roughly 8,000 devices, but the head‑to‑head comparison covered only its first 10 hours.
Within that window, ARTEMIS logged nine valid findings with an 82% valid submission rate, a result comparable to the strongest human participant.
Operating costs were estimated at about $18 per hour for the base setup and $59 per hour for a stronger variant, versus typical U.S. tester pay near $125,000 annually.
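The hourly and annual figures above are easier to compare once they share a unit. The sketch below is a rough back-of-the-envelope conversion, not a calculation from the study: it assumes a 2,080-hour work year (40 hours a week for 52 weeks), and the function names are illustrative. It also ignores benefits, overhead, and any human oversight the agent needs.

```python
# Rough unit conversion for the cost figures quoted in the article.
# ASSUMPTION (not from the study): a full-time year is 40 h/week * 52 weeks.
HOURS_PER_YEAR = 40 * 52  # 2,080 hours


def annual_to_hourly(annual_usd: float, hours_per_year: int = HOURS_PER_YEAR) -> float:
    """Convert an annual salary to an equivalent hourly rate."""
    return annual_usd / hours_per_year


def hourly_to_annual(hourly_usd: float, hours_per_year: int = HOURS_PER_YEAR) -> float:
    """Convert an hourly cost to an equivalent full-time annual cost."""
    return hourly_usd * hours_per_year


# Figures quoted in the article.
HUMAN_ANNUAL_USD = 125_000
AGENT_BASE_HOURLY_USD = 18
AGENT_STRONG_HOURLY_USD = 59

human_hourly = annual_to_hourly(HUMAN_ANNUAL_USD)
base_annual = hourly_to_annual(AGENT_BASE_HOURLY_USD)
strong_annual = hourly_to_annual(AGENT_STRONG_HOURLY_USD)

print(f"Human tester, hourly equivalent: ${human_hourly:,.2f}")
print(f"Base agent, annual equivalent:   ${base_annual:,.0f}")
print(f"Strong agent, annual equivalent: ${strong_annual:,.0f}")
```

Under that assumption, the stronger variant costs about the same per hour as a salaried tester, while the base setup costs well under a third as much.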
ARTEMIS scales by spawning subagents, and it worked around a balky browser by switching to the command line.
It missed GUI‑dependent issues, however, and produced some false positives.
Coverage of the study also notes growing real‑world misuse of AI by threat actors.