Overview
- The Stanford study reports ARTEMIS placed second overall and outperformed nine of ten professional penetration testers in a controlled experiment.
- Researchers let the agent operate for 16 hours across the university’s computer science networks, which span roughly 8,000 devices; the head‑to‑head comparison counted only its first 10 hours.
- Within that window, ARTEMIS logged nine valid findings with an 82% valid submission rate, a result comparable to that of the strongest human participant.
- Operating costs were estimated at about $18 per hour for the base setup and $59 per hour for a stronger variant, versus typical U.S. tester pay near $125,000 annually.
- ARTEMIS scales by spawning subagents, and it worked around a balky browser by switching to the command line. It still missed GUI‑dependent issues and produced some false positives.
- Coverage of the study also notes growing real‑world misuse of AI by threat actors.
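
The cost figures above mix hourly and annual units, so a rough conversion helps when comparing them. The sketch below is illustrative only: the 2,080 working hours per year (40 hours times 52 weeks) is an assumption not taken from the study, and the function and constant names are hypothetical. It also treats the $125,000 figure as salary alone, ignoring benefits and overhead on the human side and any human oversight on the agent side.

```python
# Figures quoted in the overview above.
AGENT_BASE_USD_PER_HOUR = 18.0
AGENT_STRONG_USD_PER_HOUR = 59.0
TESTER_SALARY_USD_PER_YEAR = 125_000.0

# ASSUMPTION (not from the study): a full-time year of 40 h/week * 52 weeks.
ASSUMED_WORK_HOURS_PER_YEAR = 40 * 52  # 2,080 hours


def salary_to_hourly(annual_usd: float,
                     hours_per_year: float = ASSUMED_WORK_HOURS_PER_YEAR) -> float:
    """Convert an annual salary to an approximate hourly rate."""
    return annual_usd / hours_per_year


def run_cost(usd_per_hour: float, hours: float) -> float:
    """Total cost of running at a flat hourly rate for a number of hours."""
    return usd_per_hour * hours


if __name__ == "__main__":
    human_hourly = salary_to_hourly(TESTER_SALARY_USD_PER_YEAR)
    print(f"Human tester (assumed 2,080 h/yr): ${human_hourly:.2f}/h")

    # Compare over the 10-hour head-to-head window and the full 16-hour run.
    for hours in (10, 16):
        base = run_cost(AGENT_BASE_USD_PER_HOUR, hours)
        strong = run_cost(AGENT_STRONG_USD_PER_HOUR, hours)
        human = run_cost(human_hourly, hours)
        print(f"{hours} h: base ${base:.0f}, stronger ${strong:.0f}, human ${human:.0f}")

    ratio = human_hourly / AGENT_BASE_USD_PER_HOUR
    print(f"Human-to-base-agent hourly ratio: {ratio:.1f}x")
```

Under these assumptions the human rate works out to about $60 per hour, so the base setup is roughly 3.3 times cheaper per hour, while the stronger variant costs about the same as a salaried tester. A different hours-per-year assumption would shift these ratios.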