Particle.news

Ai2 Launches MolmoWeb, an Open Visual Web Agent

The release offers an open foundation for building reproducible browser agents.

Overview

  • MolmoWeb, released Tuesday, is a screenshot-driven agent that clicks, types, and scrolls to complete tasks in a web browser.
  • The models come in 4B and 8B parameter sizes that can run locally. Weights, training data, demos, and evaluation tools are available on Hugging Face and GitHub, with code to follow.
  • Ai2 built the system from 30,000 human task trajectories and synthetic runs generated by accessibility-tree agents, together with annotated screenshots and 2.2 million question–answer pairs.
  • Ai2 reports strong benchmark results, saying the 8B model outperforms agents built on larger proprietary models such as GPT-4o on key navigation tasks, though independent evaluations still often find closed systems ahead.
  • The launch comes amid leadership changes at Ai2, as CEO Ali Farhadi and several researchers move to Microsoft. Ai2 also flags limitations: errors when pages load slowly, no training for financial logins, and weaker results on vague prompts.
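The article describes MolmoWeb only at the level of an observe-and-act loop: the agent looks at a screenshot and responds with a click, typing, or a scroll until the task is finished. The sketch below illustrates that generic loop and nothing more. It is not MolmoWeb's API or action format. `Action`, `FakeBrowser`, `run_agent`, and `scripted_policy` are hypothetical names. The "screenshot" is a small dict standing in for pixels, and a hand-written rule-based policy stands in for the vision-language model.

```python
from dataclasses import dataclass
from typing import Callable, Literal

# Hypothetical action type covering the three primitives named in the article,
# plus a "done" signal that ends the episode.
@dataclass
class Action:
    kind: Literal["click", "type", "scroll", "done"]
    x: int = 0
    y: int = 0
    text: str = ""
    dy: int = 0

class FakeBrowser:
    """Toy stand-in for a real browser (assumption, not a real driver)."""

    def __init__(self) -> None:
        self.query = ""
        self.submitted = False
        self.scroll_y = 0

    def screenshot(self) -> dict:
        # A real agent would receive pixels; here the "screenshot" is a state dict.
        return {"query": self.query, "submitted": self.submitted, "scroll_y": self.scroll_y}

    def execute(self, action: Action) -> None:
        if action.kind == "type":
            self.query += action.text
        elif action.kind == "click" and (action.x, action.y) == (100, 40):
            # Pretend the search button sits at pixel (100, 40).
            self.submitted = True
        elif action.kind == "scroll":
            self.scroll_y = max(0, self.scroll_y + action.dy)

# A policy maps (task, observation, past actions) to the next action.
Policy = Callable[[str, dict, list], Action]

def run_agent(task: str, browser: FakeBrowser, policy: Policy, max_steps: int = 10):
    """Observe, choose an action, act; stop on "done" or after max_steps."""
    history: list = []
    for _ in range(max_steps):
        obs = browser.screenshot()
        action = policy(task, obs, history)
        history.append(action)
        if action.kind == "done":
            return True, history
        browser.execute(action)
    return False, history

def scripted_policy(task: str, obs: dict, history: list) -> Action:
    # Hypothetical rule-based stand-in for the model's decision.
    if obs["query"] == "":
        return Action("type", text=task)
    if not obs["submitted"]:
        return Action("click", x=100, y=40)
    if obs["scroll_y"] == 0:
        return Action("scroll", dy=300)
    return Action("done")

if __name__ == "__main__":
    ok, steps = run_agent("open-source web agents", FakeBrowser(), scripted_policy)
    print(ok, [a.kind for a in steps])
```

The `max_steps` cap matters in practice. An agent that never decides it is finished, for example on a page that loads slowly, still terminates, and it reports failure instead of looping forever.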