Particle.news

Publishers Curb Internet Archive Access to Thwart AI Scraping

Fears over AI training on structured datasets are prompting blocks on archival bots.

Overview

  • The Financial Times says it blocks bots operated by OpenAI, Anthropic, Perplexity, and the Internet Archive from scraping its paywalled site.
  • The FT's Matt Rogerson says that because most FT stories are paywalled, typically only unpaywalled articles appear in the Wayback Machine.
  • The New York Times confirms it is blocking the Internet Archive’s bot, citing unauthorized, unfettered access that could be used by AI companies.
  • Industry executives say the Internet Archive’s API offers structured data that makes it an attractive target for AI scrapers, while the Wayback Machine is considered less risky.
  • Reddit previously cut off the Internet Archive over API concerns, and shifting bot rules and archival policies are already limiting what the public can find in web archives.