Overview
- The Financial Times says it blocks bots operated by OpenAI, Anthropic, Perplexity, and the Internet Archive from scraping its paywalled site.
- Because most FT stories are paywalled, typically only unpaywalled FT articles appear in the Wayback Machine, according to the FT's Matt Rogerson.
- The New York Times confirms it is blocking the Internet Archive’s bot, citing unauthorized, unfettered access that could be used by AI companies.
- Industry executives say the Internet Archive’s API offers structured data that makes it an attractive target for AI scrapers, while they consider the Wayback Machine itself less risky.
- Reddit previously cut off the Internet Archive over API concerns. Shifting bot rules and archival policies are already limiting what the public can find in web archives.