Particle.news

Talkie, a 13B ‘Vintage’ AI Trained on Pre-1931 Texts, Launches for Public Use

The model probes how a pre-1931 training diet shapes AI behavior.

Overview

  • Talkie-1930-13b is a 13-billion-parameter model trained on about 260 billion tokens from books, newspapers, journals, patents, and case law published before 1931.
  • The model is publicly available as downloads on GitHub and Hugging Face and through a web chat; the team warns that outputs may reflect outdated and offensive views.
  • Benchmarks show it trails an identically sized model trained on modern web data, though it keeps pace on basic language understanding and numeracy.
  • Tests point to noisy optical character recognition (OCR) as a main bottleneck: models trained on conventionally OCR'd text reach roughly 30% of the performance achieved with human-transcribed text, and simple cleaning lifts that to about 70%.
  • The team reports leakage of post-1930 facts and is building better filters, a vintage post-training pipeline, and a larger multilingual corpus, with a GPT-3-level vintage model targeted for release this summer.