Particle News: Talkie, a 13B ‘Vintage’ AI Trained on Pre-1931 Texts, Launches for Public Use

Overview

Talkie-1930-13b is a 13-billion-parameter model trained on about 260 billion tokens from books, newspapers, journals, patents, and case law published before 1931.
The model is publicly available, with downloads on GitHub and Hugging Face and a web chat interface, and the team warns that outputs may reflect outdated and offensive views.
Benchmarks show it trails an identically sized model trained on modern web data, though it keeps pace on basic language understanding and numeracy.
Tests point to noisy optical character recognition (OCR) as a main bottleneck: models trained on conventionally OCR'd text reach roughly 30% of the performance of models trained on human-transcribed text, and simple cleaning lifts that to about 70%.
The team reports leakage of post-1930 facts and is building better filters, a vintage post-training pipeline, and a larger multilingual corpus, with a GPT-3-level vintage model targeted for release this summer.