Overview
- Nanochat is a full‑stack, minimal codebase that covers tokenization, pretraining, midtraining, supervised finetuning, evaluation, inference, and a simple web UI.
- A single speedrun script trains and serves a usable model in roughly four hours on an 8×H100 node billed at ~$24 per hour (about $100 total), writing a report.md that summarizes its modest evaluation results.
- The default regimen pretrains on ~24GB of text from the fineweb‑edu‑100b‑shuffle dataset, midtrains on SmolTalk, MMLU auxiliary, and GSM8K, then supervised‑finetunes on ARC‑Easy, ARC‑Challenge, GSM8K, and SmolTalk.
- A third‑party build of the model is now on Hugging Face at sdobson/nanochat, and community experiments show it running on CPU on macOS using a lightweight script.
- Karpathy outlines larger, higher‑cost tiers, including a ~$300 depth‑26 run and a ~$1,000 tier, which are described but not yet merged into the main branch.
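The speedrun cost figures above are consistent with a quick back‑of‑the‑envelope check; the sketch below just multiplies the approximate numbers quoted in this overview (the variable names are illustrative, not from the nanochat codebase):

```python
# Rough cost check for the speedrun, using the approximate
# figures quoted above: ~4 hours at ~$24/hour for the GPU node.
HOURLY_RATE_USD = 24   # assumed approximate on-demand price
RUN_HOURS = 4          # assumed approximate wall-clock time

total_usd = HOURLY_RATE_USD * RUN_HOURS
print(f"estimated cost: ~${total_usd}")  # ~$96, i.e. roughly the $100 quoted
```

This is only a sanity check on the quoted numbers; actual cost varies with the provider's hourly rate and the run's exact duration.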