Overview
- Tether released QVAC Fabric, what it calls the first cross-platform BitNet LoRA fine-tuning framework for on-device language models (a minimal sketch of the technique follows this list).
- Reported results include fine-tuning a 125M-parameter model in about 10 minutes and a 1B-parameter model in roughly 1 hour 18 minutes on a Galaxy S25, with runs scaling up to 3.8B on flagship phones and up to 13B on an iPhone 16.
- The framework broadens support beyond Nvidia to AMD and Intel GPUs, Apple’s Metal stack, and high-end mobile GPUs.
- Tether cites up to 90% lower memory use than full-precision models (roughly in line with storing ternary, ~1.58-bit weights in place of 16-bit ones), VRAM reductions of about 65–78% versus popular baselines, and 2–11× GPU-over-CPU speedups on phones.
- The code is open-sourced on GitHub, and Tether highlights local data privacy and the potential for federated learning; independent benchmarks, real-world thermal behavior, and licensing terms remain to be assessed.
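For orientation, here is a minimal sketch of the core idea behind BitNet-style LoRA fine-tuning: the frozen base weights are quantized to ternary values (which is where the memory savings come from), and only small low-rank adapter matrices are trained. This is plain PyTorch; the class and parameter names (`BitLinearLoRA`, `rank`, `alpha`) and the absmean quantization detail are illustrative assumptions, not QVAC Fabric's actual API.

```python
# Minimal sketch of LoRA on a ternary ("BitNet-style") linear layer.
# Illustrative assumptions throughout; not QVAC Fabric's implementation.
import torch
import torch.nn as nn


def ternarize(w: torch.Tensor) -> torch.Tensor:
    """Quantize weights to {-1, 0, +1} times a per-tensor absmean scale."""
    scale = w.abs().mean().clamp(min=1e-8)
    return torch.round(w / scale).clamp(-1, 1) * scale


class BitLinearLoRA(nn.Module):
    def __init__(self, in_features: int, out_features: int,
                 rank: int = 8, alpha: float = 16.0):
        super().__init__()
        base = torch.empty(out_features, in_features)
        nn.init.kaiming_uniform_(base)
        # Frozen ternary base weights: stored as a buffer, never trained.
        self.register_buffer("w_q", ternarize(base))
        # Trainable low-rank adapters: the only parameters LoRA updates.
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x @ W_q^T + scaling * (x @ A^T) @ B^T
        return x @ self.w_q.T + self.scaling * (x @ self.lora_a.T) @ self.lora_b.T


# Usage: only the adapter parameters receive gradients.
layer = BitLinearLoRA(512, 512, rank=8)
opt = torch.optim.AdamW([layer.lora_a, layer.lora_b], lr=1e-3)
x = torch.randn(4, 512)
loss = layer(x).pow(2).mean()
loss.backward()
opt.step()
assert layer.w_q.grad is None  # the quantized base stays frozen
```

Because only the rank-8 adapters are trained while the base weights sit in a low-bit frozen buffer, both the optimizer state and the gradient memory shrink dramatically, which is the property that makes on-device fine-tuning plausible on phone-class hardware.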