Overview
- Tether released QVAC Fabric, what it calls the first cross-platform BitNet LoRA fine-tuning framework for on-device language models (a minimal sketch of the technique follows this list).
- Reported results include fine-tuning a 125M-parameter model in about 10 minutes and a 1B-parameter model in roughly 1 hour 18 minutes on a Galaxy S25, with runs scaling up to 3.8B on flagship phones and up to 13B on an iPhone 16.
- The framework broadens support beyond Nvidia to AMD and Intel GPUs, Apple’s Metal stack, and high-end mobile GPUs.
- Tether cites up to 90% lower memory use than full-precision models (roughly in line with storing ternary, ~1.58-bit weights in place of 16-bit ones), VRAM reductions of about 65–78% versus popular baselines, and 2–11× GPU-over-CPU speedups on phones.
- The code is open-sourced on GitHub, and Tether highlights local data privacy and the potential for federated learning; independent benchmarks, real-world thermal behavior, and licensing terms remain to be assessed.
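For orientation, here is a minimal sketch of the core idea behind BitNet-style LoRA fine-tuning: the frozen base weights are quantized to ternary values (which is where the memory savings come from), and only small low-rank adapter matrices are trained. This is plain PyTorch; the class and parameter names (`BitLinearLoRA`, `rank`, `alpha`) and the absmean quantization detail are illustrative assumptions, not QVAC Fabric's actual API.

```python
# Minimal sketch of LoRA on a ternary ("BitNet-style") linear layer.
# Illustrative assumptions throughout; not QVAC Fabric's implementation.
import torch
import torch.nn as nn


def ternarize(w: torch.Tensor) -> torch.Tensor:
    """Quantize weights to {-1, 0, +1} times a per-tensor absmean scale."""
    scale = w.abs().mean().clamp(min=1e-8)
    return torch.round(w / scale).clamp(-1, 1) * scale


class BitLinearLoRA(nn.Module):
    def __init__(self, in_features: int, out_features: int,
                 rank: int = 8, alpha: float = 16.0):
        super().__init__()
        base = torch.empty(out_features, in_features)
        nn.init.kaiming_uniform_(base)
        # Frozen ternary base weights: stored as a buffer, never trained.
        self.register_buffer("w_q", ternarize(base))
        # Trainable low-rank adapters: the only parameters LoRA updates.
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x @ W_q^T + scaling * (x @ A^T) @ B^T
        return x @ self.w_q.T + self.scaling * (x @ self.lora_a.T) @ self.lora_b.T


# Usage: only the adapter parameters receive gradients.
layer = BitLinearLoRA(512, 512, rank=8)
opt = torch.optim.AdamW([layer.lora_a, layer.lora_b], lr=1e-3)
x = torch.randn(4, 512)
loss = layer(x).pow(2).mean()
loss.backward()
opt.step()
assert layer.w_q.grad is None  # the quantized base stays frozen
```

Because only the rank-8 adapters are trained while the base weights sit in a low-bit frozen buffer, both the optimizer state and the gradient memory shrink dramatically, which is the property that makes on-device fine-tuning plausible on phone-class hardware.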