Particle.news

Tether’s QVAC Fabric Brings BitNet LoRA Fine-Tuning to Phones and Consumer GPUs

Tether frames the open-source tool as a step toward privacy-first, on-device AI.

Overview

  • Tether released QVAC Fabric, describing it as the first cross-platform tool for BitNet LoRA fine-tuning of on-device language models.
  • Reported results include fine-tuning a 125M-parameter model in about 10 minutes and a 1B model in roughly 1 hour 18 minutes on a Galaxy S25, with runs on models up to 3.8B on flagship phones and up to 13B on the iPhone 16.
  • The framework broadens support beyond Nvidia to AMD and Intel GPUs, Apple’s Metal stack, and high-end mobile GPUs.
  • Tether cites up to 90% lower memory use versus full-precision models, VRAM reductions of about 65–78% versus popular baselines, and 2–11× GPU speedups over CPU on phones.
  • The code is open-sourced on GitHub, with Tether highlighting local data privacy and potential for federated learning, though independent benchmarks and real-world assessments of thermals and licensing are still pending.
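
To make the claims above concrete, here is a minimal sketch of the two ideas the article combines: a LoRA-style low-rank update (training only two small matrices alongside a frozen base weight) and the rough arithmetic behind the ~90% memory figure for BitNet-style ternary weights. This is illustrative only, not Tether's QVAC Fabric code; the layer shapes and rank are hypothetical.

```python
import numpy as np

# Hypothetical layer dimensions and LoRA rank (not from Tether's release).
d_in, d_out, rank = 512, 512, 8
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))        # frozen base weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection, zero-init

x = rng.standard_normal(d_in)
# LoRA forward pass: base output plus a low-rank correction B @ (A @ x).
y = W @ x + B @ (A @ x)

# Only A and B are trained — a small fraction of the base layer's parameters.
lora_params = A.size + B.size
full_params = W.size
print(f"trainable fraction: {lora_params / full_params:.3%}")

# Rough arithmetic behind the "up to 90% lower memory" claim: BitNet-style
# ternary weights take about 1.58 bits each versus 16 bits for fp16.
reduction = 1 - 1.58 / 16
print(f"weight-memory reduction vs fp16: {reduction:.0%}")
```

At rank 8, the trainable matrices are about 3% of the layer's parameters, which is why LoRA fine-tuning fits in phone-class memory budgets; the 1.58-bit-per-weight arithmetic lines up with the ~90% memory reduction Tether cites.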