“The future of AI should be accessible, available, and open to people and builders everywhere, and it should not require an absurd amount of resources only available to a handful of cloud providers,” says Paolo Ardoino, CEO of Tether.
About 700 million people use generative AI tools such as Gemini and ChatGPT every week, but adoption is far from uniform. McKinsey’s 2025 State of AI survey found that nearly half of respondents from companies with more than $5 billion in revenue have reached the AI scaling phase, compared with just 29 percent of those from companies with less than $100 million in revenue. The gap only widens further down the chain, locking out smaller businesses, developers, and everyday users.
Retail and small businesses are limited to the basic AI capabilities their facilities can power, such as text-based inference and multimedia generation with base models. That leaves billions of end users and developers unable to fully use, or build, intelligent software because of high infrastructure demands.
Tether’s edge-first LoRA fine-tuning framework for Microsoft’s BitNet LLM is an important step toward infrastructure that can support billions of AI agents and intelligent machines. By reducing the computational overhead of machine learning and enabling consumer-grade devices to perform advanced operations, the edge-first approach puts these capabilities within reach of a far larger population.
Imagine a 13-billion-parameter model being fine-tuned on everyday handheld devices like the Samsung Galaxy S25 and iPhone 16, as well as on regular personal computers. The framework combines resource-efficient and platform-agnostic techniques to make fine-tuning practical for this ternary-quantized LLM.
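A quick back-of-the-envelope estimate shows why ternary weights make phone-scale fine-tuning plausible. This sketch counts weight storage only (no activations, KV cache, LoRA adapter, or optimizer state), and the 2-bit row assumes a simple packing scheme rather than any specific format Tether or Microsoft uses.

```python
import math


def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Storage needed for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS = 13e9  # the 13-billion-parameter model mentioned above

# A ternary weight (-1, 0, +1) carries log2(3) ≈ 1.585 bits of information.
estimates = {
    "FP16 (16 bits/weight)": weight_memory_gb(PARAMS, 16),
    "Ternary, ideal (~1.58 bits)": weight_memory_gb(PARAMS, math.log2(3)),
    "Ternary, 2-bit packing": weight_memory_gb(PARAMS, 2),
}

for label, gb in estimates.items():
    print(f"{label:<30} {gb:6.2f} GB")
```

The result is roughly 26 GB for FP16 weights versus about 2.6 to 3.3 GB for ternary weights, which is the difference between needing a workstation GPU and plausibly fitting within the memory of a recent flagship phone.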
Behind Tether’s BitNet fine-tuning framework
BitNet grew out of a vision of a capable AI model that aims to match the quality of full-precision models without consuming outsized computing resources. Earlier attempts at resource-efficient AI relied on trade-offs, such as running small-parameter models at higher precision or larger-parameter models at lower precision, but neither approach fully solved the problem.
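BitNet takes a more fundamental approach. Rather than compressing a full-precision model after training, it is trained from the start with ternary weights, each restricted to -1, 0, or +1 (about 1.58 bits per weight). Matrix multiplications then reduce largely to additions and subtractions, so the model consumes only a fraction of the memory and compute traditionally required.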
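To make the ternary idea concrete, here is a plain-Python sketch of the absmean quantization described for BitNet b1.58 (scale by the mean absolute weight, round, clip to [-1, 1]), followed by a matrix-vector product that uses only additions and subtractions. It is illustrative only: the function names are hypothetical, this is not Microsoft’s or Tether’s kernel code, and production implementations also quantize activations and pack the ternary weights into compact bit layouts.

```python
def absmean_ternarize(weights, eps=1e-5):
    """Quantize a weight matrix to {-1, 0, +1} with a single per-matrix scale.

    Absmean recipe: divide by the mean absolute weight (gamma), round to the
    nearest integer, then clip to [-1, 1]. Python's round() sends exact
    halves to the nearest even integer; such ties are rare with real weights.
    """
    flat = [w for row in weights for w in row]
    gamma = sum(abs(w) for w in flat) / len(flat)
    ternary = [
        [max(-1, min(1, round(w / (gamma + eps)))) for w in row]
        for row in weights
    ]
    return ternary, gamma


def ternary_matvec(ternary, scale, x):
    """Compute scale * (T @ x) using only additions and subtractions."""
    y = []
    for row in ternary:
        acc = 0.0
        for t, xi in zip(row, x):
            if t == 1:
                acc += xi
            elif t == -1:
                acc -= xi
            # t == 0: the weight contributes nothing and is skipped
        y.append(scale * acc)  # one multiplication per output, not per weight
    return y


W = [[0.9, -0.05, -1.2],
     [0.4, 0.8, -0.3]]
x = [1.0, 2.0, 3.0]

T, gamma = absmean_ternarize(W)
print(T)  # [[1, 0, -1], [1, 1, 0]]
print(ternary_matvec(T, gamma, x))
```

The inner loop never multiplies a weight by an activation; the only multiplication left is the per-output rescale by `gamma`, which is why general-purpose floating-point hardware is a poor fit for this workload.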
The challenge, however, is that contemporary GPUs are optimized for the very floating-point multiplications BitNet largely eliminates, creating a hardware compatibility gap. Compounding this, BitNet was originally confined to its own bitnet.cpp inference engine, limiting its broader utility. Tether’s framework addresses both constraints at once with a Vulkan and Metal GPU backend that enables cross-platform BitNet inference and LoRA fine-tuning on heterogeneous consumer GPUs, including mobile GPUs. As a result, BitNet can now run on more mature, widely supported inference engines without sacrificing its efficiency advantages.
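LoRA (low-rank adaptation) keeps the pretrained weights frozen and learns only a small low-rank correction, which is what makes fine-tuning feasible on memory-constrained devices: gradients and optimizer state are needed only for the adapter. The sketch below assumes the adapter sits on top of a frozen ternary base layer; `LoRALinear`, `lora_param_count`, the rank, `alpha`, and the 4096 x 4096 layer size are hypothetical, illustrative choices rather than Tether’s implementation, and the backward pass is omitted.

```python
import random


def matvec(m, x):
    """Plain dense matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


class LoRALinear:
    """Frozen ternary base layer plus a trainable low-rank adapter (hypothetical).

    Output: y = scale * (T @ x) + (alpha / rank) * B @ (A @ x)
    T and scale stay frozen; only A (rank x d_in) and B (d_out x rank) would
    receive gradients and optimizer state during fine-tuning.
    """

    def __init__(self, ternary, scale, rank=2, alpha=4.0, seed=0):
        d_out, d_in = len(ternary), len(ternary[0])
        rng = random.Random(seed)
        self.ternary, self.scale = ternary, scale  # frozen base weights
        self.rank, self.alpha = rank, alpha
        # Common LoRA initialization: A small and random, B all zeros,
        # so the adapter is a no-op until training updates B.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = [self.scale * v for v in matvec(self.ternary, x)]
        delta = matvec(self.B, matvec(self.A, x))
        s = self.alpha / self.rank
        return [b + s * d for b, d in zip(base, delta)]

    def trainable_params(self):
        return sum(len(r) for r in self.A) + sum(len(r) for r in self.B)


def lora_param_count(d_out, d_in, rank):
    """Trainable parameters in one adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


layer = LoRALinear([[1, 0, -1], [1, 1, 0]], scale=0.5)
print(layer.forward([1.0, 2.0, 3.0]))  # [-1.0, 1.5]: B starts at zero

full = 4096 * 4096
adapter = lora_param_count(4096, 4096, rank=8)
print(f"{adapter:,} trainable vs {full:,} frozen ({adapter / full:.2%})")
```

For a 4096 x 4096 layer at rank 8, the adapter trains 65,536 parameters, under 0.4 percent of the frozen weights it modifies.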
Vulkan’s cross-platform nature is key here. Unlike CUDA, which ties developers to NVIDIA hardware, Vulkan runs across a broad range of GPUs and operating systems, opening BitNet to genuinely multi-platform deployment. To work around the buffer-allocation limits of Vulkan drivers on mobile GPUs, Tether’s BitNet fine-tuning framework uses a dynamic tiling technique that splits large workloads into pieces small enough to fit.
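The article does not describe the tiling algorithm itself, so the following is only a minimal sketch of the general idea, assuming the constraint is a per-buffer byte cap that the driver reports at runtime (Vulkan exposes limits such as `maxStorageBufferRange` in `VkPhysicalDeviceLimits`). “Dynamic” is taken here to mean that the tile size is computed from that reported limit instead of being hard-coded. All function names and the 128 MiB figure are hypothetical.

```python
def rows_per_tile(n_rows, n_cols, bytes_per_elem, max_buffer_bytes):
    """Largest number of whole rows whose data fits in a single buffer."""
    row_bytes = n_cols * bytes_per_elem
    if row_bytes > max_buffer_bytes:
        raise ValueError("one row exceeds the buffer limit; columns would need tiling too")
    return min(n_rows, max_buffer_bytes // row_bytes)


def tile_ranges(n_rows, tile_rows):
    """Half-open [start, end) row ranges covering the whole matrix."""
    return [(s, min(s + tile_rows, n_rows)) for s in range(0, n_rows, tile_rows)]


def tiled_matvec(matrix, x, bytes_per_elem, max_buffer_bytes):
    """Matrix-vector product computed one buffer-sized tile at a time."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    tile = rows_per_tile(n_rows, n_cols, bytes_per_elem, max_buffer_bytes)
    y = []
    for start, end in tile_ranges(n_rows, tile):
        # On a GPU this slice would be one buffer upload plus one dispatch.
        for row in matrix[start:end]:
            y.append(sum(a * b for a, b in zip(row, x)))
    return y


# Hypothetical example: a 32,000 x 4,096 FP16 projection (about 262 MB)
# on a device that reports a 128 MiB per-buffer limit.
LIMIT = 128 * 1024 * 1024
tile = rows_per_tile(32_000, 4_096, 2, LIMIT)
print(tile, tile_ranges(32_000, tile))  # 16384 [(0, 16384), (16384, 32000)]
```

Because each tile produces its own slice of the output, the results concatenate to exactly what a single untiled pass would give; only the number of uploads and dispatches changes with the device’s limit.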
The dynamic tiling technique was first applied in the fine-tuning framework for QVAC Fabric LLM, which powers Tether’s QVAC Workbench application.
This implementation demonstrates the approach’s efficiency by fine-tuning a 13-billion-parameter model across a range of consumer devices with varying GPU configurations.
The BitNet LLM fine-tuning framework is Tether’s latest achievement and part of a broader expansion into open-source AI and communication technologies that challenge today’s slow, fragile, and controlled systems. These developments are open source and packaged as modules in the QVAC SDK, making them easy to deploy and helping developers build edge-first AI applications without needing anyone’s permission.
Tether envisions superintelligence as a foundational capability owned by the people who use it, and is pursuing that vision through:
Local-first AI
Closely related to decentralized AI, local-first AI aims to create sovereign AI solutions that do not depend on centralized infrastructure, such as data centers, to operate. Such systems are considered cost-effective, relatively more sustainable, and more private than centralized AI, since data does not have to leave the device. Tether is building AI applications that rely entirely on the device’s own resources: they store data on the device and use its processors for demanding operations such as fine-tuning and inference.
P2P computing network for AI inference
Tether’s AI applications are built on the Pear runtime, a tooling platform for fully peer-to-peer (P2P) applications that operate without servers. Pear builds on the Holepunch stack, which is purpose-built for stable, direct communication between devices. Pear enables delegated inference for AI applications such as QVAC Workbench. Delegated inference creates a unified, dynamic workstation in which compute tasks move fluidly between mobile and desktop environments, letting either device offload high-intensity processing to the most capable system. For example, you can start a task on your phone and delegate it to your desktop or laptop for completion.
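The article does not describe Pear’s or QVAC’s scheduling interface, so this is not Pear or Holepunch code. It is a hypothetical sketch of the core decision delegated inference has to make: given the local device and its reachable peers, run the task on the most capable device that can hold the model. `Device`, `choose_executor`, and every number below are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """A device that could run an inference task (hypothetical model)."""
    name: str
    free_memory_gb: float
    tokens_per_second: float  # advertised or measured throughput
    reachable: bool = True


def choose_executor(local, peers, required_memory_gb):
    """Pick the fastest reachable device with enough free memory.

    The local device is listed first, so it wins any tie and the task
    stays local unless a peer is strictly faster. Returns None if no
    device can hold the model.
    """
    candidates = [
        d for d in [local, *peers]
        if d.reachable and d.free_memory_gb >= required_memory_gb
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda d: d.tokens_per_second)


phone = Device("phone", free_memory_gb=6.0, tokens_per_second=8.0)
laptop = Device("laptop", free_memory_gb=14.0, tokens_per_second=40.0)

print(choose_executor(phone, [laptop], required_memory_gb=4.0).name)  # laptop
laptop.reachable = False
print(choose_executor(phone, [laptop], required_memory_gb=4.0).name)  # phone
print(choose_executor(phone, [laptop], required_memory_gb=10.0))      # None
```

Preferring the local device on ties keeps data on the originating device whenever delegation would not actually help, which fits the local-first goals described above.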
AI for everyone
The only way to scale intelligence to the needs of a society approaching ten billion people is to push it to the edge. That, in turn, depends on progress in making on-device AI computation cost-effective.
The only way to democratize superintelligence, and to avoid creating another ‘luxury’ cutting-edge technology controlled by unicorns and fully accessible only to elites, is for billions of AI agents and countless AI applications, deployed by developers in every region of the world, to run effectively on user-owned resources.
Tether is pioneering limitless superintelligence for an ever-growing society and its applications. Follow the journey toward truly local, edge-first AI solutions.

