Liquid AI, the MIT spinout building models on continuous-time and state-space architectures rather than transformers, released LFM2.5-230M on June 26. It is the smallest model in the company’s family and is available now on Hugging Face. The release targets edge deployment and lightweight agentic pipelines, not frontier reasoning, and Liquid is explicit about the distinction.
The number that grabs attention is throughput on cheap hardware. On a Raspberry Pi 5, the model decodes at 42 tokens per second. On a Samsung Galaxy S25 Ultra (Qualcomm Snapdragon 8 Elite), it hits 213 tokens per second. Those are the company’s own benchmarks, measured with llama.cpp using device-tuned flash-attention settings, and no independent verification has been published alongside the release.
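To put the decode rates in per-token terms, the sketch below converts Liquid’s reported tokens-per-second figures into milliseconds per token and into the wall-clock time needed for a fixed-length reply. It assumes steady-state decoding and ignores prompt processing (prefill), so real end-to-end latency will be higher. The 128-token reply length is an arbitrary illustrative choice, not a figure from the announcement.

```python
# Convert reported decode throughput into per-token latency and reply time.
# Figures are Liquid AI's own reported numbers; prefill time is ignored.

REPORTED_DECODE_TPS = {
    "Raspberry Pi 5": 42.0,
    "Samsung Galaxy S25 Ultra": 213.0,
}


def ms_per_token(tokens_per_second: float) -> float:
    """Average milliseconds spent generating one token at steady state."""
    if tokens_per_second <= 0:
        raise ValueError("throughput must be positive")
    return 1000.0 / tokens_per_second


def reply_seconds(tokens_per_second: float, reply_tokens: int) -> float:
    """Decode-only time to emit a reply of `reply_tokens` tokens."""
    return reply_tokens / tokens_per_second


if __name__ == "__main__":
    REPLY_TOKENS = 128  # illustrative reply length (assumption)
    for device, tps in REPORTED_DECODE_TPS.items():
        print(f"{device}: {ms_per_token(tps):.1f} ms/token, "
              f"{reply_seconds(tps, REPLY_TOKENS):.2f} s for {REPLY_TOKENS} tokens")
```

At 42 tokens per second a 128-token reply takes roughly 3 seconds of pure decode on the Pi, while at 213 tokens per second the phone needs about 0.6 seconds.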
The model was pre-trained on 19 trillion tokens and put through a three-stage post-training recipe: supervised fine-tuning with distillation from the larger LFM2.5-350M, direct preference optimization, and multi-domain reinforcement learning. Liquid says the resulting checkpoint is designed to stay flexible for downstream fine-tuning rather than locking in opinionated behaviors.
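Of the three post-training stages, direct preference optimization has the most compact objective. The sketch below computes the standard per-example DPO loss from the summed log-probabilities of a preferred and a rejected response, scored by the policy being trained and by a frozen reference model. It illustrates the general technique only: Liquid has not published its DPO implementation, and the β default and any log-probabilities you feed in are placeholders.

```python
import math


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair (generic sketch).

    Each argument is the summed log-probability of a full response given the
    prompt. `beta` (an assumed default here) controls how strongly the policy
    is pushed away from the reference model.
    """
    # Implicit reward margin: how much more the policy, relative to the
    # reference, prefers the chosen response over the rejected one.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp)
    # -log(sigmoid(beta * margin)) == softplus(-beta * margin),
    # computed in a numerically stable form.
    z = -beta * margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


if __name__ == "__main__":
    # Made-up log-probabilities for illustration only.
    print(dpo_loss(-12.0, -15.0, -13.0, -14.0))
```

A margin of zero gives a loss of ln 2 (about 0.693); the loss falls toward zero as the policy favors the chosen response more strongly than the reference model does.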
On the ten benchmarks Liquid ran, LFM2.5-230M competes with significantly larger models. On BFCLv3, the tool-use evaluation, it scores 43.26 against Qwen3.5-0.8B’s 35.08 and Gemma 3 1B IT’s 16.61. On IFEval, an instruction-following benchmark, it scores 71.71, ahead of Qwen3.5-0.8B at 59.94 and Granite 4.0-350M at 53.48. It trails its own 350M sibling on most benchmarks. Liquid does not recommend the 230M for reasoning-heavy tasks such as advanced math, code generation, or creative writing.
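The gaps between the reported scores are easier to compare as point and relative margins. The sketch below hard-codes the BFCLv3 and IFEval figures quoted above and computes both, along with a rough size ratio. The parameter counts are an assumption taken from the nominal sizes in the model names (230M, 800M, 1B, 350M), which are rounded and may not match exact counts.

```python
# Reported scores from Liquid AI's announcement (higher is better).
SCORES = {
    "BFCLv3": {
        "LFM2.5-230M": 43.26,
        "Qwen3.5-0.8B": 35.08,
        "Gemma 3 1B IT": 16.61,
    },
    "IFEval": {
        "LFM2.5-230M": 71.71,
        "Qwen3.5-0.8B": 59.94,
        "Granite 4.0-350M": 53.48,
    },
}

# Nominal parameter counts in millions, read off the model names (approximate).
NOMINAL_PARAMS_M = {
    "LFM2.5-230M": 230,
    "Qwen3.5-0.8B": 800,
    "Gemma 3 1B IT": 1000,
    "Granite 4.0-350M": 350,
}

SUBJECT = "LFM2.5-230M"


def margins(benchmark: str) -> dict:
    """Point gap, relative gap, and size ratio of SUBJECT vs each rival."""
    scores = SCORES[benchmark]
    ours = scores[SUBJECT]
    out = {}
    for model, theirs in scores.items():
        if model == SUBJECT:
            continue
        out[model] = {
            "points": round(ours - theirs, 2),
            "relative_pct": round(100 * (ours - theirs) / theirs, 1),
            "size_ratio": round(NOMINAL_PARAMS_M[model] / NOMINAL_PARAMS_M[SUBJECT], 1),
        }
    return out


if __name__ == "__main__":
    for bench in SCORES:
        for rival, m in margins(bench).items():
            print(f"{bench} vs {rival}: +{m['points']} pts "
                  f"({m['relative_pct']}%), rival is ~{m['size_ratio']}x larger")
```

On these figures, the BFCLv3 lead over Qwen3.5-0.8B is 8.18 points (about 23% relative) against a model roughly 3.5 times larger by nominal size.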
For likely adopters, the inference story matters more than the benchmark table. LFM2.5-230M ships with day-one support for llama.cpp (GGUF), MLX (Apple Silicon), vLLM, SGLang, and ONNX. For embedded and edge teams, that coverage removes most of the porting friction that typically accompanies a new small-model release. Liquid also claims the architecture is faster on CPU than comparable SSM hybrids and Gated Delta Networks, though the announcement does not detail the comparison methodology.
A robotics demonstration ships alongside the model as an early-stage proof of concept. Liquid deployed LFM2.5-230M on a Unitree G1 humanoid running an NVIDIA Jetson Orin, where it acts as a natural-language skill-selection layer on top of NVIDIA’s SONIC framework. A fine-tuned version of the model translates freeform voice commands into structured sequences of pre-built locomotion primitives. The behaviors shown are simple, the company acknowledges, and the demo serves as a signal rather than a shipped product.
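The skill-selection pattern described here, in which a language model emits a plan that is then checked against a fixed library of pre-built behaviors, can be sketched as a validation step between model output and robot control. Everything in the block is hypothetical: the JSON output format, the primitive names, and the parameter limits are illustrative assumptions, not the actual interface of Liquid’s demo or NVIDIA’s SONIC framework, and the model call is replaced by a hard-coded string.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical library of pre-built locomotion primitives. Each maps to the
# name of its single numeric parameter and an allowed (min, max) range,
# or to None if it takes no parameter.
PRIMITIVES = {
    "walk_forward": ("steps", 1, 10),
    "turn": ("degrees", -180, 180),
    "wave": ("seconds", 1, 5),
    "stop": None,
}


@dataclass
class SkillCall:
    name: str
    value: Optional[float] = None


def parse_plan(model_output: str) -> List[SkillCall]:
    """Validate a model-emitted JSON plan against the primitive library.

    Raises ValueError on anything the robot should not execute, so a
    malformed or hallucinated skill never reaches the controller.
    """
    try:
        raw = json.loads(model_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"plan is not valid JSON: {exc}") from exc
    if not isinstance(raw, list):
        raise ValueError("plan must be a JSON list of skill calls")

    plan = []
    for item in raw:
        if not isinstance(item, dict):
            raise ValueError(f"skill call must be an object: {item!r}")
        name = item.get("skill")
        if not isinstance(name, str) or name not in PRIMITIVES:
            raise ValueError(f"unknown skill: {name!r}")
        spec = PRIMITIVES[name]
        if spec is None:
            plan.append(SkillCall(name))
            continue
        param, lo, hi = spec
        value = item.get(param)
        # Reject booleans explicitly: in Python, True is an instance of int.
        if isinstance(value, bool) or not isinstance(value, (int, float)) \
                or not lo <= value <= hi:
            raise ValueError(f"{name}.{param} must be in [{lo}, {hi}], got {value!r}")
        plan.append(SkillCall(name, float(value)))
    return plan


if __name__ == "__main__":
    # Stand-in for what a fine-tuned model might emit for the spoken command
    # "walk forward three steps, turn around, then wave".
    output = ('[{"skill": "walk_forward", "steps": 3}, '
              '{"skill": "turn", "degrees": 180}, {"skill": "wave", "seconds": 2}]')
    for call in parse_plan(output):
        print(call)
```

Keeping the model’s job to choosing and parameterizing primitives from a closed list, with validation before execution, is one plausible reason a model this small can be useful in the loop.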
The market context is worth noting: the sub-1B parameter space has become crowded over 2025 and 2026, with Qwen, Gemma, and Phi all releasing capable models in this range. Liquid’s differentiation rests on architecture, not just size compression. Its LFM models draw on continuous-time formulations from liquid neural networks, which the company argues yield better token throughput at a lower memory footprint than attention-based alternatives. Independent infrastructure teams have not yet stress-tested that architectural bet at scale, and that testing is the next meaningful gate for enterprise adoption.
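The memory side of that argument rests on a general property: a standard attention layer keeps a key-value cache that grows linearly with context length, while a recurrent or state-space layer carries a fixed-size state. The sketch below compares the two using made-up layer counts and dimensions chosen only for illustration. They are not LFM2.5-230M’s real configuration, and hybrid models that mix both layer types fall somewhere in between.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Key/value cache size across all attention layers.

    The leading 2 accounts for storing both K and V; the default of 2 bytes
    per element corresponds to fp16/bf16.
    """
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem


def fixed_state_bytes(layers: int, state_elems_per_layer: int,
                      bytes_per_elem: int = 2) -> int:
    """Recurrent/state-space state size: independent of sequence length."""
    return layers * state_elems_per_layer * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical dimensions, for illustration only.
    LAYERS, KV_HEADS, HEAD_DIM = 16, 8, 64
    STATE_ELEMS = 1024 * 16  # e.g. hidden width x per-channel state size
    for seq_len in (512, 4096, 32768):
        kv = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, seq_len)
        st = fixed_state_bytes(LAYERS, STATE_ELEMS)
        print(f"{seq_len:>6} tokens: KV cache {kv / 2**20:7.1f} MiB, "
              f"fixed state {st / 2**20:5.2f} MiB")
```

With these placeholder numbers the KV cache grows from 16 MiB at 512 tokens to 1 GiB at 32,768 tokens, while the fixed state stays at 0.5 MiB. How much of that saving LFM2.5-230M actually realizes depends on its real layer configuration, which is exactly what independent testing would need to confirm.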
Both the base model (LFM2.5-230M-Base) and the post-trained version are open-weight with no deployment restrictions.
Teams currently building data-extraction pipelines or on-device agentic workflows on sub-1B models should benchmark LFM2.5-230M’s throughput against their current model before the next hardware procurement cycle. If the CPU performance claims hold up in independent testing, they represent a real cost lever on constrained devices.
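A like-for-like throughput comparison mainly needs the same prompt, the same output length, and repeated runs on the same device. The harness below is a minimal, runtime-agnostic sketch. `generate_fn` is a hypothetical callable you would wrap around whichever backend you already use (llama.cpp bindings, MLX, ONNX Runtime, and so on); it is assumed to return the number of tokens it generated. The harness reports the median rate across timed runs after discarding warm-up runs. It does not separate prefill from decode, so it understates pure decode speed, more so for long prompts.

```python
import statistics
import time
from typing import Callable

# Hypothetical backend interface: takes a prompt and a max token count,
# runs generation, and returns how many tokens were actually produced.
GenerateFn = Callable[[str, int], int]


def measure_tokens_per_second(generate_fn: GenerateFn,
                              prompt: str,
                              max_tokens: int = 128,
                              runs: int = 5,
                              warmup: int = 1,
                              clock: Callable[[], float] = time.perf_counter) -> float:
    """Median end-to-end generation throughput (tokens/s) over `runs` timed runs."""
    for _ in range(warmup):
        # Untimed warm-up: lets lazy loading and caches settle.
        generate_fn(prompt, max_tokens)

    rates = []
    for _ in range(runs):
        start = clock()
        produced = generate_fn(prompt, max_tokens)
        elapsed = clock() - start
        if elapsed <= 0 or produced <= 0:
            raise RuntimeError("run produced no tokens or no measurable time")
        rates.append(produced / elapsed)
    # The median is less sensitive than the mean to a single throttled run.
    return statistics.median(rates)


if __name__ == "__main__":
    def fake_backend(prompt: str, max_tokens: int) -> int:
        # Hypothetical stand-in for a real model call.
        time.sleep(max_tokens / 2000)
        return max_tokens

    rate = measure_tokens_per_second(fake_backend, "Extract the invoice date.")
    print(f"{rate:.0f} tok/s")
```

Running the same harness, prompt, and device for each candidate model keeps the comparison fair; thermal throttling on phones and single-board computers is one reason to prefer the median of several runs over a single measurement.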
Announced on Liquid AI’s blog on June 26, 2026.