LiquidAI/LFM2.5-350M

Text Generation | Concurrency Cost: 1 | Model Size: 0.35B | Quant: BF16 | Ctx Length: 32k | Published: Mar 31, 2026 | License: lfm1.0 | Architecture: Transformer

LFM2.5-350M by Liquid AI is a 350-million-parameter hybrid language model designed for efficient on-device deployment. It builds on the LFM2 architecture with extended pre-training and reinforcement learning, and is positioned as competitive with larger models while supporting fast edge inference across a range of hardware platforms. This general-purpose, instruction-tuned model is aimed at data extraction, structured outputs, and tool use in resource-constrained environments.


LFM2.5-350M: On-Device Hybrid Language Model

LFM2.5-350M is a 350-million-parameter hybrid model developed by Liquid AI, engineered for on-device deployment and efficient edge inference. It extends the LFM2 architecture through extended pre-training (28T tokens) and large-scale multi-stage reinforcement learning, which lets it deliver performance comparable to much larger models while running in under 1 GB of memory.
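As a rough sanity check on the sub-1 GB figure, the weight footprint can be estimated from the parameter count and the BF16 precision listed in the model metadata. The sketch below counts weights only. It assumes 2 bytes per BF16 parameter and ignores the KV cache, activations, and runtime overhead, all of which depend on the inference engine and the context length in use. The function name is illustrative, not part of any Liquid AI tooling.

```python
def weight_footprint_bytes(num_params: int, bytes_per_param: int = 2) -> int:
    """Estimate the memory used by model weights alone (no KV cache, no activations)."""
    return num_params * bytes_per_param


NUM_PARAMS = 350_000_000  # 350M parameters, as stated in the model card
footprint = weight_footprint_bytes(NUM_PARAMS)  # assumption: BF16 = 2 bytes per parameter

print(f"{footprint / 1e9:.2f} GB")     # decimal gigabytes -> 0.70 GB
print(f"{footprint / 2**30:.2f} GiB")  # binary gibibytes  -> 0.65 GiB
```

Under these assumptions the weights take about 0.7 GB, which is consistent with the stated budget but leaves limited headroom for long contexts at full precision.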

Key Capabilities & Features

  • Optimized for Edge: Achieves fast decode speeds (e.g., 313 tok/s on AMD CPU, 188 tok/s on Snapdragon Gen4) with day-one support for llama.cpp, MLX, and vLLM.
  • Compact yet Powerful: A 350M parameter model with a 32,768 token context length, offering strong performance for its size.
  • Multilingual Support: Supports English, Arabic, Chinese, French, German, Japanese, Korean, Portuguese, and Spanish.
  • Tool Use & Function Calling: Supports function calling, so the model can invoke external tools and interpret their results.
  • Broad Inference Support: Available in multiple formats including native, GGUF, ONNX, MLX, and OpenVINO for diverse hardware and deployment scenarios.
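The function-calling capability listed above implies an application-side loop: the model emits a tool call, the host program runs the tool, and the result is passed back to the model. The sketch below illustrates only the host-side dispatch step. The JSON shape of the call (`name` plus `arguments`), the `get_weather` tool, and the `dispatch_tool_call` helper are all assumptions made for illustration; the model card does not specify the actual tool-call format, so consult the model's chat template before relying on any particular shape.

```python
import json
from typing import Any, Callable, Dict


# Hypothetical tool; a real application would query an external system here.
def get_weather(city: str) -> Dict[str, Any]:
    return {"city": city, "temperature_c": 21}


# Registry mapping tool names to Python callables.
TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def dispatch_tool_call(raw_call: str) -> str:
    """Parse a JSON tool call, run the matching tool, and return a JSON result string."""
    try:
        call = json.loads(raw_call)
    except json.JSONDecodeError as exc:
        return json.dumps({"error": f"invalid JSON: {exc.msg}"})
    if not isinstance(call, dict):
        return json.dumps({"error": "tool call must be a JSON object"})

    name = call.get("name")
    args = call.get("arguments", {})
    if not isinstance(name, str) or name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    if not isinstance(args, dict):
        return json.dumps({"error": "arguments must be a JSON object"})

    try:
        result = TOOLS[name](**args)
    except TypeError as exc:  # wrong or missing argument names
        return json.dumps({"error": str(exc)})
    return json.dumps({"result": result})


# Assumed shape of a tool call emitted by the model (not taken from the model card).
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
print(dispatch_tool_call(model_output))
```

Returning errors as JSON rather than raising keeps the loop running, so the error text can be fed back to the model as the tool result and it can retry with corrected arguments.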

Good For

  • Data Extraction: Efficiently extracting specific information from text.
  • Structured Outputs: Generating responses in predefined formats.
  • Tool Use: Applications requiring function calling and interaction with external systems.
  • On-Device & Edge Deployment: Ideal for scenarios where resources are limited, such as mobile or embedded systems, due to its small footprint and optimized inference.
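For the data extraction and structured output use cases above, a small model's response is usually worth validating before downstream code consumes it. The following is a minimal sketch of that check using only the Python standard library. The invoice schema, the field names, and the `parse_structured_output` helper are hypothetical, and the sample string stands in for a model response; none of them come from the model card.

```python
import json
from typing import Any, Dict

# Hypothetical flat schema: field name -> expected Python type.
INVOICE_SCHEMA: Dict[str, type] = {"vendor": str, "total": float, "currency": str}


def parse_structured_output(raw: str, schema: Dict[str, type]) -> Dict[str, Any]:
    """Parse model output as JSON and check it against a flat schema.

    Raises ValueError on malformed JSON, missing fields, or wrong types.
    Fields not named in the schema are dropped.
    """
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    missing = [key for key in schema if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")

    cleaned: Dict[str, Any] = {}
    for key, expected in schema.items():
        value = data[key]
        # JSON has a single number type, so accept ints where floats are expected.
        if expected is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, expected):
            raise ValueError(f"field {key!r} should be {expected.__name__}")
        cleaned[key] = value
    return cleaned


# Stand-in for a model response to an extraction prompt.
raw_response = '{"vendor": "Acme", "total": 120, "currency": "EUR", "note": "paid"}'
print(parse_structured_output(raw_response, INVOICE_SCHEMA))
# -> {'vendor': 'Acme', 'total': 120.0, 'currency': 'EUR'}
```

On a validation failure, a common pattern is to re-prompt the model with the error message; because the model is small and fast, a retry is typically cheap.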