LiquidAI/LFM2.5-1.2B-Thinking

Text Generation · Concurrency Cost: 1 · Model Size: 1.2B · Quant: BF16 · Ctx Length: 32k · Published: Jan 20, 2026 · License: lfm1.0 · Architecture: Transformer

LiquidAI's LFM2.5-1.2B-Thinking is a 1.17-billion-parameter hybrid language model with a 32,768-token context length, designed for efficient on-device deployment. It is a general-purpose reasoning model optimized for agentic tasks, data extraction, and Retrieval-Augmented Generation (RAG). It offers best-in-class performance for its size, rivaling larger models while keeping inference fast and memory usage low on edge devices.

LFM2.5-1.2B-Thinking: On-Device Reasoning

LFM2.5-1.2B-Thinking is a 1.17-billion-parameter model from LiquidAI, part of the LFM2.5 family of hybrid models optimized for on-device deployment. It has a 32,768-token context length and was trained on an extended 28-trillion-token dataset with large-scale multi-stage reinforcement learning.

Key Capabilities & Performance

  • Best-in-class performance for its size: Benchmarks show it rivals much larger models on reasoning and instruction-following tasks such as GPQA Diamond, IFEval, Multi-IF, GSM8K, and MATH-500, and it outperforms Qwen3-1.7B (run in thinking mode) on several of these metrics.
  • Fast Edge Inference: Achieves 239 tok/s decode on an AMD CPU and 82 tok/s on a mobile NPU while running in under 1GB of memory. It supports llama.cpp, MLX, and vLLM from day one.
  • Efficient Long-Context Handling: Demonstrates robust long-context scalability, sustaining ~46 tok/s at its full 32K context on AMD Ryzen™ NPUs.
  • Tool Use: Supports function calling with a flexible JSON or Pythonic format for agentic workflows.
  • Multilingual Support: Trained on English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish.
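The memory and throughput figures above can be sanity-checked with simple arithmetic. The sketch below estimates the memory needed for the weights alone at a given precision, and the time to decode a response at a given rate. The helper names are hypothetical, and the 4-bit case is an assumption for illustration: the listing does not say which quantization the "under 1GB" figure refers to. The estimate ignores the KV cache, activations, and runtime overhead.

```python
N_PARAMS = 1.17e9  # parameter count stated in the model card


def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def decode_seconds(n_tokens: int, tokens_per_second: float) -> float:
    """Time to generate n_tokens at a steady decode rate."""
    return n_tokens / tokens_per_second


if __name__ == "__main__":
    # BF16 stores 16 bits per weight; 4-bit is an assumed quantization level.
    for label, bits in [("BF16", 16), ("4-bit (assumed)", 4)]:
        print(f"{label}: {weight_memory_gb(N_PARAMS, bits):.2f} GB of weights")

    # Decode rates quoted in the card: 239 tok/s (AMD CPU), 82 tok/s (mobile NPU).
    for label, rate in [("AMD CPU", 239.0), ("mobile NPU", 82.0)]:
        print(f"{label}: {decode_seconds(1000, rate):.1f} s per 1000 tokens")
```

Under these assumptions, BF16 weights alone need roughly 2.3 GB, so a sub-1GB footprint implies running a quantized build rather than the BF16 weights listed in the header.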

Good For

  • Agentic tasks
  • Data extraction
  • Retrieval Augmented Generation (RAG)
  • On-device deployment across mobile, IoT, and embedded systems due to its efficiency and low memory footprint.

It is not recommended for knowledge-intensive tasks or programming.
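For RAG use, retrieved passages have to fit in the 32,768-token context alongside the question and the model's own output, which for a thinking model includes its reasoning tokens. The following is a minimal sketch of that budgeting step, not LiquidAI's method. The `approx_token_count` heuristic (about four characters per token) and the 4,096-token output reserve are assumptions for illustration; a real pipeline would count tokens with the model's tokenizer.

```python
from typing import Callable, List

CONTEXT_LENGTH = 32_768  # context length stated in the model card


def approx_token_count(text: str) -> int:
    """Hypothetical stand-in for a tokenizer: roughly 4 characters per token."""
    return max(1, len(text) // 4)


def pack_chunks(
    chunks: List[str],
    question: str,
    reserve_for_output: int = 4096,  # assumed reserve for reasoning + answer
    count_tokens: Callable[[str], int] = approx_token_count,
) -> List[str]:
    """Keep the leading chunks (assumed sorted by relevance) that fit the budget."""
    budget = CONTEXT_LENGTH - reserve_for_output - count_tokens(question)
    selected: List[str] = []
    for chunk in chunks:
        cost = count_tokens(chunk)
        if cost > budget:
            break  # stop at the first chunk that no longer fits
        selected.append(chunk)
        budget -= cost
    return selected
```

Stopping at the first chunk that does not fit keeps the selection a prefix of the ranked list; skipping oversized chunks and continuing is an equally reasonable choice when chunk sizes vary widely.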