LiquidAI/LFM2.5-1.2B-Thinking
LiquidAI's LFM2.5-1.2B-Thinking is a 1.17-billion-parameter hybrid language model with a 32,768-token context length, designed for efficient on-device deployment. It is a general-purpose reasoning model optimized for agentic tasks, data extraction, and retrieval-augmented generation (RAG). LiquidAI reports best-in-class performance for its size, rivaling larger models while keeping inference fast and memory use low on edge devices.
LFM2.5-1.2B-Thinking: On-Device Reasoning
LFM2.5-1.2B-Thinking is a 1.17-billion-parameter model from LiquidAI and part of the LFM2.5 family of hybrid models optimized for on-device deployment. It has a 32,768-token context length and was trained on an expanded 28-trillion-token dataset, followed by large-scale, multi-stage reinforcement learning.
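The sub-1 GB memory figure in the capabilities list implies that the weights are stored at reduced precision. The back-of-the-envelope sketch below shows why; it assumes weights only (KV cache, activations, and runtime overhead are ignored) and nominal bits per weight, since the card does not say which quantization was used for its measurements. At 16 bits per weight, 1.17 billion parameters need about 2.34 GB; at 8 bits, about 1.17 GB; at 4 bits, roughly 0.59 GB.

```python
# Rough weight-memory estimate for a 1.17B-parameter model at several precisions.
# Assumption: weights only; KV cache, activations, and runtime overhead are ignored.

PARAMS = 1.17e9  # parameter count stated on the model card

# Nominal bits per weight for a few common storage formats. Real block-wise
# quantization schemes add small per-block scales, so actual sizes run slightly higher.
BITS_PER_WEIGHT = {
    "bf16": 16,
    "int8": 8,
    "4-bit": 4,
}


def weight_memory_gb(params: float, bits: float) -> float:
    """Memory needed to store `params` weights at `bits` each, in GB (1e9 bytes)."""
    return params * bits / 8 / 1e9


if __name__ == "__main__":
    for name, bits in BITS_PER_WEIGHT.items():
        print(f"{name:>6}: {weight_memory_gb(PARAMS, bits):.2f} GB")
```

Only the 4-bit row lands below 1 GB, which is consistent with the card's footprint claim once some room is left for the KV cache.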
Key Capabilities & Performance
- Best-in-class performance for its size: On reasoning benchmarks (GPQA Diamond, GSM8K, MATH-500) and instruction-following benchmarks (IFEval, Multi-IF), it rivals much larger models and outperforms Qwen3-1.7B in thinking mode on several metrics.
- Fast Edge Inference: Reaches 239 tok/s decode on an AMD CPU and 82 tok/s on a mobile NPU while using under 1 GB of memory, with day-one support for llama.cpp, MLX, and vLLM.
- Efficient Long-Context Handling: Throughput holds up as context grows, sustaining ~46 tok/s at the full 32K context on AMD Ryzen™ NPUs.
- Tool Use: Supports function calling in either JSON or Pythonic format for agentic workflows.
- Multilingual Support: Trained on English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish.
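The card does not spell out the exact tool-call syntax the model emits (the surrounding special tokens come from its chat template). As a minimal sketch, assuming the call text has already been extracted from the model output, the helper below accepts either JSON with hypothetical `name` and `arguments` keys or a Python-style list of calls such as `[get_weather(city="Paris")]`. `parse_tool_calls`, `get_weather`, and the JSON key names are illustrative, not part of any LiquidAI API.

```python
import ast
import json
from typing import Any

ToolCall = tuple[str, dict[str, Any]]


def _parse_json_calls(data: Any) -> list[ToolCall]:
    # Hypothetical JSON shape: {"name": ..., "arguments": {...}} or a list of them.
    items = data if isinstance(data, list) else [data]
    return [(item["name"], dict(item.get("arguments", {}))) for item in items]


def _parse_pythonic_calls(text: str) -> list[ToolCall]:
    node = ast.parse(text, mode="eval").body
    # Accept either a single call or a list literal of calls.
    calls = node.elts if isinstance(node, ast.List) else [node]
    parsed: list[ToolCall] = []
    for call in calls:
        if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
            raise ValueError("expected a plain function call such as f(x=1)")
        if call.args or any(kw.arg is None for kw in call.keywords):
            raise ValueError(f"{call.func.id}: only explicit keyword arguments are supported")
        # literal_eval accepts only literals, so nothing in the model output is executed.
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
        parsed.append((call.func.id, kwargs))
    return parsed


def parse_tool_calls(text: str) -> list[ToolCall]:
    """Parse tool calls written either as JSON or as Python call syntax."""
    text = text.strip()
    try:
        return _parse_json_calls(json.loads(text))
    except json.JSONDecodeError:
        # Not valid JSON, so fall back to the Pythonic form.
        return _parse_pythonic_calls(text)


if __name__ == "__main__":
    print(parse_tool_calls('[get_weather(city="Paris", days=3)]'))
    # [('get_weather', {'city': 'Paris', 'days': 3})]
    print(parse_tool_calls('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
```

Parsing the Pythonic form with `ast` and `ast.literal_eval` rather than `eval` means a malformed or adversarial call string raises an error instead of running code.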
Good For
- Agentic tasks
- Data extraction
- Retrieval Augmented Generation (RAG)
- On-device deployment across mobile, IoT, and embedded systems due to its efficiency and low memory footprint.
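For RAG, the 32,768-token window has to hold the question, the retrieved passages, and everything the model generates (for a thinking model, likely a reasoning trace as well as the answer). A minimal sketch of greedy context packing under that budget follows, assuming the chunks arrive already ranked by relevance. `pack_chunks`, the 4,096-token `reserve_for_output` default, and the `count_tokens` callback are hypothetical; `count_tokens` stands in for the model's real tokenizer, and prompt-template overhead is ignored.

```python
from typing import Callable

CONTEXT_LENGTH = 32_768  # maximum context length stated on the model card


def pack_chunks(
    question: str,
    chunks: list[str],
    count_tokens: Callable[[str], int],
    reserve_for_output: int = 4_096,  # hypothetical room left for reasoning + answer
    context_length: int = CONTEXT_LENGTH,
) -> list[str]:
    """Walk the chunks in relevance order and keep each one that still fits.

    `chunks` must already be sorted best-first. Chunks too large for the
    remaining budget are skipped, and smaller later chunks may still be kept.
    """
    budget = context_length - reserve_for_output - count_tokens(question)
    if budget <= 0:
        raise ValueError("question plus output reserve exceeds the context length")
    selected: list[str] = []
    for chunk in chunks:
        cost = count_tokens(chunk)
        if cost <= budget:
            selected.append(chunk)
            budget -= cost
    return selected


if __name__ == "__main__":
    # Whitespace splitting is only a rough proxy for real token counts.
    def rough_count(s: str) -> int:
        return len(s.split())

    docs = ["passage one ...", "passage two ...", "passage three ..."]
    print(pack_chunks("What does the report conclude?", docs, rough_count))
```

Keeping an explicit output reserve matters more for a thinking model than for a plain chat model, since the reasoning tokens it generates compete with the retrieved passages for the same window.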
It is not recommended for knowledge-intensive tasks or programming.