unsloth/LFM2.5-1.2B-Instruct
unsloth/LFM2.5-1.2B-Instruct is a 1.2 billion parameter instruction-tuned model developed by Liquid AI. It builds on the LFM2 architecture with extended pre-training on 28 trillion tokens and large-scale reinforcement learning. Designed for on-device deployment, it offers fast edge inference, runs in under 1GB of memory, and supports a 32,768 token context length. This general-purpose model excels at agentic tasks, data extraction, and RAG, rivaling larger models in performance while remaining efficient enough for local deployment.
LFM2.5-1.2B-Instruct: On-Device AI
LFM2.5-1.2B-Instruct, developed by Liquid AI, is a 1.2 billion parameter instruction-tuned model optimized for on-device deployment and fast edge inference. It builds upon the LFM2 architecture, featuring extended pre-training on 28 trillion tokens and multi-stage reinforcement learning to achieve performance comparable to much larger models.
Key Capabilities & Features
- Efficient On-Device Performance: Achieves 239 tok/s decode on AMD CPU and 82 tok/s on mobile NPU, operating under 1GB of memory. It supports llama.cpp, MLX, and vLLM from day one.
- Extended Context Window: Features a substantial 32,768 token context length.
- Multilingual Support: Trained on English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish.
- Tool Use: Supports function calling with a Pythonic format, enabling integration with external tools for complex tasks.
- Hybrid Architecture: Utilizes 10 double-gated LIV convolution blocks and 6 GQA blocks.
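To make the Tool Use item above more concrete, here is a minimal sketch of how an application might parse a Pythonic function call. It assumes the model returns a Python list of call expressions with keyword arguments, such as `[get_weather(city="Paris")]`. The exact output format and any surrounding special tokens are not specified here, and `get_weather`, `TOOLS`, and `dispatch` are hypothetical names used only for illustration.

```python
import ast


def parse_pythonic_calls(text: str) -> list:
    """Parse a string like '[get_weather(city="Paris")]' into call records.

    Assumption: the text is a Python list literal of simple calls whose
    keyword-argument values are plain literals (strings, numbers, lists, ...).
    """
    tree = ast.parse(text.strip(), mode="eval")
    if not isinstance(tree.body, ast.List):
        raise ValueError("expected a list of calls")

    calls = []
    for node in tree.body.elts:
        if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
            raise ValueError("expected a simple function call")
        if node.args or any(kw.arg is None for kw in node.keywords):
            raise ValueError("only keyword arguments are handled in this sketch")
        # literal_eval accepts literals only, so no generated code is executed.
        arguments = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
        calls.append({"name": node.func.id, "arguments": arguments})
    return calls


# Hypothetical tool, for illustration only.
def get_weather(city: str, unit: str = "celsius") -> str:
    return f"Weather for {city} in {unit}"


# Hypothetical registry mapping tool names to Python callables.
TOOLS = {"get_weather": get_weather}


def dispatch(text: str) -> list:
    """Run every parsed call against the registry and collect the results."""
    results = []
    for call in parse_pythonic_calls(text):
        tool = TOOLS[call["name"]]  # raises KeyError for unknown tools
        results.append(tool(**call["arguments"]))
    return results


print(dispatch('[get_weather(city="Paris")]'))
```

Parsing with `ast` rather than `eval` is a deliberate choice in this sketch: the model's output is treated as data, and only tools present in the registry can run.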
Performance & Benchmarks
LFM2.5-1.2B-Instruct demonstrates strong performance against other sub-2B models across various benchmarks, including GPQA, MMLU-Pro, IFEval, and AIME25, often outperforming competitors in its size class. Its inference speed is particularly notable on CPUs and NPUs, unlocking new deployment scenarios for vehicles, mobile devices, and IoT.
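The memory and speed figures quoted on this page can be sanity-checked with some back-of-envelope arithmetic. The sketch below uses only the 1.2 billion parameter count and the two decode rates stated above. The bit widths are assumptions (the page does not say which precision the sub-1GB figure refers to), and the estimate covers weights only, ignoring the KV cache and runtime overhead.

```python
PARAMS = 1.2e9  # parameter count quoted on this page


def weight_memory_gb(bits_per_weight: float, params: float = PARAMS) -> float:
    """Approximate weight-only footprint in GB (10^9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def decode_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a steady decode rate."""
    return num_tokens / tokens_per_second


# Assumed precisions; the page does not state the quantization used.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(bits):.1f} GB")

# Decode rates quoted on this page.
for label, rate in (("AMD CPU", 239.0), ("mobile NPU", 82.0)):
    print(f"{label}: 500 tokens in ~{decode_seconds(500, rate):.1f} s")
```

Under these assumptions the weights alone come to roughly 2.4 GB at 16 bits, 1.2 GB at 8 bits, and 0.6 GB at 4 bits, so the sub-1GB figure presumably refers to a quantized build.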
Recommended Use Cases
This model is particularly well-suited for:
- Agentic tasks
- Data extraction
- Retrieval-Augmented Generation (RAG)
It is not recommended for knowledge-intensive tasks or programming.
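For the RAG use case, one practical concern is keeping the retrieved context within the 32,768 token window. The following is a minimal sketch of packing ranked chunks into a prompt under a token budget. The word-based `rough_token_count` is a crude stand-in for the model's real tokenizer, and the prompt wording, the `reserve_for_answer` parameter, and the function names are all illustrative assumptions.

```python
CONTEXT_LIMIT = 32_768  # context length quoted on this page


def rough_token_count(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def build_rag_prompt(question, chunks, reserve_for_answer=1024, limit=CONTEXT_LIMIT):
    """Pack chunks (already ranked best-first) until the budget runs out."""
    header = "Answer the question using only the context below.\n\nContext:\n"
    footer = f"\n\nQuestion: {question}\nAnswer:"
    budget = (
        limit
        - reserve_for_answer
        - rough_token_count(header)
        - rough_token_count(footer)
    )

    selected = []
    for chunk in chunks:
        cost = rough_token_count(chunk) + 1  # +1 for the separator line
        if cost > budget:
            break  # stop at the first chunk that does not fit
        selected.append(chunk)
        budget -= cost

    return header + "\n---\n".join(selected) + footer


prompt = build_rag_prompt(
    "What is the warranty period?",
    ["The warranty lasts 24 months.", "Returns are accepted within 30 days."],
)
print(prompt)
```

In a real deployment the word count would be replaced with the tokenizer shipped with the model, since word counts can undercount tokens considerably.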