Name: LiquidAI/LFM2.5-1.2B-Thinking API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: LiquidAI

LFM2.5-1.2B-Thinking: On-Device Reasoning

LFM2.5-1.2B-Thinking is a 1.17-billion-parameter model from LiquidAI, part of the LFM2.5 family of hybrid models optimized for on-device deployment. It has a 32,768-token context length and was trained on an extended dataset of 28 trillion tokens, followed by large-scale multi-stage reinforcement learning.

Key Capabilities & Performance

Best-in-class performance for its size: On reasoning and instruction-following benchmarks such as GPQA Diamond, IFEval, Multi-IF, GSM8K, and MATH-500, it rivals much larger models and outperforms Qwen3-1.7B in thinking mode on several metrics.
Fast Edge Inference: Decodes at 239 tok/s on an AMD CPU and 82 tok/s on a mobile NPU while using under 1 GB of memory. llama.cpp, MLX, and vLLM are supported from day one.
Efficient Long-Context Handling: Demonstrates robust long-context scalability, sustaining ~46 tok/s at its full 32K context on AMD Ryzen™ NPUs.
Tool Use: Supports function calling with a flexible JSON or Pythonic format for agentic workflows.
Multilingual Support: Trained on English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish.
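
The tool-use entry above mentions two call formats, JSON and Pythonic. The sketch below shows one way an application might turn either form into a function name and keyword arguments before dispatching it. The exact strings the model emits are not specified on this page, so both shapes handled here (a JSON object with "name" and "arguments" keys, and a single call such as get_weather(city="Paris")) are assumptions, and parse_tool_call is a hypothetical helper, not part of any LiquidAI or Featherless library.

```python
import ast
import json


def parse_tool_call(text):
    """Parse one tool call written as JSON or as a Pythonic call.

    Returns a (function_name, keyword_arguments) tuple.
    """
    text = text.strip()

    # Assumed JSON shape: {"name": "...", "arguments": {...}}
    if text.startswith("{"):
        payload = json.loads(text)
        return payload["name"], payload.get("arguments", {})

    # Assumed Pythonic shape: func(key=value, ...)
    node = ast.parse(text, mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError(f"not a simple function call: {text!r}")
    if node.args or any(kw.arg is None for kw in node.keywords):
        raise ValueError("only keyword arguments are supported in this sketch")

    # literal_eval accepts literals only, so model-written code is never executed.
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return node.func.id, kwargs


print(parse_tool_call('get_weather(city="Paris", unit="celsius")'))
print(parse_tool_call('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
```

Parsing with ast rather than eval keeps the application in control of which functions actually run: the parsed name can be checked against an allow-list of registered tools before anything is called.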

Good For

Agentic tasks
Data extraction
Retrieval Augmented Generation (RAG)
On-device deployment across mobile, IoT, and embedded systems due to its efficiency and low memory footprint.

LFM2.5-1.2B-Thinking is not recommended for knowledge-intensive tasks or programming.
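
When judging whether the model fits an on-device latency budget, the decode rates quoted on this page can be turned into a rough generation-time estimate. The sketch below is simple arithmetic under stated assumptions: it counts decode time only (prompt processing is ignored), it treats each quoted rate as constant, and the 1,500-token reasoning trace plus 300-token answer is a hypothetical workload, not a measured one. The dictionary keys and decode_seconds are names made up for this illustration.

```python
# Decode rates quoted on this page, in tokens per second.
DECODE_TOK_PER_S = {
    "amd_cpu": 239.0,
    "mobile_npu": 82.0,
    "amd_ryzen_npu_32k_context": 46.0,
}


def decode_seconds(num_tokens, device):
    """Estimate the time to generate num_tokens at the quoted decode rate."""
    return num_tokens / DECODE_TOK_PER_S[device]


# Hypothetical response: 1,500 reasoning tokens plus a 300-token answer.
total_tokens = 1500 + 300
for device in DECODE_TOK_PER_S:
    print(f"{device}: {decode_seconds(total_tokens, device):.1f} s")
```

For this workload the estimate comes to about 7.5 s on the AMD CPU, 22.0 s on the mobile NPU, and 39.1 s at the full 32K context on the Ryzen NPU, which shows why the length of the reasoning trace matters for interactive use.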
