unsloth/LFM2-700M

Text generation | Concurrency cost: 1 | Model size: 0.7B | Quant: BF16 | Context length: 32k | Published: Jul 11, 2025 | License: lfm1.0 | Architecture: Transformer

LFM2-700M is a 0.7-billion-parameter hybrid language model from Liquid AI, designed for edge AI and on-device deployment, with a 32,768-token context length. Its new architecture combines multiplicative gates with short convolutions, which yields faster training and inference. It outperforms similarly sized models across multiple benchmark categories and is well suited to fine-tuning on narrow use cases such as agentic tasks, data extraction, RAG, creative writing, and multi-turn conversations.
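To give a rough sense of what the listed 0.7B size and BF16 format mean for on-device memory: BF16 stores each parameter in 2 bytes, so the weights alone need roughly 1.4 GB. The sketch below makes that estimate explicit. It assumes the nominal 0.7B figure from the listing (the exact parameter count differs slightly) and ignores activations, the attention layers' KV cache, and runtime overhead.

```python
# Back-of-envelope weight-memory estimate.
# Assumption: the nominal 0.7B parameter count from the listing; real runtime
# memory is higher (activations, KV cache, framework overhead).

BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gib(num_params: float, dtype: str = "bf16") -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in ("fp32", "bf16", "int8", "int4"):
        print(f"{dtype}: {weight_memory_gib(0.7e9, dtype):.2f} GiB")
```

For BF16 this prints `bf16: 1.30 GiB`. The fp32, int8, and int4 rows are included only for comparison; this listing describes a BF16 checkpoint.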


LFM2-700M: A Hybrid Model for Edge AI

LFM2-700M is a 0.7-billion-parameter model from Liquid AI's new generation of hybrid models, engineered for efficient edge AI and on-device deployment. It has a 32,768-token context length and is built on a new architecture that combines multiplicative gates and short convolutions, with 16 layers in total: 10 convolution layers and 6 attention layers.
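The card does not spell out how the gates and short convolutions fit together. The pure-Python sketch below shows one plausible way to combine them. It assumes the input is projected into an input gate, an output gate, and a value stream; the gated value passes through a causal depthwise convolution with a short kernel (size 3 here), and the result is gated again and projected out. The block layout, kernel size, random weights, and all function names are illustrative assumptions, not Liquid AI's implementation, and the attention layers are not shown.

```python
import random
from typing import List

Matrix = List[List[float]]  # rows are time steps, columns are channels


def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Multiply a (T x d_in) sequence by a (d_in x d_out) weight matrix."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def hadamard(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise (multiplicative) gating of two equally shaped matrices."""
    return [[p * q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def causal_depthwise_conv(x: Matrix, kernel: Matrix) -> Matrix:
    """Per-channel causal convolution.

    kernel[j][c] weights lag j on channel c (j = 0 is the current step), so the
    output at step t only sees inputs t - len(kernel) + 1 .. t.
    """
    steps, channels = len(x), len(x[0])
    out = [[0.0] * channels for _ in range(steps)]
    for t in range(steps):
        for j in range(min(len(kernel), t + 1)):  # implicit zero padding on the left
            for c in range(channels):
                out[t][c] += kernel[j][c] * x[t - j][c]
    return out


def gated_short_conv(x: Matrix, w_in: Matrix, kernel: Matrix, w_out: Matrix) -> Matrix:
    """Project to (gate_b, gate_c, value), gate, short causal conv, gate, project out."""
    d = len(x[0])
    proj = matmul(x, w_in)                   # T x 3d
    gate_b = [row[:d] for row in proj]       # input gate
    gate_c = [row[d:2 * d] for row in proj]  # output gate
    value = [row[2 * d:] for row in proj]    # value stream
    z = causal_depthwise_conv(hadamard(gate_b, value), kernel)
    return matmul(hadamard(gate_c, z), w_out)


def random_matrix(rows: int, cols: int) -> Matrix:
    """Hypothetical random weights, used only to exercise the sketch."""
    return [[random.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    random.seed(0)
    steps, d, k = 5, 4, 3  # sequence length, hidden size, kernel size (assumed)
    x = random_matrix(steps, d)
    y = gated_short_conv(x, random_matrix(d, 3 * d), random_matrix(k, d), random_matrix(d, d))
    print(len(y), len(y[0]))  # 5 4
```

The gates and projections act on each time step independently, and the convolution only looks back `k - 1` steps. As a result, the output at step t depends only on inputs up to t, and decoding needs to keep only the last few inputs per channel rather than a growing cache.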

Key Capabilities & Features

  • Optimized Performance: 3x faster training than the previous LFM generation, and 2x faster decode and prefill on CPU than Qwen3.
  • Benchmark Outperformance: Outperforms similarly sized models on knowledge, mathematics, instruction-following, and multilingual benchmarks.
  • Flexible Deployment: Designed for efficient operation across CPU, GPU, and NPU hardware.
  • Multilingual Support: Supports English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish.
  • Tool Use: Supports structured tool use, with tools described as JSON function definitions and calls emitted as Pythonic function calls.
  • Training: Trained on 10 trillion tokens using knowledge distillation with LFM1-7B as the teacher, large-scale supervised fine-tuning (SFT), custom Direct Preference Optimization (DPO), and iterative model merging.
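The card states only that tools are defined as JSON and that the model responds with Pythonic function calls. The sketch below illustrates the consuming side. It assumes the model's output has already been reduced to a bare call or list of calls such as `[get_weather(city="Paris")]`. It parses that text with the standard-library `ast` module, without executing it, and checks the result against a JSON tool definition. The `get_weather` tool, the schema layout, and the sample output are hypothetical. The exact delimiters the model wraps around its calls are defined by its chat template and are not covered here.

```python
import ast
import json
from typing import Any, Dict, List

# Hypothetical JSON tool definition; the schema layout is an assumption.
TOOLS_JSON = """[
  {"name": "get_weather",
   "description": "Get the current weather for a city",
   "parameters": {"type": "object",
                  "properties": {"city": {"type": "string"}},
                  "required": ["city"]}}
]"""


def parse_pythonic_calls(text: str) -> List[Dict[str, Any]]:
    """Parse 'f(a=1)' or '[f(a=1), g(b="x")]' into name/argument dicts.

    The text is parsed, never executed: only plain function names and
    literal keyword arguments are accepted.
    """
    node = ast.parse(text.strip(), mode="eval").body
    calls = node.elts if isinstance(node, ast.List) else [node]
    parsed = []
    for call in calls:
        if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
            raise ValueError("expected a plain function call")
        if call.args or any(kw.arg is None for kw in call.keywords):
            raise ValueError("only explicit keyword arguments are supported")
        arguments = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
        parsed.append({"name": call.func.id, "arguments": arguments})
    return parsed


def validate(calls: List[Dict[str, Any]], tools: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Check that each call names a known tool and supplies its required parameters."""
    known = {tool["name"]: tool for tool in tools}
    for call in calls:
        spec = known.get(call["name"])
        if spec is None:
            raise ValueError(f"unknown tool: {call['name']}")
        missing = set(spec["parameters"].get("required", [])) - set(call["arguments"])
        if missing:
            raise ValueError(f"{call['name']} is missing {sorted(missing)}")
    return calls


if __name__ == "__main__":
    tools = json.loads(TOOLS_JSON)
    model_output = '[get_weather(city="Paris")]'  # hypothetical model output
    print(validate(parse_pythonic_calls(model_output), tools))
```

Running it prints `[{'name': 'get_weather', 'arguments': {'city': 'Paris'}}]`. Using `ast.literal_eval` on each argument instead of `eval` on the whole string means that model output is never run as code.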

Recommended Use Cases

LFM2-700M is particularly suited for fine-tuning on narrow applications to maximize performance. It is recommended for:

  • Agentic tasks
  • Data extraction
  • Retrieval-Augmented Generation (RAG)
  • Creative writing
  • Multi-turn conversations

However, it is not recommended for knowledge-intensive tasks or those requiring strong programming skills.