reaperdoesntknow/LFM2.5-1.2B-Distilled-SFT

Text generation | Model size: 1.2B | Quantization: BF16 | Context length: 32k | Published: Mar 25, 2026 | License: apache-2.0 | Architecture: Transformer | Open weights

LFM2.5-1.2B-Distilled-SFT by Convergent Intelligence LLC is a 1.2-billion-parameter hybrid SSM + attention model. It was trained by knowledge distillation from a 24B MoE hybrid teacher on STEM chain-of-thought data, followed by supervised fine-tuning on logical inference. The model targets structured STEM reasoning and formal logical inference, with fast inference (239 tok/s on an AMD CPU) and a low memory footprint (under 1 GB of RAM) for on-device and edge deployments.


Model Overview

LFM2.5-1.2B-Distilled-SFT is a 1.2-billion-parameter hybrid model that combines State Space Model (SSM) and attention mechanisms. Developed by Convergent Intelligence LLC, it was produced by a two-stage training pipeline: knowledge distillation from a 24B MoE hybrid teacher (LFM2-24B-A2B) on STEM chain-of-thought data, followed by supervised fine-tuning (SFT) on logical inference tasks. The developers describe it as the first application of a proof-weighted distillation + SFT pipeline to a non-transformer architecture.
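The card does not say how proof-weighting changes the distillation objective. As a point of reference only, the sketch below shows standard logit-based knowledge distillation for a single token position: the KL divergence from the temperature-softened teacher distribution to the student distribution, scaled by T². It is not necessarily the loss used for this model; the helper names and the temperature default are illustrative assumptions. Logit-level distillation of this kind requires the teacher and student to share a vocabulary.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float],
                      teacher_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2.

    Hypothetical helper for one token position. A training loop would average
    this over positions and typically mix it with the usual cross-entropy loss.
    No proof-based weighting is applied here.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(p * (math.log(p) - math.log(q))
             for p, q in zip(p_teacher, p_student) if p > 0)
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]
    print(distillation_loss(teacher, teacher))          # prints 0.0 (identical logits)
    print(distillation_loss([0.1, 1.0, 2.0], teacher))  # positive: distributions disagree
```

The T² factor keeps gradient magnitudes roughly comparable across temperatures, which is why it appears in the usual formulation.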

Key Capabilities

  • Efficient Inference: Achieves 239 tokens/second on AMD CPUs and fits within 1GB of RAM, making it suitable for mobile, edge, and IoT deployments.
  • Structured STEM Reasoning: Distilled from a larger teacher on 2,802 STEM CoT samples across linear algebra, differential equations, electromagnetism, mathematics, and classical mechanics.
  • Formal Logical Inference: Fine-tuned on the KK04/LogicInference_OA dataset. The developers argue that the SSM components, which excel at sequential state propagation, align naturally with propositional logic chains.
  • Hybrid Architecture: Combines SSM and attention layers, giving the model an inductive bias for sequential reasoning tasks that differs from that of a pure transformer.
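To illustrate what "sequential state propagation" means, here is a minimal sketch of a generic diagonal linear state-space recurrence over a scalar input sequence. It is an assumption-laden toy, not the layer actually used in this model: real SSM layers operate on vector inputs and often use learned, input-dependent parameters. The function name and parameters are hypothetical. The point is that each step folds the new input into a fixed-size state carried forward, whereas attention looks back over every earlier token.

```python
from typing import List


def diagonal_ssm_scan(xs: List[float],
                      a: List[float],
                      b: List[float],
                      c: List[float]) -> List[float]:
    """Run a diagonal linear state-space recurrence (hypothetical toy).

    h_t = a * h_{t-1} + b * x_t   (elementwise over the state dimension)
    y_t = sum(c * h_t)
    """
    assert len(a) == len(b) == len(c), "state parameters must share one dimension"
    h = [0.0] * len(a)  # initial hidden state
    ys = []
    for x in xs:
        # Update the fixed-size state from the previous state and the new input.
        h = [a_i * h_i + b_i * x for a_i, h_i, b_i in zip(a, h, b)]
        # Read out one output per step from the current state.
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return ys


if __name__ == "__main__":
    # An impulse at t=0 decays geometrically through the state because |a| < 1.
    print(diagonal_ssm_scan([1.0, 0.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0]))
    # prints [1.0, 0.5, 0.25, 0.125]
```

Because the state has constant size, per-token compute and memory do not grow with sequence length, which is one reason such layers suit low-resource deployment.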

Good For

  • On-device logical inference and STEM reasoning.
  • Mobile, edge, and IoT deployment scenarios requiring efficient, low-resource models.
  • Formal reasoning tasks and educational tutoring applications.
  • Embedded inference pipelines where structured reasoning is needed under 1GB of RAM.
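The listing quotes BF16 weights alongside a sub-1 GB RAM figure. Weight storage scales as parameters × bits per parameter / 8, so a quick calculation helps interpret that figure. The sketch below uses the 1.2B parameter count and the 239 tok/s throughput from the card; the helper name is hypothetical, and it counts weights only (no activations, recurrent state, KV cache, or runtime overhead). Under these assumptions BF16 weights alone come to about 2.4 GB, so the under-1 GB figure presumably refers to a quantized build (around 4-bit), which the card does not specify.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes).

    Hypothetical helper: ignores activations, recurrent/KV state, and runtime
    overhead, so actual RAM use is higher than this figure.
    """
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    params = 1.2e9  # parameter count from the model card
    for label, bits in [("BF16", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label}: {weight_memory_gb(params, bits):.2f} GB")
    # Per-token latency implied by the quoted CPU throughput.
    print(f"{1000 / 239:.2f} ms per token at 239 tok/s")
```

This prints 2.40 GB, 1.20 GB, and 0.60 GB for the three precisions, and about 4.18 ms per token.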