coder3101/LFM2.5-350M-heretic

Text Generation | Concurrency Cost: 1 | Model Size: 0.35B | Quant: BF16 | Ctx Length: 32k | Published: Jun 2, 2026 | License: lfm1.0 | Architecture: Transformer | Cold

coder3101/LFM2.5-350M-heretic is a 0.35 billion parameter instruction-tuned language model derived from LiquidAI's LFM2.5-350M using the Heretic tool. It is optimized for on-device deployment, offering fast edge inference and a 32,768-token context length. It is a decensored version of its base model, intended to reduce refusals while maintaining performance. Its primary strengths are data extraction, structured outputs, and tool use, which makes it suitable for resource-constrained environments.


LFM2.5-350M-heretic: Decensored On-Device LLM

This model is a 0.35 billion parameter instruction-tuned variant of LiquidAI's LFM2.5-350M, created using the Heretic v1.3.0 tool. Its core differentiator is its "decensored" nature, significantly reducing refusals compared to the original model (6/100 vs. 90/100 refusals) while maintaining a low KL divergence of 0.0440.
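As a rough illustration of what a KL divergence figure such as 0.0440 measures, the sketch below compares two next-token distributions. The toy logits, the four-token vocabulary, and the helper names are all hypothetical, and the direction and averaging that Heretic actually uses are not stated here, so treat this as an assumption-laden sketch rather than the tool's method.

```python
import math

def softmax(logits):
    """Turn raw logits into a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same tokens."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token logits over a toy 4-token vocabulary.
base_logits = [2.0, 1.0, 0.5, -1.0]       # stands in for the original model
modified_logits = [1.8, 1.1, 0.6, -0.9]   # stands in for the modified model

kl = kl_divergence(softmax(base_logits), softmax(modified_logits))
print(f"KL divergence: {kl:.4f} nats")
```

A value near zero means the modified model's token probabilities stay close to the original's, which is the sense in which a low KL divergence suggests that behavior outside of refusals is largely preserved.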

Key Capabilities & Features

  • On-Device Optimization: Designed for efficient deployment on edge devices, offering fast inference speeds (e.g., 313 tok/s on AMD CPU, 188 tok/s on Snapdragon Gen4) and a low memory footprint (under 1GB).
  • Extended Training: Built on the LFM2 architecture with extended pre-training (28T tokens) and large-scale multi-stage reinforcement learning.
  • Tool Use: Supports function calling with a structured approach for defining, calling, and executing tools, including Pythonic and JSON function calls.
  • Multilingual Support: Capable in English, Arabic, Chinese, French, German, Japanese, Korean, Portuguese, and Spanish.
  • Reproducible: The model's creation process is reproducible, with details available in the reproduce directory.
  • Broad Inference Support: Compatible with various inference frameworks including Transformers, vLLM, llama.cpp, MLX, LM Studio, and OpenVINO.
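The "under 1GB" memory figure in the list above can be sanity-checked with simple arithmetic on the listed parameter count and BF16 quantization. This is a back-of-the-envelope sketch: it counts weights only, assumes exactly 0.35 billion parameters at 2 bytes each, and ignores the KV cache, activations, and runtime overhead. The function name is hypothetical.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

NUM_PARAMS = 350_000_000   # 0.35B, as listed for this model
BF16_BYTES = 2             # BF16 stores each parameter in 16 bits

bf16_gb = weight_memory_gb(NUM_PARAMS, BF16_BYTES)
bf16_gib = NUM_PARAMS * BF16_BYTES / 2**30   # same amount in binary gibibytes

print(f"BF16 weights: {bf16_gb:.2f} GB ({bf16_gib:.2f} GiB)")
```

Under these assumptions the weights alone come to about 0.7 GB, which is consistent with the stated footprint. Long contexts add KV-cache memory on top of that.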

Performance Highlights

The base model, LFM2.5-350M, is reported as competitive with other small models (e.g., LFM2-350M, Granite 4.0-350M, Qwen3.5-0.8B, Gemma 3 1B IT) on general benchmarks such as GPQA Diamond and IFEval, and on domain-specific benchmarks such as CaseReportBench and BFCLv3/v4.

Good for

  • Data Extraction: Efficiently pulling specific information from text.
  • Structured Outputs: Generating responses in predefined formats.
  • Tool Use: Integrating with external functions and APIs for enhanced capabilities.
  • Edge Deployment: Applications requiring local, low-resource inference on devices.
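To make the tool-use and structured-output workflow more concrete, here is a minimal sketch of the application side: parsing a function call emitted by a model, in either JSON or Pythonic form, and dispatching it to local code. The `get_weather` tool, the call strings, and the JSON field names (`name`, `arguments`) are hypothetical. The model's actual chat template and call delimiters are not shown here and should be taken from the base model's documentation.

```python
import ast
import json

def get_weather(city: str) -> dict:
    """Hypothetical tool; a real one would query a weather service."""
    return {"city": city, "forecast": "sunny"}

# Registry mapping tool names to local Python callables.
TOOLS = {"get_weather": get_weather}

def parse_json_call(text):
    """Parse a JSON call like {"name": ..., "arguments": {...}}."""
    call = json.loads(text)
    return call["name"], call.get("arguments", {})

def parse_pythonic_call(text):
    """Parse a Pythonic call like get_weather(city="Paris") without executing it.

    Only keyword arguments with literal values are supported in this sketch.
    """
    node = ast.parse(text.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError("expected a simple function call")
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return node.func.id, kwargs

def dispatch(name, arguments):
    """Run a registered tool and return its result as a JSON string."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return json.dumps(TOOLS[name](**arguments))

# Hypothetical model outputs in the two call styles.
json_result = dispatch(*parse_json_call('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
pythonic_result = dispatch(*parse_pythonic_call('get_weather(city="Paris")'))
print(json_result)
print(pythonic_result)
```

Parsing with `ast` rather than `eval` keeps model output from running arbitrary code, and the registry lookup rejects any tool name the application did not define.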

Not Recommended for

  • Knowledge-Intensive Tasks: The model is not optimized for deep knowledge retrieval.
  • Programming: It is not recommended for code generation or programming-specific tasks.