LFM2.5-350M-heretic: Decensored On-Device LLM

This model is a 0.35 billion parameter instruction-tuned variant of LiquidAI's LFM2.5-350M, created using the Heretic v1.3.0 tool. Its core differentiator is that it is "decensored": it refuses 6 of 100 test prompts, compared with 90 of 100 for the original model, while keeping a low KL divergence of 0.0440 relative to the original.
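The two figures quoted above (refusal counts and KL divergence) can be illustrated with a minimal sketch. The probability vectors below are made-up toy values, not measurements, and the direction of the divergence (original relative to modified) and the way Heretic samples prompts are assumptions, since the description does not specify them.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same vocabulary."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Terms with p_i == 0 contribute nothing; q_i is floored to avoid log(0).
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def refusal_rate(refused_flags):
    """Fraction of prompts that were refused (True = refused)."""
    return sum(refused_flags) / len(refused_flags)

# Toy next-token distributions (illustrative values only, not real model output).
original_probs = [0.70, 0.20, 0.10]
modified_probs = [0.60, 0.25, 0.15]
kl = kl_divergence(original_probs, modified_probs)

# Refusal counts as quoted in the description: 90/100 before, 6/100 after.
original_rate = refusal_rate([True] * 90 + [False] * 10)
modified_rate = refusal_rate([True] * 6 + [False] * 94)

print(f"KL divergence (toy example): {kl:.4f} nats")
print(f"Refusal rate: {original_rate:.0%} -> {modified_rate:.0%}")
```

A divergence close to zero means the modified model's output distribution stays close to the original's, which is why it is reported alongside the refusal count.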

Key Capabilities & Features

On-Device Optimization: Designed for efficient deployment on edge devices, with fast inference (e.g., 313 tok/s on AMD CPU, 188 tok/s on Snapdragon Gen4) and a low memory footprint (under 1GB).
Extended Training: Built on the LFM2 architecture with extended pre-training (28T tokens) and large-scale multi-stage reinforcement learning.
Tool Use: Supports function calling through a structured flow for defining, calling, and executing tools, with both Pythonic and JSON function-call formats.
Multilingual Support: Capable in English, Arabic, Chinese, French, German, Japanese, Korean, Portuguese, and Spanish.
Reproducible: The model's creation process is reproducible, with details available in the reproduce directory.
Broad Inference Support: Compatible with various inference frameworks including Transformers, vLLM, llama.cpp, MLX, LM Studio, and OpenVINO.
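The "under 1GB" memory figure in the list above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below counts weights only and assumes a few common storage precisions; the description does not say which precision the figure refers to, and KV cache and runtime overhead are not included.

```python
PARAMS = 350_000_000  # 0.35 billion parameters, as stated in the description

def weight_footprint_gb(num_params, bits_per_param):
    """Approximate size of the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9

# Assumed precisions for illustration; not taken from the model card.
PRECISIONS = [("fp32", 32), ("fp16/bf16", 16), ("8-bit", 8), ("4-bit", 4)]

for name, bits in PRECISIONS:
    size = weight_footprint_gb(PARAMS, bits)
    print(f"{name:>10}: {size:.3f} GB")
```

Under these assumptions, 16-bit weights come to roughly 0.7 GB, which is consistent with the stated sub-1GB footprint, while 32-bit weights would exceed it.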

Performance Highlights

LFM2.5-350M is reported to be competitive with other small models (e.g., LFM2-350M, Granite 4.0-350M, Qwen3.5-0.8B, Gemma 3 1B IT) on general benchmarks such as GPQA Diamond and IFEval, and on domain-specific benchmarks such as CaseReportBench and BFCLv3/v4.

Good for

Data Extraction: Efficiently pulling specific information from text.
Structured Outputs: Generating responses in predefined formats.
Tool Use: Integrating with external functions and APIs for enhanced capabilities.
Edge Deployment: Applications requiring local, low-resource inference on devices.
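The tool-use flow mentioned above (define a tool, let the model emit a call, execute it) might look roughly like the following on the application side. This is a sketch under stated assumptions: `get_weather`, the `TOOLS` registry, and the exact call strings are hypothetical, and the model's real tool-call delimiters and chat template are not reproduced here.

```python
import ast
import json

def get_weather(city: str) -> dict:
    """Hypothetical tool that returns canned data for illustration."""
    return {"city": city, "temperature_c": 21}

# Hypothetical registry mapping tool names to Python callables.
TOOLS = {"get_weather": get_weather}

def parse_pythonic_call(text):
    """Parse 'name(arg=value, ...)' into (name, kwargs) without executing it."""
    node = ast.parse(text.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError("expected a simple function call")
    if node.args or any(kw.arg is None for kw in node.keywords):
        raise ValueError("only explicit keyword arguments are supported in this sketch")
    # literal_eval accepts only literals, so arbitrary expressions are rejected.
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return node.func.id, kwargs

def parse_json_call(text):
    """Parse '{"name": ..., "arguments": {...}}' into (name, kwargs)."""
    payload = json.loads(text)
    return payload["name"], payload.get("arguments", {})

def dispatch(name, kwargs):
    """Look up the tool by name and run it with the parsed arguments."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)

# Example call strings in both styles (assumed shapes, for illustration).
pythonic_result = dispatch(*parse_pythonic_call('get_weather(city="Paris")'))
json_result = dispatch(*parse_json_call('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
print(json.dumps(pythonic_result))
```

Parsing with `ast` and `literal_eval` instead of `eval` keeps model output from running arbitrary code, which matters more for a model with reduced refusals.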

Not Recommended for

Knowledge-Intensive Tasks: The model is not optimized for deep knowledge retrieval.
Programming: It is not recommended for code generation or programming-specific tasks.
