coder3101/LFM2.5-350M-heretic
coder3101/LFM2.5-350M-heretic is a 0.35-billion-parameter instruction-tuned language model derived from LiquidAI's LFM2.5-350M using the Heretic tool. It is optimized for on-device deployment, offering fast edge inference and a 32,768-token context length. It is a decensored version of its base model, intended to reduce refusals while maintaining performance. Its primary strengths are data extraction, structured outputs, and tool use, which makes it suitable for resource-constrained environments.
LFM2.5-350M-heretic: Decensored On-Device LLM
This model is a 0.35-billion-parameter instruction-tuned variant of LiquidAI's LFM2.5-350M, created using the Heretic v1.3.0 tool. Its core differentiator is its "decensored" nature: it refuses 6 of 100 test prompts, compared with 90 of 100 for the original model, while keeping a low KL divergence of 0.0440 from the original.
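To make the KL divergence figure concrete, here is a minimal sketch of how the divergence between two next-token distributions can be computed. It assumes the reported value compares the original and modified models' output distributions; the exact prompts and averaging procedure behind the 0.0440 figure are not described here, and the probabilities below are hypothetical toy values, not measurements.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Return KL(P || Q) in nats for two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        # Terms with pi == 0 contribute nothing; eps guards against qi == 0.
        if pi > 0.0:
            total += pi * math.log(pi / max(qi, eps))
    return total


# Hypothetical next-token probabilities over a 4-token toy vocabulary.
original = [0.70, 0.20, 0.05, 0.05]
modified = [0.60, 0.25, 0.10, 0.05]

if __name__ == "__main__":
    value = kl_divergence(original, modified)
    print(f"KL(original || modified) = {value:.4f} nats")
```

A value near zero means the modified model's predictions stay close to the original's on the evaluated prompts, which is why a low divergence is reported alongside the lower refusal count.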
Key Capabilities & Features
- On-Device Optimization: Designed for efficient deployment on edge devices, offering fast inference speeds (e.g., 313 tok/s on AMD CPU, 188 tok/s on Snapdragon Gen4) and a low memory footprint (under 1GB).
- Extended Training: Built on the LFM2 architecture with extended pre-training (28T tokens) and large-scale multi-stage reinforcement learning.
- Tool Use: Supports function calling with a structured approach for defining, calling, and executing tools, including Pythonic and JSON function calls.
- Multilingual Support: Capable in English, Arabic, Chinese, French, German, Japanese, Korean, Portuguese, and Spanish.
- Reproducible: The model's creation process is reproducible, with details available in the reproduce directory.
- Broad Inference Support: Compatible with various inference frameworks including Transformers, vLLM, llama.cpp, MLX, LM Studio, and OpenVINO.
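The tool-use item above mentions both Pythonic and JSON function calls. The sketch below shows one way an application might parse and dispatch either form once the call text has been extracted from the model's reply. It is an illustration only: the `get_weather` tool, the `TOOLS` registry, and the JSON shape (`name` plus `arguments`) are assumptions, and the special tokens and chat template the model uses to wrap tool definitions and calls are not covered here.

```python
import ast
import json


# Hypothetical tool; its name and signature are illustrative only.
def get_weather(city, unit="celsius"):
    return {"city": city, "unit": unit, "forecast": "sunny"}


TOOLS = {"get_weather": get_weather}


def parse_json_call(text):
    """Parse a call shaped like {"name": "f", "arguments": {...}}."""
    payload = json.loads(text)
    return payload["name"], payload.get("arguments", {})


def parse_pythonic_call(text):
    """Parse a call shaped like f(a="x", b=1) without executing any code."""
    node = ast.parse(text.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError("expected a simple function call")
    if node.args:
        raise ValueError("only keyword arguments are supported in this sketch")
    kwargs = {}
    for kw in node.keywords:
        if kw.arg is None:
            raise ValueError("**kwargs unpacking is not supported")
        # literal_eval accepts only literals, so no arbitrary code is run.
        kwargs[kw.arg] = ast.literal_eval(kw.value)
    return node.func.id, kwargs


def dispatch(text):
    """Route a JSON or Pythonic call string to the registered tool."""
    text = text.strip()
    if text.startswith("{"):
        name, kwargs = parse_json_call(text)
    else:
        name, kwargs = parse_pythonic_call(text)
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)


if __name__ == "__main__":
    print(dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
    print(dispatch('get_weather(city="Tokyo", unit="fahrenheit")'))
```

Parsing the Pythonic form with `ast` rather than `eval` keeps model output from running as code, which matters for any model whose replies are passed to real functions.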
Performance Highlights
The base model, LFM2.5-350M, demonstrates competitive performance against other small models (e.g., LFM2-350M, Granite 4.0-350M, Qwen3.5-0.8B, Gemma 3 1B IT) on benchmarks such as GPQA Diamond and IFEval, as well as more specific benchmarks such as CaseReportBench and BFCLv3/v4.
Good for
- Data Extraction: Efficiently pulling specific information from text.
- Structured Outputs: Generating responses in predefined formats.
- Tool Use: Integrating with external functions and APIs for enhanced capabilities.
- Edge Deployment: Applications requiring local, low-resource inference on devices.
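For the data extraction and structured output use cases listed above, a small model's reply is usually checked before it is trusted. The following is a minimal sketch of that check, assuming the model was asked to answer with a JSON object. The schema in `REQUIRED_FIELDS` and the sample reply are hypothetical and are not taken from the model's documentation.

```python
import json

# Hypothetical schema: field name -> accepted Python type(s).
REQUIRED_FIELDS = {"name": str, "amount": (int, float), "currency": str}


def extract_json(reply):
    """Return the first JSON object embedded in a model reply."""
    decoder = json.JSONDecoder()
    start = reply.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(reply, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = reply.find("{", start + 1)
    raise ValueError("no JSON object found in reply")


def validate(record):
    """Return a list of problems; an empty list means the record is valid."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"wrong type for field: {field}")
    return errors


# Hypothetical model reply with surrounding text around the JSON payload.
sample_reply = 'Here is the result: {"name": "Acme Corp", "amount": 1250.5, "currency": "EUR"}'

if __name__ == "__main__":
    record = extract_json(sample_reply)
    print(record, validate(record))
```

If validation fails, a common pattern is to re-prompt with the error list appended, though whether that helps for this model is not something the text here establishes.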
Not Recommended for
- Knowledge-Intensive Tasks: The model is not optimized for deep knowledge retrieval.
- Programming: It is not recommended for code generation or programming-specific tasks.