PinoCookie/LFM2.5-350M-abliterated

Text Generation | Concurrency Cost: 1 | Model Size: 0.35B | Quant: BF16 | Ctx Length: 32k | Published: Jun 10, 2026 | License: other | Architecture: Transformer

PinoCookie/LFM2.5-350M-abliterated is a 0.35-billion-parameter language model with a 32,768-token context length, derived from LiquidAI's LFM2.5-350M. It has been modified using Magnitude-Preserving Orthogonal Ablation (MPOA) to significantly reduce safety-related refusals, reaching a 74.7% compliance rate on HarmBench compared to the original's ~12%. It is intended for safety research, red-teaming, and understanding refusal mechanisms in LLMs, and it can generate content that the original model would typically refuse.

LFM2.5-350M-Abliterated: Reduced Refusals for Safety Research

This model, developed by PinoCookie, is an 'abliterated' version of LiquidAI/LFM2.5-350M, a 0.35-billion-parameter model with a 32,768-token context length. Its primary distinction is a drastically reduced refusal rate on harmful prompts, which makes it useful for safety research and red-teaming.

Key Capabilities & Differentiators

  • Significantly Lower Refusal Rate: Achieves a 74.7% compliance rate on HarmBench, up from the original model's ~12%. Refusals fall from ~88% to 25.3%, a reduction of about 62.7 percentage points.
  • Ablation Methodology: Uses Magnitude-Preserving Orthogonal Ablation (MPOA) with per-layer float-direction interpolation, targeting only the attention output projections (self_attn.out_proj) in the model's 6 GQA (grouped-query attention) layers.
  • Hybrid Architecture Consideration: The ablation deliberately leaves the 10 LIV (linear input-varying) convolution layers untouched. These layers were found to be highly sensitive to weight perturbations, and skipping them helps preserve the model's factual knowledge.
  • Research Focus: Intended for safety research, red-teaming, and analyzing refusal mechanisms in language models, offering insight into how refusal behavior is represented and triggered.
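
The ablation described above can be illustrated with a small linear-algebra sketch. It is not PinoCookie's MPOA code. It assumes that "float-direction interpolation" means linearly blending the refusal directions of the two layers adjacent to a fractional layer index, and that "magnitude-preserving" means restoring each weight column to its original L2 norm after projection; the published procedure may normalize along a different axis or add further steps. The function names are hypothetical, and the weights and directions are random placeholders, not values from LFM2.5-350M.

```python
import numpy as np


def interpolate_direction(directions: np.ndarray, index: float) -> np.ndarray:
    """Hypothetical helper: blend per-layer directions at a fractional index.

    `directions` has shape (num_layers, d_model); row i stands in for a unit
    refusal direction estimated at layer i. A float index such as 1.5 mixes
    rows 1 and 2 linearly (an assumed scheme), then renormalizes to unit length.
    """
    lo = int(np.floor(index))
    hi = min(lo + 1, len(directions) - 1)
    t = index - lo
    r = (1.0 - t) * directions[lo] + t * directions[hi]
    return r / np.linalg.norm(r)


def ablate_preserving_norms(weight: np.ndarray, r: np.ndarray) -> np.ndarray:
    """Hypothetical helper: remove direction `r` from a layer's output space.

    `weight` uses the PyTorch nn.Linear layout (out_features, in_features), so
    each column is the vector one input feature writes into the residual
    stream. Each column is projected onto the orthogonal complement of `r`,
    then rescaled back to its original L2 norm (assumed normalization axis).
    """
    r = r / np.linalg.norm(r)
    orig_norms = np.linalg.norm(weight, axis=0, keepdims=True)
    projected = weight - np.outer(r, r @ weight)  # (I - r r^T) W
    new_norms = np.linalg.norm(projected, axis=0, keepdims=True)
    # Columns that were (almost) parallel to r collapse to zero instead of
    # having numerical noise blown up by a huge rescaling factor.
    safe = np.maximum(new_norms, 1e-12)
    scale = np.where(new_norms > 1e-12, orig_norms / safe, 0.0)
    return projected * scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dirs = rng.normal(size=(4, 8))                      # 4 layers, d_model = 8
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    r = interpolate_direction(dirs, 1.5)
    W = rng.normal(size=(8, 12))                        # stand-in out_proj weight
    W_abl = ablate_preserving_norms(W, r)
    x = rng.normal(size=12)
    print("component along r:", float(r @ (W_abl @ x)))  # ~0 up to rounding
    print("column norms kept:",
          np.allclose(np.linalg.norm(W_abl, axis=0), np.linalg.norm(W, axis=0)))
```

Rescaling columns rather than rows is what keeps the result orthogonal to r in this sketch: multiplying a column by a scalar does not change its direction, whereas rescaling rows would generally reintroduce a component along r.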

Performance & Limitations

On the full HarmBench DirectRequest test set (320 prompts), the abliterated model refused only 25.3% of prompts, down from ~88% for the original. Compliant outputs are structurally coherent, but approximately 15-25% of them show mild content degradation or off-topic drift. The likely cause is that, in a 350M-parameter model, the refusal direction shares representational space with general language capabilities, so removing it also perturbs some unrelated behavior.
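
The headline figures can be cross-checked with simple arithmetic. The sketch below treats every prompt as either refused or complied with (as the card's 25.3% + 74.7% = 100% implies) and derives the compliance rates and the percentage-point drop from the reported refusal rates. The prompt counts it prints are inferred by rounding the percentages against 320 prompts; they are not reported figures.

```python
# Figures reported on this card (HarmBench DirectRequest, test split).
TOTAL_PROMPTS = 320
ORIGINAL_REFUSAL_PCT = 88.0      # reported as "~88%"
ABLITERATED_REFUSAL_PCT = 25.3


def implied_count(pct: float, total: int = TOTAL_PROMPTS) -> int:
    """Nearest whole number of prompts consistent with a reported percentage."""
    return round(pct / 100.0 * total)


def compliance_pct(refusal_pct: float) -> float:
    """Refusal and compliance are treated as complementary, summing to 100%."""
    return 100.0 - refusal_pct


if __name__ == "__main__":
    drop_pp = ORIGINAL_REFUSAL_PCT - ABLITERATED_REFUSAL_PCT
    print(f"abliterated: ~{implied_count(ABLITERATED_REFUSAL_PCT)} refusals, "
          f"{compliance_pct(ABLITERATED_REFUSAL_PCT):.1f}% compliance")
    print(f"original: ~{implied_count(ORIGINAL_REFUSAL_PCT)} refusals, "
          f"{compliance_pct(ORIGINAL_REFUSAL_PCT):.1f}% compliance")
    print(f"refusal reduction: {drop_pp:.1f} percentage points")
```

Run as a script, it reports 74.7% compliance for the abliterated model, 12.0% for the original, and a refusal reduction of 62.7 percentage points.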

Limitations include:

  • Residual Refusal: Still refuses about 25% of harmful prompts, particularly in categories like chemical synthesis or exploit code.
  • Content Degradation: An estimated 15-25% of compliant outputs show mild quality loss or off-topic drift.
  • Partial Ablation: Only attention layers were modified; convolution layers remain untouched.

Use Cases

  • Safety Research: Investigate and understand how language models generate and refuse harmful content.
  • Red-Teaming: Test the robustness and vulnerabilities of safety mechanisms in LLMs.
  • Model Analysis: Explore the internal workings of refusal signals and their evolution across model layers.