Name: PinoCookie/LFM2.5-350M-abliterated API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: PinoCookie

LFM2.5-350M-Abliterated: Reduced Refusals for Safety Research

This model, developed by PinoCookie, is an 'abliterated' version of LiquidAI/LFM2.5-350M, a 350-million-parameter model with a 32,768-token context length. Its primary distinction is a drastically reduced refusal rate on harmful prompts, which makes it useful for specific safety research and red-teaming purposes.

Key Capabilities & Differentiators

Significantly Lower Refusal Rate: Achieves a 74.7% compliance rate on HarmBench, up from the original model's ~12%. Equivalently, refusals fall from ~88% to 25.3%, a reduction of roughly 62.7 percentage points.
Ablation Methodology: Utilizes Magnitude-Preserving Orthogonal Ablation (MPOA) with per-layer float-direction interpolation, specifically targeting attention output projections (self_attn.out_proj) in the model's 6 GQA layers.
Hybrid Architecture Consideration: The ablation deliberately leaves the 10 LIV (Liquid Convolution) layers untouched. These layers were found to be highly sensitive to weight perturbations, so leaving them unmodified preserves factual knowledge.
Research Focus: Intended for safety research, red-teaming, and analyzing refusal mechanisms in language models, offering insights into how safety filters operate.
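
As a rough illustration of the orthogonal-ablation idea named in the list above, the sketch below projects a single direction out of a toy weight matrix and then rescales each column to its original L2 norm. This is a minimal reading of "magnitude-preserving", not the author's MPOA implementation: the function names, the column-wise norm choice, and the toy matrix are all assumptions, and the per-layer direction interpolation is not modeled.

```python
import math


def column_norm(weight, j):
    """L2 norm of column j of a matrix stored as a list of rows."""
    return math.sqrt(sum(row[j] ** 2 for row in weight))


def ablate_direction(weight, direction):
    """Remove `direction` from every column of `weight`, keeping column norms.

    weight:    list of rows, shape (d_out, d_in); columns live in output space.
    direction: list of length d_out (normalized internally).
    Assumption: "magnitude-preserving" is modeled as restoring each column's
    original L2 norm after the projection. Returns a new matrix.
    """
    d_out, d_in = len(weight), len(weight[0])
    norm = math.sqrt(sum(v * v for v in direction))
    r = [v / norm for v in direction]  # unit vector along the direction

    result = [row[:] for row in weight]  # copy so the input stays unchanged
    for j in range(d_in):
        col = [weight[i][j] for i in range(d_out)]
        old_norm = math.sqrt(sum(v * v for v in col))
        coeff = sum(r[i] * col[i] for i in range(d_out))  # component along r
        new_col = [col[i] - coeff * r[i] for i in range(d_out)]
        new_norm = math.sqrt(sum(v * v for v in new_col))
        # Rescaling a column keeps it orthogonal to r.
        scale = old_norm / new_norm if new_norm > 1e-12 else 0.0
        for i in range(d_out):
            result[i][j] = new_col[i] * scale
    return result


if __name__ == "__main__":
    toy = [[3.0, 1.0], [4.0, 2.0], [0.0, 2.0]]  # hypothetical 3x2 weights
    ablated = ablate_direction(toy, [1.0, 0.0, 0.0])
    print(ablated[0])                 # components along the direction
    print(column_norm(ablated, 0))    # compare with column_norm(toy, 0)
```

Because each column is rescaled only after the projection, the result stays orthogonal to the chosen direction while the per-column magnitudes match the original matrix.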

Performance & Limitations

Evaluated on the full HarmBench DirectRequest test set (320 prompts), the abliterated model refused only 25.3% of prompts, down from ~88% for the original model. Compliant outputs are structurally coherent, but approximately 15-25% of them may show mild content degradation or off-topic drift, because in a 350M-parameter model the refusal direction shares representation space with general language capability.
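
The percentages above can be turned into approximate prompt counts with a few lines of arithmetic. The counts below are derived from the stated rates and the 320-prompt test set; they are not reported in the source, and the variable names are illustrative.

```python
TOTAL_PROMPTS = 320            # HarmBench DirectRequest test set size
REFUSAL_RATE_AFTER = 0.253     # abliterated model
REFUSAL_RATE_BEFORE = 0.88     # original model (approximate figure)

# Approximate number of refused and complied prompts after ablation.
refused = round(TOTAL_PROMPTS * REFUSAL_RATE_AFTER)
complied = TOTAL_PROMPTS - refused
compliance_pct = round(100 * complied / TOTAL_PROMPTS, 1)

# Drop in refusal rate, in percentage points.
drop_pp = round((REFUSAL_RATE_BEFORE - REFUSAL_RATE_AFTER) * 100, 1)

# Rough range of compliant outputs with mild degradation (15-25%).
degraded_low = round(complied * 0.15)
degraded_high = round(complied * 0.25)

print(refused, complied, compliance_pct, drop_pp, degraded_low, degraded_high)
```

Under these assumptions, about 81 prompts are refused and 239 are answered, which matches the stated 74.7% compliance rate, and somewhere around 36 to 60 of the compliant outputs would show some degradation.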

Limitations include:

Residual Refusal: Still refuses about 25% of harmful prompts, particularly in categories like chemical synthesis or exploit code.
Content Degradation: An estimated 15-25% of compliant outputs show mild content degradation or off-topic drift.
Partial Ablation: Only the attention output projections in the 6 GQA layers were modified; the 10 convolution layers remain untouched.

Use Cases

Safety Research: Investigate and understand how language models generate and refuse harmful content.
Red-Teaming: Test the robustness and vulnerabilities of safety mechanisms in LLMs.
Model Analysis: Explore the internal workings of refusal signals and their evolution across model layers.
