Name: DuoNeural/Mistral-NeMo-12B-Abliterated API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: DuoNeural

DuoNeural/Mistral-NeMo-12B-Abliterated: A Research Artifact

This model, developed by DuoNeural, is a 12.2 billion parameter language model based on mistralai/Mistral-Nemo-Instruct-2407. It has a dense architecture with 40 layers, a hidden dimension of 5120, and grouped-query attention (GQA; 32 query heads sharing 8 KV heads) with sliding-window attention (SWA) over a 4096-token window. The model uses the Tekken v3 tokenizer with a 131,072-token vocabulary.

Key Research Focus: Abliteration and Safety

The defining characteristic of this model is the application of orthogonal rank-1 projection abliteration. This method, a DuoNeural standard, was applied to the down_proj and o_proj weights in all 40 layers. The direction to remove was estimated with a difference-in-means approach over 10 harmful and 10 harmless prompts, and the abliteration aimed to change how the model responds to harmful content by projecting that direction out of the weights.
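The two steps named above (a difference-in-means direction, then an orthogonal rank-1 projection) can be sketched roughly as follows. This is a minimal NumPy illustration on random toy data, not DuoNeural's actual pipeline: the function names, the toy dimensions, and the assumption that each weight matrix has shape (d_model, d_in) and writes into the residual stream (as down_proj and o_proj do under the usual PyTorch Linear layout) are all illustrative.

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Difference-in-means direction between two sets of activations,
    each of shape (n_prompts, d_model), normalised to unit length."""
    diff = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def ablate_direction(weight, direction):
    """Remove `direction` from the output space of `weight`.

    Assumes weight has shape (d_model, d_in), so y = weight @ x lies in
    the residual stream. Computes W' = (I - r r^T) W = W - r (r^T W),
    which is a rank-1 update to W.
    """
    r = direction / np.linalg.norm(direction)
    return weight - np.outer(r, r @ weight)

# Toy stand-ins (hypothetical sizes; the real model uses d_model = 5120).
rng = np.random.default_rng(0)
d_model, d_in = 16, 32
harmful = rng.normal(size=(10, d_model)) + 1.0   # 10 "harmful" activations
harmless = rng.normal(size=(10, d_model))        # 10 "harmless" activations

r = refusal_direction(harmful, harmless)
W = rng.normal(size=(d_model, d_in))
W_abl = ablate_direction(W, r)

# After ablation the layer can no longer write along r.
residual = np.abs(r @ W_abl).max()
```

In this sketch the same direction would be projected out of every targeted matrix; how activations are collected and whether one direction is shared across all 40 layers are details the description above does not specify.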

Notable Findings:

Pre-abliteration Compliance: The base Mistral-Nemo-Instruct-2407 model already complied with 6 out of 6 harmful probes before any weight modification. This suggests that Mistral's safety training approach does not install a strong output-gate refusal mechanism.
Minimal Behavioral Shift: The abliteration produced a very low KL divergence of 0.0004 (rated EXCELLENT) on 10 benign probes, indicating a near-zero shift in the model's output distribution on benign inputs. This is consistent with the abliteration being mechanistically clean while having minimal behavioral impact, because the base model was already compliant.
P34 Research Context: This model is a component of DuoNeural's P34 Reasoning Channel Bypass study, which investigates how different architectures handle safety training and refusal mechanisms. It contrasts with models like Gemma 4-12B-IT and LFM 2.5-8B-A1B, where abliteration produced more significant behavioral dissociation.
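For readers unfamiliar with the KL figure quoted above, the following is a minimal sketch of how such a number could be computed from next-token logits. It assumes the metric is the mean KL(P_base || P_abliterated) over one logit vector per benign probe; the source does not state the direction of the divergence or which token positions were used, and the toy logits here are random stand-ins rather than real model outputs.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def mean_kl(base_logits, modified_logits):
    """Mean KL(P_base || P_modified), one distribution per row."""
    log_p = log_softmax(base_logits)
    log_q = log_softmax(modified_logits)
    kl_per_row = (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)
    return float(kl_per_row.mean())

# Toy data: 10 probes, a hypothetical 50-token vocabulary.
rng = np.random.default_rng(0)
base = rng.normal(size=(10, 50))
perturbed = base + 0.01 * rng.normal(size=base.shape)

kl_same = mean_kl(base, base)        # identical models give zero divergence
kl_small = mean_kl(base, perturbed)  # a tiny logit change gives a tiny KL
```

A value close to zero means the modified model assigns almost the same next-token probabilities as the base model on those probes.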

Good for:

LLM Safety Research: Ideal for researchers studying model safety, refusal mechanisms, and the effects of abliteration techniques.
Understanding Model Architecture: Provides insight into how different base models (like Mistral-NeMo) respond to targeted weight modifications aimed at their safety behavior.
Comparative Analysis: Useful for comparing safety training effectiveness across various LLM architectures.

Full Model Card (README)