DuoNeural/Mistral-NeMo-12B-Abliterated

Text Generation · Concurrency Cost: 1 · Model Size: 12B · Quant: FP8 · Ctx Length: 32k · Tool Calling: Supported · Published: Jun 4, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

DuoNeural/Mistral-NeMo-12B-Abliterated is a 12.2-billion-parameter dense language model derived from mistralai/Mistral-Nemo-Instruct-2407. It uses the Tekken v3 tokenizer (131,072-token vocabulary) and has a 32,768-token context length. The model has undergone orthogonal rank-1 projection abliteration. This is a research technique for analyzing and modifying model behavior: a single direction, derived by contrasting activations on harmful and harmless prompts, is projected out of selected weight matrices. The model is primarily a research artifact for studying safety-training mechanisms and reasoning-channel bypass in LLMs. Notably, the base model already complied with harmful probes before abliteration.


DuoNeural/Mistral-NeMo-12B-Abliterated: A Research Artifact

Developed by DuoNeural, this model is a 12.2-billion-parameter language model based on mistralai/Mistral-Nemo-Instruct-2407. It has a dense architecture with the following features:

  • 40 layers with a hidden size of 5120.
  • Grouped-query attention (GQA), in which 32 query heads share 8 key/value heads.
  • Sliding-window attention (SWA) with a 4096-token window.

It uses the Tekken v3 tokenizer with a 131,072-token vocabulary.
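To make the attention figures concrete, here is a small illustrative sketch. It shows how 32 query heads can map onto 8 shared KV heads under GQA, and what a 4096-token sliding window permits. The function names are hypothetical. It also assumes two conventions that are not stated in this card: contiguous head grouping, and a window that counts the current token. Real implementations can differ on both points, including off-by-one choices.

```python
# Illustrative sketch only: the constants come from the model description;
# the function names are hypothetical, not part of any library API.
NUM_QUERY_HEADS = 32
NUM_KV_HEADS = 8
SLIDING_WINDOW = 4096


def kv_head_for_query_head(q_head: int) -> int:
    """Map a query head to the KV head it shares under grouped-query attention.

    Assumes contiguous grouping: query heads 0-3 share KV head 0, 4-7 share 1, etc.
    """
    group_size = NUM_QUERY_HEADS // NUM_KV_HEADS  # 4 query heads per KV head
    return q_head // group_size


def can_attend(query_pos: int, key_pos: int, window: int = SLIDING_WINDOW) -> bool:
    """Causal sliding-window mask.

    Assumes the window includes the current token, so a token sees itself
    and the previous window - 1 tokens.
    """
    distance = query_pos - key_pos
    return 0 <= distance < window


if __name__ == "__main__":
    groups = {}
    for q in range(NUM_QUERY_HEADS):
        groups.setdefault(kv_head_for_query_head(q), []).append(q)
    print(groups[0])  # [0, 1, 2, 3]
    # The KV cache stores only the KV heads, so it is this many times smaller
    # than with full multi-head attention at the same head size.
    print(NUM_QUERY_HEADS // NUM_KV_HEADS)  # 4
    print(can_attend(5000, 1000), can_attend(5000, 904), can_attend(5000, 905))  # True False True
```

The 4x ratio follows directly from 32 / 8. The mask semantics are a common convention rather than a confirmed detail of this model's implementation.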

Key Research Focus: Abliteration and Safety

The defining feature of this model is orthogonal rank-1 projection abliteration, DuoNeural's standard method. It was applied to the o_proj and down_proj weight matrices in all 40 layers. The direction to remove was computed by difference-in-means over activations from 10 harmful and 10 harmless prompts. That direction was then projected out of those weights to modify the model's response to harmful content.
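The two steps described above can be illustrated with a minimal pure-Python sketch on toy dimensions. First, a difference-in-means direction is computed and normalized. Second, the rank-1 orthogonal projection W' = (I - r rᵀ) W is applied to a weight matrix that writes into the residual stream, which is the role o_proj and down_proj play.

This is not DuoNeural's code. The helper names `refusal_direction` and `project_out` are hypothetical, and the card does not specify the following details:

  • which layer or token position the activations are taken from,
  • whether one direction or one per layer is used.

```python
# Minimal sketch of difference-in-means + rank-1 orthogonal weight projection.
# Toy sizes; a real run would use model activations and weight tensors.
# Helper names are hypothetical and not from any specific library.
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major, shape [d_model][d_in] (out x in)


def mean(vectors: List[Vector]) -> Vector:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def refusal_direction(harmful: List[Vector], harmless: List[Vector]) -> Vector:
    """Unit vector along mean(harmful activations) - mean(harmless activations)."""
    diff = [a - b for a, b in zip(mean(harmful), mean(harmless))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]


def project_out(weight: Matrix, r: Vector) -> Matrix:
    """Return (I - r r^T) W, so the layer's output has no component along r.

    `weight` maps an input of size d_in into the residual stream (size d_model),
    stored as [out_features][in_features] like o_proj / down_proj weights.
    """
    d_model, d_in = len(weight), len(weight[0])
    # r^T W: a row vector of length d_in
    rtw = [sum(r[i] * weight[i][j] for i in range(d_model)) for j in range(d_in)]
    return [[weight[i][j] - r[i] * rtw[j] for j in range(d_in)] for i in range(d_model)]


if __name__ == "__main__":
    harmful = [[1.0, 2.0, 0.5], [1.2, 1.8, 0.7]]   # toy activations, d_model = 3
    harmless = [[0.2, 0.1, 0.5], [0.0, 0.3, 0.3]]
    r = refusal_direction(harmful, harmless)
    W = [[0.5, -1.0], [2.0, 0.3], [1.0, 1.0]]      # toy [d_model=3][d_in=2]
    W_abl = project_out(W, r)
    # For any input x, the edited layer's output is orthogonal to r
    x = [0.7, -0.4]
    out = [sum(row[j] * x[j] for j in range(len(x))) for row in W_abl]
    print(abs(sum(a * b for a, b in zip(out, r))) < 1e-9)  # True
```

Because r has unit norm, rᵀW' = rᵀW - (rᵀr)(rᵀW) = 0. The edited layer can therefore no longer write along r, while every other direction is left unchanged.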

Notable Findings:

  • Pre-abliteration Compliance: The base Mistral-Nemo-Instruct-2407 model already complied with all 6 harmful probes before any weight modification. This suggests that Mistral's safety training does not install a strong output-gate refusal mechanism.
  • Minimal Behavioral Shift: Abliteration produced a KL divergence of only 0.0004 (rated EXCELLENT) across 10 benign probes, indicating a near-zero shift in the model's output distribution on benign inputs. Taken with the pre-abliteration result, this indicates that the abliteration was mechanistically clean but had little behavioral impact, because the base model was already compliant.
  • P34 Research Context: This model is part of DuoNeural's P34 Reasoning Channel Bypass study, which investigates how different architectures handle safety training and refusal mechanisms. It contrasts with models such as Gemma 4-12B-IT and LFM 2.5-8B-A1B, where abliteration produced stronger behavioral dissociation.

Good for:

  • LLM Safety Research: Ideal for researchers studying model safety, refusal mechanisms, and the effects of abliteration techniques.
  • Understanding Model Architecture: Provides insights into how different base models (like Mistral-NeMo) respond to targeted weight modifications for safety.
  • Comparative Analysis: Useful for comparing safety training effectiveness across various LLM architectures.