DuoNeural/DeepSeek-R1-Distill-Qwen-7B-Abliterated

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Ctx Length: 32k · Tool Calling: Supported · Published: Jun 4, 2026 · License: mit · Architecture: Transformer · Open Weights · Cold

DuoNeural/DeepSeek-R1-Distill-Qwen-7B-Abliterated is a 7.6 billion parameter language model based on the Qwen2.5-7B architecture and distilled from DeepSeek-R1's reasoning-focused RL training. This version has undergone 'abliteration', an orthogonal rank-1 projection that reduces refusal behavior, making it highly compliant with requests. It retains the native `<think>...</think>` mode for explicit reasoning traces and is intended for research, red-teaming, and applications requiring high compliance.


Overview

DuoNeural/DeepSeek-R1-Distill-Qwen-7B-Abliterated is a 7.6 billion parameter model built on the Qwen2.5-7B base and enhanced with DeepSeek-R1's reasoning-focused Reinforcement Learning (RL) distillation. DuoNeural 'abliterated' this version with an orthogonal rank-1 projection method that significantly reduces refusal behavior, making it highly compliant with user prompts. It has a 131,072-token context window (the listing metadata above gives a 32k context length) and natively supports a <think>...</think> mode, which lets it emit explicit reasoning traces before generating its final answer.
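To make the phrase "orthogonal rank-1 projection" concrete, here is a minimal sketch of the general idea: given a direction r in the model's hidden space, a weight matrix W that writes into that space is replaced by W - r rᵀ W (with r normalized), so its outputs have no component along r. This is an illustration only, not DuoNeural's actual procedure. The function name `ablate_direction`, the plain-list matrix layout, and the toy values are assumptions, and how the refusal direction itself is estimated is not covered by this listing.

```python
import math

def ablate_direction(weight, direction):
    """Return W - r r^T W, where r is `direction` scaled to unit length.

    `weight` is a list of rows with shape (d_out, d_in); `direction` has
    length d_out. Hypothetical helper for illustration only.
    """
    norm = math.sqrt(sum(x * x for x in direction))
    r = [x / norm for x in direction]
    d_out, d_in = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along r.
    coeffs = [sum(r[i] * weight[i][j] for i in range(d_out)) for j in range(d_in)]
    # Subtract the rank-1 matrix r (r^T W) from W.
    return [[weight[i][j] - r[i] * coeffs[j] for j in range(d_in)]
            for i in range(d_out)]

# Toy example: a 3x2 matrix and an arbitrary (made-up) direction.
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
r = [1.0, 2.0, 2.0]
W_ablated = ablate_direction(W, r)
```

After the projection, every column of `W_ablated` is orthogonal to `r`, and applying the function a second time leaves the matrix unchanged up to floating-point error.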

Key Characteristics & Research Findings

  • High Compliance: The abliteration process projects out the refusal direction so that the model complies with requests the base model might otherwise refuse. DuoNeural's research indicates that the base DeepSeek-R1-Distill-Qwen-7B already showed high compliance (100% on their harmful probe suite) before abliteration, suggesting that RL training focused on reasoning accuracy does not by itself produce safety refusal behavior.
  • Reasoning-Focused RL: The model's training emphasizes reasoning accuracy through GRPO (Group Relative Policy Optimization) rather than safety-specific RLHF (Reinforcement Learning from Human Feedback).
  • Native Thinking Mode: Preserves the <think>...</think> mechanism, which is useful for observing the model's internal reasoning process. Users should allocate sufficient max_new_tokens for generation to accommodate these potentially long reasoning traces.
  • Research Context: This model is part of DuoNeural's P34 Reasoning Channel Bypass study, investigating post-training dynamics and mechanistic interpretability across architectures.
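
Because the reasoning trace and the final answer arrive in one generated string, downstream code usually needs to separate them and to notice when the trace was cut off by too small a max_new_tokens budget. The following is a minimal sketch under the assumption that the decoded text contains literal <think> and </think> tags; the helper name `split_reasoning` and its return shape are hypothetical, not part of any library.

```python
def split_reasoning(text):
    """Split generated text into (reasoning, answer, truncated).

    Assumes literal <think>...</think> tags in the decoded output.
    `truncated` is True when the trace opened but never closed, which
    typically means the generation budget ran out mid-reasoning.
    """
    open_tag, close_tag = "<think>", "</think>"
    start = text.find(open_tag)
    if start == -1:
        # No reasoning trace at all: everything is the answer.
        return "", text.strip(), False
    body_start = start + len(open_tag)
    end = text.find(close_tag, body_start)
    if end == -1:
        # Trace started but was never closed: no final answer yet.
        return text[body_start:].strip(), "", True
    reasoning = text[body_start:end].strip()
    answer = text[end + len(close_tag):].strip()
    return reasoning, answer, False

reasoning, answer, truncated = split_reasoning(
    "<think>2 + 2 = 4</think>The answer is 4."
)
```

If `truncated` comes back True, one reasonable response is to retry with a larger max_new_tokens value rather than treat the partial trace as an answer.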

Intended Use Cases

  • Research & Red-Teaming: Ideal for studying model behavior, probing compliance, and red-teaming scenarios where refusal behavior is a barrier.
  • Applications Requiring High Compliance: Suitable for use cases where explicit safety alignment leading to refusals is undesirable or needs to be bypassed for specific research or development purposes.