blackbook-lm/DeepSeek-R1-Distill-Qwen-7B-heretic

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Ctx Length: 32k · Published: Mar 1, 2026 · License: MIT · Architecture: Transformer · Open Weights

blackbook-lm/DeepSeek-R1-Distill-Qwen-7B-heretic is a 7.6-billion-parameter language model: a decensored version of deepseek-ai/DeepSeek-R1-Distill-Qwen-7B created with the Heretic v1.2.0 tool. The underlying model distills the reasoning of the larger DeepSeek-R1 model into the Qwen2.5-Math-7B base model and supports a 32768-token context length. The Heretic pass specifically targets refusal behavior, reducing refusals while retaining the distilled reasoning capabilities, which makes the model suitable for applications that require less restrictive content generation.

Model Overview

This model, blackbook-lm/DeepSeek-R1-Distill-Qwen-7B-heretic, is a 7.6 billion parameter language model derived from deepseek-ai/DeepSeek-R1-Distill-Qwen-7B. It has been decensored using the Heretic v1.2.0 tool, significantly reducing content refusals compared to its original counterpart (5/100 vs. 49/100 refusals). The base model is Qwen2.5-Math-7B, and it benefits from reasoning patterns distilled from the larger DeepSeek-R1 model, which was developed using large-scale reinforcement learning (RL) to foster advanced reasoning behaviors like self-verification and reflection.
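A minimal loading sketch using the standard Hugging Face transformers API (the repository id is taken from this card; the prompt, dtype, and generation settings are illustrative assumptions, not values from the card):

```python
# Sketch: load the model and run one reasoning prompt.
# Assumes the standard transformers AutoModel API; adjust dtype/device to your hardware.
MODEL_ID = "blackbook-lm/DeepSeek-R1-Distill-Qwen-7B-heretic"


def main():
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )

    # DeepSeek-R1 distills are chat models; use the bundled chat template.
    messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    # Leave generous headroom: R1-style models emit long reasoning traces.
    output = model.generate(inputs, max_new_tokens=1024)
    print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))


if __name__ == "__main__":
    main()
```

Because the decensoring was applied with Heretic rather than by retraining, the model should load and serve exactly like the original distill checkpoint.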

Key Capabilities

  • Reduced Refusals: Engineered to provide less restrictive outputs, making it suitable for a wider range of applications.
  • Reasoning Enhancement: Incorporates distilled reasoning capabilities from the DeepSeek-R1 model, which excels in complex problem-solving.
  • Extended Context: Supports a context length of 32768 tokens, allowing for processing longer inputs and generating more coherent, extended responses.
  • Mathematical Proficiency: Built upon a math-focused base model, suggesting strong performance in quantitative tasks.
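The 32768-token window bounds prompt plus generation combined, so callers must reserve room for the response. A minimal budgeting sketch (the helper names and per-turn token counts are illustrative; real counts would come from the model's tokenizer):

```python
CTX_LEN = 32768  # context window reported on this card


def fits_in_context(prompt_tokens: int, max_new_tokens: int, ctx_len: int = CTX_LEN) -> bool:
    """Check whether the prompt plus the planned generation fits the window."""
    return prompt_tokens + max_new_tokens <= ctx_len


def truncate_history(turn_lengths, max_new_tokens, ctx_len=CTX_LEN):
    """Drop oldest turns until the remaining history leaves room to generate.

    turn_lengths: token counts per conversation turn, oldest first
    (a stand-in for real tokenizer counts, which this sketch does not compute).
    """
    kept = list(turn_lengths)
    while kept and sum(kept) + max_new_tokens > ctx_len:
        kept.pop(0)  # discard the oldest turn first
    return kept
```

For example, a history of turns costing 20000, 10000, and 5000 tokens cannot fit alongside a 4096-token generation budget, so the oldest turn is dropped: `truncate_history([20000, 10000, 5000], 4096)` returns `[10000, 5000]`.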

Good For

  • Applications requiring less censorship: Ideal for use cases where the original model's refusal rates are prohibitive.
  • Reasoning-intensive tasks: Benefits from the DeepSeek-R1 distillation, making it suitable for tasks demanding logical thought and problem-solving.
  • Developers seeking a Qwen-based model with enhanced reasoning: Offers a powerful alternative for those already familiar with the Qwen architecture.