reaperdoesntknow/Qwen3-1.7B-Thinking-Distil
Text generation · Model size: 2B · Quant: BF16 · Context length: 32k · Published: Mar 27, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

reaperdoesntknow/Qwen3-1.7B-Thinking-Distil is a 1.7-billion-parameter Qwen3ForCausalLM model developed by the Research Division of Convergent Intelligence LLC. It is distilled from the Qwen3-30B-A3B-Thinking teacher model and fine-tuned specifically to capture extended deliberation patterns and long-form reasoning chains. With a context length of 40,960 tokens, the model excels at tasks that require deep, internal-monologue-style reasoning before arriving at a conclusion.
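The snippet below is a minimal loading-and-generation sketch using the transformers library. The enable_thinking switch and chat-template behaviour are assumptions carried over from the wider Qwen3 family, not details confirmed by this card.

```python
# Minimal sketch: load the model and generate a thinking-style completion.
# Assumes the repo id is reachable on the Hugging Face Hub and that the tokenizer
# ships a Qwen3-style chat template with an enable_thinking switch (not confirmed here).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "reaperdoesntknow/Qwen3-1.7B-Thinking-Distil"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Is 1001 divisible by 7? Reason it out."}]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # assumed Qwen3-style thinking-mode switch
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=2048)
decoded_output = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(decoded_output)
```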


Overview

reaperdoesntknow/Qwen3-1.7B-Thinking-Distil is a 1.7 billion parameter model from Convergent Intelligence LLC's DistilQwen family, specifically designed to emulate the extended reasoning capabilities of its larger Qwen3-30B-A3B-Thinking teacher. This model captures the teacher's deliberative depth, including its internal monologue, backtracking, and re-evaluation processes, through supervised fine-tuning on the longwriter-6k dataset.

Key Capabilities

  • Extended Reasoning: Specializes in generating long-form reasoning chains and internal monologues, mimicking a larger model's thought process (a trace-splitting sketch follows this list).
  • Deliberative Depth: Captures the teacher's approach to reconsidering and resolving complex problems.
  • Efficient Size: Compresses advanced reasoning patterns into a 1.7B parameter model, making it more accessible and efficient.
  • High Context Length: Supports a maximum context length of 40,960 tokens, allowing for extensive input and output.
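As a sketch of how the reasoning trace might be separated from the final answer, assuming the model emits Qwen3-style <think>...</think> delimiters (an assumption based on the teacher family, not stated explicitly in this card):

```python
# Minimal sketch: split a decoded completion into the reasoning trace and the answer.
# Assumes Qwen3-style <think>...</think> delimiters around the thinking content.
def split_thinking(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no </think> tag is present."""
    marker = "</think>"
    if marker in text:
        reasoning, answer = text.split(marker, 1)
        return reasoning.replace("<think>", "").strip(), answer.strip()
    return "", text.strip()

# decoded_output comes from the generation sketch shown earlier.
reasoning, answer = split_thinking(decoded_output)
print("Reasoning length (words):", len(reasoning.split()))
print("Answer:", answer)
```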

Good For

  • Complex Problem Solving: Ideal for applications requiring detailed, step-by-step reasoning before a final answer.
  • Simulating Thought Processes: Useful for tasks where understanding the 'how' behind a conclusion is as important as the conclusion itself.
  • Long-form Generation: Suited for generating lengthy, coherent responses that demonstrate deep consideration.

This model is a direct SFT variant that prioritizes the transfer of extended thinking traces; it is part of the broader DistilQwen collection, which explores various distillation methodologies.