reaperdoesntknow/Qwen3-1.7B-Thinking-Distil is a 1.7 billion parameter Qwen3ForCausalLM model developed by Convergent Intelligence LLC: Research Division. It is a distilled version of the Qwen3-30B-A3B-Thinking teacher model, specifically fine-tuned to capture extended deliberation patterns and long-form reasoning chains. With a context length of 40,960 tokens, this model excels at tasks requiring deep, internal monologue-style reasoning before arriving at a conclusion.
Overview
reaperdoesntknow/Qwen3-1.7B-Thinking-Distil is a 1.7 billion parameter model from Convergent Intelligence LLC's DistilQwen family, specifically designed to emulate the extended reasoning capabilities of its larger Qwen3-30B-A3B-Thinking teacher. This model captures the teacher's deliberative depth, including its internal monologue, backtracking, and re-evaluation processes, through supervised fine-tuning on the longwriter-6k dataset.
Key Capabilities
- Extended Reasoning: Specializes in generating long-form reasoning chains and internal monologues, mimicking a larger model's thought process.
- Deliberative Depth: Captures the teacher's approach to reconsidering and resolving complex problems.
- Efficient Size: Compresses advanced reasoning patterns into a 1.7B parameter model, making it more accessible and efficient.
- High Context Length: Supports a maximum context length of 40,960 tokens, allowing for extensive input and output.
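A minimal usage sketch with Hugging Face `transformers` is below. The model id comes from this card; the `enable_thinking` flag follows the Qwen3 chat-template convention, and the generation settings are illustrative assumptions rather than official recommendations. Imports are deferred so the sketch reads standalone.

```python
def build_chat(prompt: str) -> list[dict]:
    """Wrap a user prompt in the chat-message format expected by
    tokenizer.apply_chat_template."""
    return [{"role": "user", "content": prompt}]


def generate(prompt: str, max_new_tokens: int = 1024) -> str:
    """Sketch of loading the distilled model and generating a thinking-mode
    completion. Imports are local so the file can be read without the
    transformers dependency installed."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "reaperdoesntknow/Qwen3-1.7B-Thinking-Distil"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )

    # enable_thinking=True asks the Qwen3 chat template to emit the
    # internal monologue before the final answer.
    text = tokenizer.apply_chat_template(
        build_chat(prompt),
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=True,
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )


# Example call (downloads the 1.7B weights on first run):
# print(generate("How many primes are below 30?"))
```

Long completions are expected in thinking mode, so budget `max_new_tokens` generously; the 40,960-token context leaves ample room for extended traces.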
Good For
- Complex Problem Solving: Ideal for applications requiring detailed, step-by-step reasoning before a final answer.
- Simulating Thought Processes: Useful for tasks where understanding the 'how' behind a conclusion is as important as the conclusion itself.
- Long-form Generation: Suited for generating lengthy, coherent responses that demonstrate deep consideration.
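For the use cases above, applications often need the reasoning trace and the final answer as separate strings. A small sketch of that post-processing step follows; it assumes the `<think>...</think>` delimiters used by the Qwen3 thinking-model family, so confirm against this model's actual chat template before relying on it.

```python
import re


def split_thinking(completion: str) -> tuple[str, str]:
    """Split a completion into (reasoning_trace, final_answer).

    Thinking-mode models in the Qwen3 family emit the internal monologue
    inside <think>...</think> before the visible answer; when no such
    block is present, the trace is returned empty.
    """
    match = re.search(r"<think>(.*?)</think>", completion, flags=re.DOTALL)
    if match is None:
        return "", completion.strip()
    trace = match.group(1).strip()
    answer = completion[match.end():].strip()
    return trace, answer


sample = "<think>2 + 2 is 4 because...</think>The answer is 4."
trace, answer = split_thinking(sample)
# trace  → "2 + 2 is 4 because..."
# answer → "The answer is 4."
```

Keeping the trace separate lets you log or display the 'how' behind an answer without leaking the monologue into user-facing output.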
This model is a direct SFT (supervised fine-tuning) variant that prioritizes the transfer of extended thinking traces. It is part of the broader DistilQwen collection, which explores a range of distillation methodologies.