Overview

Qwen3-1.7B-Thinking-Distil is a 1.7-billion-parameter model from Convergent Intelligence LLC: Research Division, distilled from the larger Qwen3-30B-A3B-Thinking teacher model. It is trained to reproduce the teacher's deliberative patterns, including reasoning through uncertainty, backtracking, and re-evaluation, in a smaller, more efficient student model.

Key Capabilities

Extended Reasoning: Generates long-form reasoning chains and internal monologues that mimic the thought process of the larger teacher model.
Deliberative Depth: Captures the signal of a "Thinking" teacher, with the focus on how the model approaches, reconsiders, and resolves complex problems.
Efficient Distillation: Delivers this reasoning behavior in a 1.7B-parameter model, which keeps deployment cost well below that of the 30B-parameter teacher.
Long Context: Supports a maximum context length of 40,960 tokens, leaving room for extensive input and output.
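
Because reasoning traces can be long, it helps to check how much of the 40,960-token window is left for generation once the prompt is counted. The sketch below assumes the limit is shared between prompt and output, as is usual for decoder-only models; the function name and the `reserve` parameter are illustrative and not part of the model card.

```python
MAX_CONTEXT_TOKENS = 40_960  # maximum context length stated in the model card


def max_new_tokens(prompt_tokens: int, reserve: int = 0) -> int:
    """Return how many tokens can still be generated for a given prompt length.

    Assumes prompt and generated tokens share one context window.
    `reserve` is a hypothetical safety margin (for example, for a chat template).
    """
    if prompt_tokens < 0 or reserve < 0:
        raise ValueError("token counts must be non-negative")
    remaining = MAX_CONTEXT_TOKENS - prompt_tokens - reserve
    return max(remaining, 0)  # never report a negative budget


print(max_new_tokens(2_000))  # 38960
```

The returned value can be passed as the generation length cap of whichever inference library is in use.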

Training Details

The model was trained with Supervised Fine-Tuning (SFT) on the longwriter-6k dataset, a set of long-form generation samples that preserve extended reasoning chains. Direct SFT was chosen over logit-level Knowledge Distillation (KD) in order to transfer the structural signal of the teacher's reasoning process.
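
To make the distinction concrete, the toy sketch below contrasts the two objectives on plain Python lists. It is an illustration only, not the authors' training code: `sft_loss` needs only the token ids of the reference text, while `logit_kd_loss` needs the teacher's full logit vector at every position. All function names are hypothetical, and temperature scaling is omitted for brevity.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one position's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sft_loss(student_logits, target_ids):
    """SFT objective: mean negative log-likelihood of the reference tokens."""
    nll = 0.0
    for logits, target in zip(student_logits, target_ids):
        nll -= math.log(softmax(logits)[target])
    return nll / len(target_ids)


def logit_kd_loss(student_logits, teacher_logits):
    """Logit-level KD objective: mean KL(teacher || student) per position."""
    kl = 0.0
    for s, t in zip(student_logits, teacher_logits):
        p_s, p_t = softmax(s), softmax(t)
        kl += sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    return kl / len(student_logits)


# Two positions over a four-token vocabulary.
student = [[0.0, 0.0, 0.0, 0.0], [2.0, 0.5, 0.1, 0.1]]
teacher = [[3.0, 0.0, 0.0, 0.0], [2.0, 0.5, 0.1, 0.1]]
print(sft_loss(student, target_ids=[0, 0]))
print(logit_kd_loss(student, teacher))
```

In the SFT case the teacher contributes only through the text of the training samples, which is why the approach works with a dataset of long-form generations alone.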

Good For

Applications requiring models to "think aloud" or show their reasoning steps.
Tasks benefiting from extended deliberation before providing a final answer.
Scenarios where a smaller model with advanced reasoning capabilities is preferred over larger, more resource-intensive alternatives.
Generating detailed explanations, problem-solving narratives, or complex analytical responses.
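
For applications that show the reasoning separately from the final answer, the output usually has to be split after generation. The sketch below assumes the model wraps its reasoning in `<think>...</think>` tags, a convention used by Qwen3 thinking models that this model card does not confirm for the distilled model; `split_reasoning` is a hypothetical helper name.

```python
import re

# Assumption: reasoning is delimited by <think>...</think>; adjust if the
# model's actual output format differs.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Split generated text into (reasoning, final_answer).

    If no reasoning block is found, the reasoning is returned as an empty
    string and the whole text is treated as the answer.
    """
    match = THINK_PATTERN.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


reasoning, answer = split_reasoning(
    "<think>17 is not divisible by 2, 3, or 4.</think>17 is prime."
)
print(reasoning)
print(answer)
```

Keeping the two parts separate lets an interface display the deliberation on demand while logging or returning only the final answer.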
