nerkyor/Qwen3.6-35B-A3B-DSV4Pro-Thinking-Distill
nerkyor/Qwen3.6-35B-A3B-DSV4Pro-Thinking-Distill is a 35-billion-parameter Mixture-of-Experts (MoE) model with 3 billion active parameters, based on the Qwen3.6 architecture. Developed by nerkyor, it is distilled from DeepSeek-V4-Pro to serve as a fast task orchestrator for Lynn Agent, with a focus on reasoning, decomposition, delegation, and verification. It is optimized for agentic behavior and better convergence on complex reasoning tasks: compared to its base model, it scores 7.6 percentage points higher on GPQA-Diamond-198 and completes end-to-end orchestration 2.3x faster.
Model Overview
nerkyor/Qwen3.6-35B-A3B-DSV4Pro-Thinking-Distill is a 35-billion-parameter Mixture-of-Experts (MoE) model with 3 billion active parameters, built on the Qwen3.6 architecture. It is designed as a high-end local orchestrator for the Lynn Agent and is the sparse counterpart to a 27B dense sister model. Its core innovation is the distillation process: using LoRA, it learns the reasoning style and agentic behavior of DeepSeek-V4-Pro, in particular its 'thinking-on' approach to task decomposition, delegation, and verification.
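The decompose, delegate, verify pattern named above can be sketched roughly as follows. This is a minimal illustration and not Lynn Agent's actual implementation, which this card does not describe: `SubTask`, `orchestrate`, and the toy callables are all hypothetical names, and in a real deployment the three callables would wrap calls to the orchestrator model and its worker tools.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SubTask:
    """One unit of work produced by decomposition (hypothetical structure)."""
    description: str
    result: Optional[str] = None
    verified: bool = False


def orchestrate(
    goal: str,
    decompose: Callable[[str], List[str]],
    delegate: Callable[[str], str],
    verify: Callable[[str, str], bool],
    max_retries: int = 2,
) -> List[SubTask]:
    """Split a goal into subtasks, delegate each one, and verify each result."""
    subtasks = [SubTask(description=d) for d in decompose(goal)]
    for task in subtasks:
        # One initial attempt plus up to max_retries further attempts.
        for _ in range(max_retries + 1):
            task.result = delegate(task.description)
            if verify(task.description, task.result):
                task.verified = True
                break
    return subtasks


# Toy stand-ins so the sketch runs without any model or agent runtime.
def toy_decompose(goal: str) -> List[str]:
    return [part.strip() for part in goal.split(" and ")]


def toy_delegate(description: str) -> str:
    return f"done: {description}"


def toy_verify(description: str, result: str) -> bool:
    return description in result


if __name__ == "__main__":
    goal = "fetch the data and summarize it"
    for task in orchestrate(goal, toy_decompose, toy_delegate, toy_verify):
        print(task)
```

Passing the three steps in as callables keeps the control flow separate from whichever model or tool performs each step, which is one plausible way to let a small, fast orchestrator hand heavier work to other components.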
Key Capabilities & Differentiators
- Task Orchestration: Purpose-built for efficient task management within the Lynn Agent, enabling faster decision-making and convergence.
- Enhanced Reasoning: Scores 7.6 percentage points higher than the base model on GPQA-Diamond-198, a benchmark of hard reasoning questions.
- Faster End-to-End Orchestration: Completes end-to-end orchestration 2.3x faster, because it needs fewer tokens to reach a decision.
- Reduced Ambiguity: Cuts non-terminating empty answers on GPQA from 12 to 1, showing improved decisiveness.
- Native MTP (nextn) Support: Includes a native speculative decoding head for single-stream acceleration, with speedups of up to 1.63x under Q8_0 quantization.
- Distilled Thinking Style: Focuses on learning how to reason and converge rather than injecting new knowledge, making it adept at complex problem-solving workflows.
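To put the headline figures above on a common scale, the short sketch below converts them into question counts and time fractions. It assumes the benchmark has 198 questions (read off the name GPQA-Diamond-198) and that the empty-answer counts refer to the same set; if the reported score is averaged over several runs, the question count is only an equivalent, not an exact tally. The helper names are illustrative.

```python
GPQA_DIAMOND_SIZE = 198  # assumed question count, read off the benchmark name


def pp_to_questions(delta_pp: float, n: int = GPQA_DIAMOND_SIZE) -> float:
    """Convert a percentage-point change into an equivalent number of questions."""
    return delta_pp / 100 * n


def rate_percent(count: int, n: int = GPQA_DIAMOND_SIZE) -> float:
    """Share of the question set, in percent."""
    return 100 * count / n


def time_fraction(speedup: float) -> float:
    """Fraction of the baseline wall-clock time that remains after a speedup."""
    return 1 / speedup


if __name__ == "__main__":
    print(f"+7.6 pp is about {pp_to_questions(7.6):.1f} questions out of {GPQA_DIAMOND_SIZE}")
    print(f"empty answers: {rate_percent(12):.1f}% -> {rate_percent(1):.1f}% of questions")
    print(f"2.3x faster means about {time_fraction(2.3):.0%} of the baseline time")
```

Under these assumptions the 7.6 point gain corresponds to roughly 15 additional questions answered correctly, and the drop in empty answers alone accounts for 11 questions that now receive an answer at all.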
Limitations
- Knowledge Ceiling: Distillation transfers thinking style, not knowledge, and MMLU scores drop by roughly 1.2 percentage points relative to the base model.
- Specialized Role: Primarily an orchestrator, not a broad knowledge model. Its strength lies in agentic workflows rather than general knowledge breadth.
Recommended Use Cases
- Lynn Agent Deployments: Ideal for local orchestration on machines with 32GB+ VRAM/unified memory.
- Complex Task Management: Suited for applications requiring robust task decomposition, delegation, and verification.
- Agentic Workflows: Excellent for scenarios where a model needs to reason through steps, call tools, and converge on solutions efficiently.
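As a rough guide to the 32GB+ recommendation above, the sketch below estimates how much memory the weights alone would occupy at a few GGUF quantization levels. It is a back-of-the-envelope calculation under stated assumptions: the parameter counts are the nominal 35B and 3B from the model name, every tensor is treated as using the same format, and the KV cache, activations, and runtime overhead are ignored, so real requirements will be higher and real file sizes will differ.

```python
# Approximate bits per weight for a few GGUF formats. A Q8_0 or Q4_0 block
# stores 32 weights plus one 16-bit scale, hence the extra 0.5 bit per weight.
BITS_PER_WEIGHT = {"F16": 16.0, "Q8_0": 8.5, "Q4_0": 4.5}

TOTAL_PARAMS = 35e9   # nominal, from the model name; the exact count may differ
ACTIVE_PARAMS = 3e9   # nominal active parameters per token


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Estimated weight storage in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


def active_fraction(total: float = TOTAL_PARAMS, active: float = ACTIVE_PARAMS) -> float:
    """Share of the parameters used for each token in the MoE forward pass."""
    return active / total


if __name__ == "__main__":
    for name, bits in BITS_PER_WEIGHT.items():
        print(f"{name}: about {weight_gb(TOTAL_PARAMS, bits):.1f} GB of weights")
    print(f"active per token: about {active_fraction():.1%} of parameters")
```

Note that sparsity reduces compute per token, not storage: all 35B parameters still have to be resident (or paged in) even though only about 3B are used for any given token, which is why the memory recommendation tracks the total rather than the active count.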