nerkyor/Qwen3.6-35B-A3B-DSV4Pro-Thinking-Distill

Text Generation | Concurrency Cost: 3 | Model Size: 35.1B | Quant: FP8 | Context Length: 32k | Tool Calling: Supported | Published: Jun 7, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights

nerkyor/Qwen3.6-35B-A3B-DSV4Pro-Thinking-Distill is a 35-billion-parameter Mixture-of-Experts (MoE) model with 3 billion active parameters, built on the Qwen3.6 architecture. Developed by nerkyor, it is distilled from DeepSeek-V4-Pro to serve as a fast task orchestrator for Lynn Agent, with a focus on reasoning, task decomposition, delegation, and verification. It is tuned for agentic behavior and faster convergence on complex reasoning tasks: compared with its base model, it scores 7.6 percentage points higher on GPQA-Diamond-198 and completes end-to-end orchestration 2.3x faster.
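The gap between 35B total and 3B active parameters comes from sparse expert routing: for each token, a router scores every expert, only the top-scoring few actually run, and their outputs are mixed by renormalized gate weights. The sketch below is a generic top-k MoE layer in pure Python, not Qwen3.6's actual implementation; `NUM_EXPERTS = 8` and `TOP_K = 2` are illustrative placeholders, since the card does not state the real expert count or routing details (shared experts, load balancing, and so on are omitted).

```python
import math
import random

# Illustrative values only; the card does not give Qwen3.6's real MoE config.
NUM_EXPERTS = 8
TOP_K = 2


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def moe_forward(x, router_weights, experts, top_k=TOP_K):
    """Route vector x to the top_k experts and mix their outputs.

    router_weights: one weight vector per expert; its dot product with x is
    that expert's router logit.
    experts: one callable per expert, mapping a vector to a vector.
    Returns (output vector, indices of the experts that were evaluated).
    """
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in router_weights]
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    gates = softmax([logits[i] for i in top])  # renormalize over chosen experts only
    out = [0.0] * len(x)
    for g, i in zip(gates, top):
        y = experts[i](x)  # only top_k experts run; the others cost nothing
        out = [o + g * y_j for o, y_j in zip(out, y)]
    return out, top


if __name__ == "__main__":
    random.seed(0)
    dim = 4
    router = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(NUM_EXPERTS)]
    # Toy experts: expert i scales its input by (i + 1).
    experts = [(lambda v, s=float(i + 1): [s * v_j for v_j in v]) for i in range(NUM_EXPERTS)]
    y, used = moe_forward([0.5, -0.2, 0.1, 0.9], router, experts)
    print(f"experts used: {used} of {NUM_EXPERTS}")
```

Per token, compute scales with `TOP_K` rather than `NUM_EXPERTS`, which is why an "A3B" model can decode at roughly small-model speed while every expert's weights still have to be stored.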


Model Overview

This model is a 35-billion-parameter Mixture-of-Experts (MoE) model with 3 billion active parameters, built on the Qwen3.6 architecture. It is designed as a high-end local orchestrator for Lynn Agent and is the sparse counterpart to a 27B dense sister model. Its core feature is the distillation process: LoRA fine-tuning teaches it the reasoning style and agentic behavior of DeepSeek-V4-Pro, in particular the teacher's "thinking-on" mode for task decomposition, delegation, and verification.
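LoRA keeps each pretrained weight matrix frozen and learns only a low-rank correction, so the adapted layer computes `W0 x + (alpha / r) * B (A x)`. The pure-Python sketch below shows that forward pass for a single linear layer. The rank and `alpha` values in the usage lines are placeholders (the card does not publish the adapter configuration), and the distillation training itself, fitting the adapter to DeepSeek-V4-Pro's reasoning traces, is not shown.

```python
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


class LoRALinear:
    """Frozen base weight W0 (d_out x d_in) plus a trainable low-rank update B @ A."""

    def __init__(self, W0, rank, alpha, seed=0):
        d_out, d_in = len(W0), len(W0[0])
        rng = random.Random(seed)
        self.W0 = W0  # pretrained weight, never updated during fine-tuning
        # A: r x d_in, small random init
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        # B: d_out x r, zero init, so the adapter starts as an exact no-op
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W0, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        # Only A and B are trained: r * (d_in + d_out) values instead of d_in * d_out.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    layer = LoRALinear(identity, rank=2, alpha=4.0)  # placeholder hyperparameters
    print(layer.forward([1.0, 2.0, 3.0]))  # unchanged from the base layer at init
    print(layer.trainable_params())
```

Because only `A` and `B` change, the adapter can shift how the model reasons and when it stops without rewriting the base weights, which fits the card's claim that the distillation targets thinking style rather than new knowledge.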

Key Capabilities & Differentiators

  • Task Orchestration: Purpose-built for efficient task management within the Lynn Agent, enabling faster decision-making and convergence.
  • Enhanced Reasoning: Scores 7.6 percentage points higher than the base model on GPQA-Diamond-198, a set of hard graduate-level science questions.
  • Faster End-to-End Orchestration: Completes orchestration runs 2.3x faster end to end, because it needs fewer tokens to reach each decision.
  • Improved Decisiveness: Cuts non-terminating empty answers on GPQA from 12 to 1, so the model more reliably commits to a final answer.
  • Native MTP (nextn) Support: Ships a native multi-token prediction (MTP) head for speculative decoding, accelerating single-stream generation by up to 1.63x with Q8_0 quantization.
  • Distilled Thinking Style: Focuses on learning how to reason and converge rather than injecting new knowledge, making it adept at complex problem-solving workflows.
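The MTP head speeds up single-stream decoding through speculative decoding: a cheap predictor drafts a few tokens ahead, the full model checks them, and every draft token that matches the full model's own choice is kept for free. The sketch below is generic greedy draft-and-verify in pure Python. `draft_next` and `target_next` are hypothetical stand-ins (a real MTP head conditions on the main model's hidden states, and a real engine verifies all drafted positions in one batched forward pass), and it does not reproduce the card's 1.63x figure.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]


def speculative_step(prefix: List[Token], draft_next: NextToken,
                     target_next: NextToken, k: int) -> List[Token]:
    """One greedy draft-then-verify step; returns 1 to k + 1 new tokens."""
    # 1. Draft k tokens autoregressively with the cheap head.
    draft, ctx = [], list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        draft.append(t)
        ctx.append(t)
    # 2. Verify against the full model (position by position here, for clarity).
    accepted, ctx = [], list(prefix)
    for t in draft:
        expected = target_next(ctx)
        if expected != t:
            accepted.append(expected)  # the target's token replaces the first mismatch
            return accepted
        accepted.append(t)
        ctx.append(t)
    accepted.append(target_next(ctx))  # bonus token when every draft is accepted
    return accepted


def generate(prefix: List[Token], draft_next: NextToken, target_next: NextToken,
             k: int, n_tokens: int) -> Tuple[List[Token], int]:
    """Generate n_tokens; returns (tokens, number of draft/verify steps)."""
    out, steps = list(prefix), 0
    while len(out) - len(prefix) < n_tokens:
        out.extend(speculative_step(out, draft_next, target_next, k))
        steps += 1
    return out[len(prefix):len(prefix) + n_tokens], steps


if __name__ == "__main__":
    # Toy models: the target counts mod 10; the draft is wrong whenever it sees a 4.
    target = lambda ctx: (ctx[-1] + 1) % 10
    draft = lambda ctx: 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10
    tokens, steps = generate([0], draft, target, k=3, n_tokens=12)
    print(tokens, "in", steps, "steps")
```

With greedy verification the output is identical to what the full model would produce alone; the speedup depends on how often drafts are accepted, which is why the gain varies with quantization and workload.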

Limitations

  • Knowledge Ceiling: Because distillation targets thinking style rather than knowledge, MMLU scores dip slightly (about 1.2 pp) relative to the base model.
  • Specialized Role: Primarily an orchestrator, not a broad knowledge model. Its strength lies in agentic workflows rather than general knowledge breadth.
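Percentage-point deltas are easier to judge as question counts. The sketch below converts the card's +7.6 pp GPQA gain, the roughly 1.2 pp MMLU dip, and the 12 to 1 empty-answer figures into counts and rates. It assumes GPQA-Diamond-198 has 198 questions (as its name indicates) and that MMLU means the standard 14,042-question test split; the card does not say which MMLU variant was used.

```python
GPQA_DIAMOND_QUESTIONS = 198  # size implied by the "GPQA-Diamond-198" name
MMLU_TEST_QUESTIONS = 14_042  # standard MMLU test split (assumed variant)


def pp_to_questions(delta_pp: float, n_questions: int) -> float:
    """Convert a percentage-point accuracy change into a question count."""
    return delta_pp / 100 * n_questions


def rate_pct(count: int, n_questions: int) -> float:
    """Express a count as a percentage of the benchmark size."""
    return 100 * count / n_questions


if __name__ == "__main__":
    print(f"GPQA +7.6 pp ~ {pp_to_questions(7.6, GPQA_DIAMOND_QUESTIONS):+.1f} questions")  # +15.0
    print(f"MMLU -1.2 pp ~ {pp_to_questions(-1.2, MMLU_TEST_QUESTIONS):+.1f} questions")  # -168.5
    print(f"GPQA empty answers: {rate_pct(12, 198):.1f}% -> {rate_pct(1, 198):.1f}%")  # 6.1% -> 0.5%
```

On a 198-question set each question is worth about 0.51 pp, so the GPQA gain corresponds to about 15 additional correct answers (15 / 198 = 7.58%, which rounds to 7.6 pp).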

Recommended Use Cases

  • Lynn Agent Deployments: Ideal for local orchestration on machines with 32GB+ VRAM/unified memory.
  • Complex Task Management: Suited for applications requiring robust task decomposition, delegation, and verification.
  • Agentic Workflows: Excellent for scenarios where a model needs to reason through steps, call tools, and converge on solutions efficiently.
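For the 32GB+ memory guideline above, note that a MoE model typically needs all 35.1B parameters resident even though only about 3B are active per token. The sketch below estimates weight storage from bits per weight. The bit widths are assumptions for illustration: 8.0 for a plain 8-bit format such as the FP8 build listed in the metadata, 8.5 for llama.cpp's Q8_0 (32 int8 weights plus one fp16 scale per block), and 4.5 for Q4_0. Real GGUF files mix tensor types, and KV cache, activations, and runtime overhead come on top.

```python
N_PARAMS = 35.1e9  # total parameters from the card; all experts stay in memory


def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB (1e9 bytes), weights only."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for label, bits in [("8-bit (e.g. FP8)", 8.0), ("Q8_0 (8.5 bpw)", 8.5), ("Q4_0 (4.5 bpw)", 4.5)]:
        print(f"{label:>18}: {weight_memory_gb(N_PARAMS, bits):5.1f} GB")
```

Weights alone come to about 35.1 GB at 8 bits, 37.3 GB for Q8_0, and 19.7 GB for Q4_0, so the 32GB+ guideline presumably assumes a lower-bit build; check the actual file sizes of the quantization you download.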