nerkyor/Qwen3.6-27B-DSV4Pro-Thinking-Distill
nerkyor/Qwen3.6-27B-DSV4Pro-Thinking-Distill is a 27-billion-parameter dense model built on Qwen3.6-27B and fine-tuned by nerkyor with LoRA to distill the reasoning and agentic behavior of DeepSeek-V4-Pro. It reports GPQA gains of +7.1pp to +13.13pp and better agentic task performance, while maintaining or slightly improving MMLU scores. It is optimized for local reasoning and agentic workflows, particularly as an edge runtime for Lynn Agent, and includes a native Multi-Token Prediction (MTP) head for 2.3-2.6x single-stream inference acceleration.
Overview
nerkyor/Qwen3.6-27B-DSV4Pro-Thinking-Distill is a 27-billion-parameter model based on Qwen3.6-27B (dense architecture). It was fine-tuned with LoRA to distill the reasoning style and agentic behavior of DeepSeek-V4-Pro, specifically its "thinking-on" mode and ReAct-style tool use. The distillation focused on teaching the model how to reason and converge on an answer rather than on injecting new knowledge, so problem-solving improves without sacrificing general knowledge.
Key Capabilities
- Enhanced Reasoning: Achieves significant improvements on hard reasoning benchmarks like GPQA-Diamond-198 (up to +13.13pp), demonstrating a cleaner "pure gain" compared to its base model.
- Improved Convergence: Converges on 100% of GPQA questions, compared to 94% for the base model (Q5_K_M quantization, streaming harness), so far fewer answers are left unconverged.
- Agentic Behavior: Completes more complex agentic tasks than the base model (e.g., +3 tasks on Agentic SOLO).
- Coding Proficiency: Maintains or slightly improves coding ability (+3 on coding-100).
- Accelerated Inference: Integrates a native Multi-Token Prediction (MTP) head, providing 2.3-2.6x single-stream inference speedup with lossless greedy speculative decoding.
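To illustrate why greedy speculative decoding can be lossless, here is a minimal sketch in Python. It is not the model's actual MTP implementation: `toy_target` and `toy_draft` are hypothetical stand-ins for the full model and a cheap draft head, and tokens are plain integers. The draft proposes `k` tokens, the target checks them, and the first mismatch is replaced by the target's own token, so the output matches plain greedy decoding token for token.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]


def toy_target(ctx: List[int]) -> int:
    """Hypothetical 'full model': deterministic greedy next token."""
    return (sum(ctx) * 31 + len(ctx)) % 50


def toy_draft(ctx: List[int]) -> int:
    """Hypothetical cheap draft: agrees with the target except at some positions."""
    token = toy_target(ctx)
    return (token + 1) % 50 if len(ctx) % 4 == 0 else token


def greedy_decode(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: one target call per generated token."""
    seq = list(prompt)
    for _ in range(max_new):
        seq.append(target(seq))
    return seq[len(prompt):]


def speculative_greedy_decode(
    target: NextToken, draft: NextToken, prompt: List[int], max_new: int, k: int = 3
) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (new tokens, verification passes)."""
    seq = list(prompt)
    out: List[int] = []
    passes = 0
    while len(out) < max_new:
        # 1) Draft proposes k tokens autoregressively.
        ctx = list(seq)
        proposal = []
        for _ in range(k):
            token = draft(ctx)
            proposal.append(token)
            ctx.append(token)

        # 2) Target verifies the proposal. A real system would score all
        #    k positions in one batched forward pass; here that single
        #    pass is simulated with repeated calls and counted once.
        passes += 1
        ctx = list(seq)
        accepted = []
        for token in proposal:
            expected = target(ctx)
            if token != expected:
                accepted.append(expected)  # target's correction ends the round
                break
            accepted.append(token)
            ctx.append(token)
        else:
            accepted.append(target(ctx))  # all accepted: bonus token from target

        # 3) Commit accepted tokens, never exceeding max_new.
        for token in accepted:
            if len(out) < max_new:
                out.append(token)
                seq.append(token)
    return out, passes
```

Every committed token is exactly what the target would have chosen for that prefix, which is why the result is identical to the baseline. The speedup comes from needing fewer target passes than generated tokens whenever the draft is often right.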
Good For
- Local Reasoning Agents: Recommended as the local reasoning model for Lynn Agent, particularly for desktop and edge deployments.
- Complex Problem Solving: Ideal for applications requiring robust, multi-step reasoning and problem convergence.
- Agentic Workflows: Suitable for tasks involving tool use and autonomous task execution, where the model needs to "think" and "act" iteratively.
- Performance-Critical Edge Deployments: The native MTP acceleration makes it efficient for single-stream inference on edge devices, with various GGUF quantizations available for different hardware constraints.
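The iterative "think" and "act" pattern mentioned above can be sketched as a small ReAct-style loop. This is an illustrative sketch only: the `Thought:` / `Action:` / `Observation:` / `Final Answer:` text format, the `stub_model` function, and the `add` tool are all assumptions made for the example, not the model's or Lynn Agent's actual prompt format or API. In practice `stub_model` would be replaced by a call to a locally served model.

```python
import re
from typing import Callable, Dict, Optional

# Assumed text conventions for this sketch (hypothetical, not the model's own).
ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*?)\]")
FINAL_RE = re.compile(r"Final Answer:\s*(.*)")


def add_tool(arg: str) -> str:
    """Hypothetical tool: sums comma-separated integers."""
    return str(sum(int(part) for part in arg.split(",")))


def stub_model(prompt: str) -> str:
    """Stand-in for a real model call: acts once, then answers."""
    prefix = "Observation: "
    observations = [line for line in prompt.splitlines() if line.startswith(prefix)]
    if observations:
        return (
            "Thought: the tool result answers the question.\n"
            "Final Answer: " + observations[-1][len(prefix):]
        )
    return "Thought: I need to add the numbers.\nAction: add[2, 3]"


def react_loop(
    model: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> Optional[str]:
    """Alternate model replies and tool calls until a final answer appears."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        reply = model("\n".join(transcript))
        transcript.append(reply)

        final = FINAL_RE.search(reply)
        if final:
            return final.group(1).strip()

        action = ACTION_RE.search(reply)
        if action is None:
            transcript.append("Observation: no valid action found")
            continue

        name, arg = action.groups()
        tool = tools.get(name)
        result = tool(arg) if tool else f"unknown tool {name!r}"
        transcript.append(f"Observation: {result}")
    return None  # step budget exhausted without a final answer
```

The `max_steps` budget makes non-convergence explicit: the loop returns `None` instead of running indefinitely when the model never produces a final answer.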