Name: nerkyor/Qwen3.6-27B-DSV4Pro-Thinking-Distill API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: nerkyor

Overview

nerkyor/Qwen3.6-27B-DSV4Pro-Thinking-Distill is a 27-billion-parameter model based on the Qwen3.6-27B Dense architecture. It was fine-tuned with LoRA to distill the reasoning style and agentic behavior of DeepSeek-V4-Pro, focusing specifically on its "thinking-on" capabilities. The distillation aims to teach the model how to reason and converge; it does not inject new knowledge or raise the model's raw capability ceiling.

Key Capabilities & Differentiators

Enhanced Reasoning: Scores 80.81% on GPQA-Diamond-198 (+7.1 percentage points over the base model) and 81.82% under the streaming harness (+13.13 percentage points), demonstrating stronger hard reasoning than its base model.
Improved Convergence: Significantly reduces unconverged answers. GPQA finish=length cases drop from 12 to 0, indicating that the model "learns to converge" and provides complete responses.
Agentic Behavior: Shows improved performance on Agentic SOLO tasks (16/20 vs. 13/20 for the base), reflecting successful distillation of tool-calling and multi-step reasoning.
Multi-Token Prediction (MTP): Integrates a native MTP head, providing a 2.3-2.6x single-stream inference speedup across various quantization tiers (e.g., 26.8 TPS for Q4_K_M).
Knowledge Retention: Maintains or slightly improves MMLU scores (+0.2pp to +0.4pp), indicating that reasoning improvements do not come at the cost of general knowledge.
Robust Coding: Performance on coding-100 tasks is maintained or slightly improved (86/100 vs. 83/100 for the base).
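
The convergence figure above counts responses that stopped because they hit the token limit rather than finishing on their own. A minimal sketch of how such a count could be reproduced is shown below. It assumes responses arrive as OpenAI-style chat completion dictionaries with a finish_reason field on each choice; the function names and the sample data are hypothetical and are not taken from the model card or its evaluation harness.

```python
from collections import Counter


def count_finish_reasons(responses):
    """Tally finish_reason values across OpenAI-style chat completion dicts.

    Assumption: each response looks like {"choices": [{"finish_reason": ...}]}.
    """
    counts = Counter()
    for response in responses:
        for choice in response.get("choices", []):
            counts[choice.get("finish_reason")] += 1
    return counts


def count_unconverged(responses):
    """Return how many choices were cut off by the token limit."""
    return count_finish_reasons(responses)["length"]


# Hypothetical sample data, for illustration only.
sample_responses = [
    {"choices": [{"finish_reason": "stop"}]},
    {"choices": [{"finish_reason": "length"}]},
    {"choices": [{"finish_reason": "stop"}]},
]

print(count_unconverged(sample_responses))
```

A result of zero over a full benchmark run would correspond to the "12 to 0" drop reported above, provided the generation token limit is the same for both models.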

Limitations

Distills Thinking Style, Not Capability: The model learns how to reason and converge, but black-box SFT does not inherently raise its knowledge ceiling.
Simulated Tool Execution: Tool execution results during training were simulated by a smaller model, not run in a real sandbox. This is an engineering trade-off for cost and speed, but it carries a "sim-to-real gap" risk, potentially leading to the model fabricating tool return values. Future versions plan to use real sandbox execution and rejection sampling to mitigate this.

Good For

Edge and Desktop Reasoning: Recommended as a local reasoning model for applications like Lynn Agent, especially the GGUF versions.
Complex Problem Solving: Ideal for tasks requiring multi-step reasoning, logical deduction, and agentic planning.
Applications Requiring Fast Inference: Benefits from the native MTP head for accelerated single-stream generation.
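
To put the MTP speedup in concrete terms, the sketch below works backward from the quoted figures to an implied throughput without MTP. It rests on two assumptions that the listing does not confirm: that 26.8 TPS is the MTP-enabled rate for Q4_K_M, and that the full 2.3-2.6x range applies to that tier. The function name is hypothetical.

```python
def implied_baseline_tps(mtp_tps, speedup):
    """Estimate tokens per second without MTP, given the MTP rate and speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return mtp_tps / speedup


MTP_TPS_Q4_K_M = 26.8  # quoted figure, assumed to be the MTP-enabled rate

# A larger speedup implies a slower baseline, so 2.6x gives the low end.
baseline_low = implied_baseline_tps(MTP_TPS_Q4_K_M, 2.6)
baseline_high = implied_baseline_tps(MTP_TPS_Q4_K_M, 2.3)

print(f"Implied baseline: {baseline_low:.1f} to {baseline_high:.1f} TPS")
```

Under these assumptions, a single stream without MTP would run at roughly 10 to 12 tokens per second on the same setup.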

Full Model Card (README)