lordx64/Qwen3.6-35B-A3B-Kimi-K2.6-Reasoning-Distilled

Text Generation | Concurrency Cost: 3 | Model Size: 35.1B | Quant: FP8 | Ctx Length: 32k | Tool Calling: Supported | Published: Apr 26, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights

lordx64/Qwen3.6-35B-A3B-Kimi-K2.6-Reasoning-Distilled is a 35.1-billion-parameter Mixture-of-Experts (MoE) model, based on Qwen3.6-35B-A3B and fine-tuned to emulate the verbose, deliberate chain-of-thought reasoning style of Moonshot AI's Kimi K2.6. It is optimized for complex reasoning tasks such as graduate-level STEM, competition math, and multi-step logic puzzles, and uses sparse activation for efficient inference. It is served with a 32K context length and explicitly generates detailed <think> blocks before providing final answers.

Overview

Qwen3.6-35B-A3B-Kimi-K2.6-Reasoning-Distilled is a 35.1-billion-parameter Mixture-of-Experts (MoE) fine-tune of the Qwen3.6-35B-A3B base model. It has been trained to imitate the verbose, deliberate chain-of-thought reasoning style of Moonshot AI's Kimi K2.6, a frontier reasoning model. The goal is to port Kimi-grade reasoning behavior into a permissively licensed MoE model that individuals can run themselves.

Key Capabilities

  • Kimi-style Reasoning: Fine-tuned on ~7.8k high-quality reasoning traces from Kimi K2.6, teaching the model to explicitly "think" using <think>…</think> blocks.
  • Verbose Reasoning Chains: Inherits Kimi K2.6's tendency to produce longer, more careful reasoning chains than other models; in the observed datasets its chains average ~3.4x the length of Claude Opus 4.7's.
  • Efficient MoE Architecture: The base model is a 35B-parameter MoE with 256 experts. Each token is routed to 8 experts plus 1 shared expert, so only ~3B parameters are active per token, which keeps inference efficient.
  • Extended Context: Supports a 64k token context (the listing above serves it with a 32k context length), leaving room for long reasoning processes (5-30k tokens of <think> output) on challenging problems.
  • Companion Model: Designed to be directly comparable with its Claude-distilled sibling, offering a choice between Kimi's longer, deliberate reasoning and Claude's shorter, tighter chains.
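
The sparse activation described above (256 experts, 8 routed per token plus 1 shared) can be illustrated with a small top-k routing sketch. This is not the model's actual router: the function name `route_token`, the random logits, and the choice to softmax over only the selected experts are assumptions made for illustration.

```python
import math
import random

NUM_EXPERTS = 256  # routed experts, per the model card
TOP_K = 8          # routed experts chosen per token, per the model card


def route_token(router_logits, top_k=TOP_K):
    """Return [(expert_index, weight), ...] for the top_k highest-scoring experts.

    Weights are a softmax over the selected logits only. That normalization
    is an assumption; a real router may normalize differently.
    """
    if len(router_logits) < top_k:
        raise ValueError("need at least top_k router logits")
    # Indices of the top_k largest logits, highest first.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Subtract the max before exponentiating for numerical stability.
    max_logit = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - max_logit) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical router scores for one token (random, for illustration only).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_token(logits)

# The shared expert always runs, so 9 of 257 expert blocks are active per token.
active_fraction = (len(selected) + 1) / (NUM_EXPERTS + 1)
```

Here `active_fraction` counts expert blocks, not parameters: about 3.5% of the experts run for a given token. The ~3B active-parameter figure is a larger share of the 35B total, presumably because attention and embedding weights are used for every token.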

Good For

  • Hard Reasoning Tasks: Excels in graduate-level STEM, competition math (AIME/MATH), code reasoning with explicit walk-throughs, and multi-step logic puzzles.
  • Agentic Planning: Useful for scenarios where explicit <think> blocks enhance correctness and transparency.
  • Predictable Reasoning Output: Provides reliable <think>-block reasoning regardless of prompt pattern, which can be beneficial when the base model's thinking mode is conditional.
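
Because the model emits its reasoning inside <think>…</think> blocks, downstream code will usually want to separate the reasoning from the final answer. The following is a minimal sketch, assuming the raw completion arrives as a single string with the tags intact; `split_reasoning` is a hypothetical helper, not part of any library.

```python
import re

# Non-greedy match so several <think> blocks are captured separately;
# DOTALL lets the reasoning span multiple lines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(output):
    """Return (reasoning, answer) extracted from raw model output."""
    blocks = THINK_RE.findall(output)
    rest = THINK_RE.sub("", output)
    # An unclosed <think> usually means generation was cut off mid-reasoning
    # (for example at the context limit); treat the remainder as reasoning.
    head, sep, tail = rest.partition("<think>")
    if sep:
        blocks.append(tail)
        rest = head
    reasoning = "\n".join(block.strip() for block in blocks)
    return reasoning, rest.strip()


sample = "<think>2 + 2 is 4, and 4 * 3 is 12.</think>\nThe result is 12."
reasoning, answer = split_reasoning(sample)
print(answer)  # The result is 12.
```

Checking for an empty `answer` is a cheap way to detect completions that spent the whole token budget on reasoning, which matters given the 5-30k tokens of <think> output mentioned above.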