lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled
lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled is a 35.1 billion parameter Mixture-of-Experts (MoE) model, based on Qwen3.6-35B-A3B, fine-tuned to emulate the chain-of-thought reasoning style of Anthropic's Claude Opus 4.7. This model activates only 3 billion parameters per token, offering the capacity of a larger model with the inference cost of a smaller one. It is specifically optimized for complex reasoning tasks in STEM, mathematics, and multi-step logic, supporting a 64k token context length for extensive internal thought processes.
Overview
This model, lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled, is a fine-tuned variant of the Qwen3.6-35B-A3B Mixture-of-Experts (MoE) base model. Its primary distinction is that it distills the chain-of-thought reasoning style of Claude Opus 4.7 into an open-weights model. It was fine-tuned on approximately 8,000 high-quality reasoning traces from Opus 4.7, so it generates explicit <think>…</think> blocks before providing final answers.
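Because the reasoning arrives inline in a <think>…</think> block, downstream code usually needs to separate it from the final answer. The following is a minimal sketch of one way to do that; `split_reasoning` and `THINK_RE` are hypothetical names introduced here, not part of the model's tooling, and the sketch assumes the raw completion contains both the opening and closing tags.

```python
import re

# Hypothetical helper for illustration; not shipped with the model.
# DOTALL lets the reasoning span multiple lines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) extracted from a raw completion."""
    match = THINK_RE.search(text)
    if match is None:
        # No complete <think> block (for example, a truncated generation):
        # treat the whole text as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


sample = "<think>2 + 2 is 4.</think>\nThe answer is 4."
reasoning, answer = split_reasoning(sample)
print(reasoning)  # 2 + 2 is 4.
print(answer)     # The answer is 4.
```

If the generation is cut off before the closing tag, this sketch returns an empty reasoning string and the full text as the answer, so callers may want to check for that case explicitly.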
Key Capabilities
- Claude-style Reasoning: Emulates the sophisticated reasoning patterns and explicit thought processes of Claude Opus 4.7.
- Efficient Inference: As a sparse MoE, it has 256 experts but activates only about 3 billion parameters per token, providing 35B capacity at a lower inference cost.
- Long Context & Thinking: Supports a 64k token context, routinely generating 5-30k tokens of internal reasoning for complex problems.
- Clean Base for Further Tuning: A LoRA adapter is separately available, allowing for additional fine-tuning or application to other checkpoints.
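As a rough illustration of the efficiency claim above, the arithmetic below compares active to total parameters using the figures quoted in this card (35.1 billion total, about 3 billion active per token). The weight-memory estimate assumes 2 bytes per parameter (bf16), which is an assumption made here and not something the card states; all experts must still be resident in memory even though only a few are used per token.

```python
# Figures taken from this card.
TOTAL_PARAMS = 35.1e9   # total parameters
ACTIVE_PARAMS = 3.0e9   # approximate parameters activated per token

# Assumption (not from the card): weights stored in bf16, 2 bytes each.
BYTES_PER_PARAM = 2

# Share of the model that participates in each token's forward pass.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Approximate memory needed just to hold the weights, in gigabytes.
weight_memory_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9

print(f"Active fraction per token: {active_fraction:.1%}")
print(f"Approximate weight memory (bf16): {weight_memory_gb:.1f} GB")
```

In other words, per-token compute is closer to that of a 3B dense model, while memory requirements still reflect the full 35.1B parameters.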
Good For
- Hard Reasoning Tasks: Excels in graduate-level STEM, competition mathematics (AIME/MATH), code reasoning with explicit walkthroughs, and multi-step logic puzzles.
- Agentic Planning: Useful for scenarios where explicit internal thought processes (<think>) enhance correctness and reliability.
Limitations
- Reasoning ≠ Knowledge: The model inherits the knowledge base of Qwen3.6-35B-A3B; distillation transfers reasoning style, not new factual knowledge.
- Long Generations: Expect potentially very long outputs due to extensive internal reasoning, requiring careful management of max_new_tokens and sufficient max_model_len (≥ 32k) during inference.
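To make the token-budget point concrete, here is a minimal sketch of how one might derive a max_new_tokens value from a given max_model_len and prompt length. The helper names `max_new_tokens_for` and `covers_typical_reasoning` are hypothetical; the 32,768 default and the 30,000-token reasoning figure come from the numbers quoted in this card, and the sketch assumes the prompt and the generated tokens share a single context window.

```python
# Upper end of the reasoning length reported in this card (5-30k tokens).
TYPICAL_REASONING_MAX = 30_000


def max_new_tokens_for(prompt_tokens: int, max_model_len: int = 32_768) -> int:
    """Largest generation budget that still fits in the context window."""
    budget = max_model_len - prompt_tokens
    if budget <= 0:
        raise ValueError("prompt already fills the context window")
    return budget


def covers_typical_reasoning(budget: int) -> bool:
    """True if the budget can hold the longest reasoning the card reports."""
    return budget >= TYPICAL_REASONING_MAX


budget = max_new_tokens_for(prompt_tokens=1_200)
print(budget)                            # 31568
print(covers_typical_reasoning(budget))  # True
```

With a 32k window, a prompt longer than roughly 2,700 tokens already pushes the budget below the 30k mark, which is one reason to consider a larger max_model_len for long inputs.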