trjxter/Qwimi3.5-9B-Kimik2.6-Opus-Distill-MTP-BF16
trjxter/Qwimi3.5-9B-Kimik2.6-Opus-Distill-MTP-BF16 is a 9 billion parameter merged BF16 causal language model developed by trjxter, fine-tuned from unsloth/Qwen3.5-9B. This model is specifically optimized for structured reasoning, mathematical tasks, and long-context problem-solving, leveraging a curated dataset of Kimi K2.6, Qwen reasoning, and Claude Opus TraceInversion data. It excels at generating step-by-step reasoning traces within a 32768 token context window, making it suitable for complex analytical prompts.
Loading preview...
Overview
trjxter/Qwimi3.5-9B-Kimik2.6-Opus-Distill-MTP-BF16 is a 9 billion parameter merged BF16 causal language model, fine-tuned by trjxter from unsloth/Qwen3.5-9B. This model was developed with a primary focus on enhancing structured reasoning behavior and preserving Qwen-style chat formatting, including <think>...</think> reasoning traces.
Key Capabilities
- Enhanced Reasoning: Specifically fine-tuned to improve structured reasoning, mathematical problem-solving, and technical reasoning.
- Long Context: Supports a context length of 32768 tokens, making it suitable for complex, multi-step problems.
- Distillation Approach: Utilizes a unique distillation process from high-quality reasoning datasets, including Kimi K2.6, Qwen reasoning, and Claude Opus TraceInversion data.
- Qwen Chat Format: Maintains the familiar Qwen chat template, facilitating integration and consistent interaction.
Training Details
The model was trained using Unsloth and Hugging Face TRL with a LoRA-based supervised fine-tuning setup. It processed 12,000 training examples over 1 epoch, achieving a final training loss of 0.5517. The dataset was carefully curated and normalized into Qwen chat format, preserving assistant reasoning traces.
Intended Use
This model is ideal for experimentation in reasoning-style SFT, synthetic distillation, and exploring long-context reasoning behavior. It is particularly well-suited for tasks involving math, structured problem-solving, and coding/technical reasoning prompts. Users should note this is an experimental fine-tune and evaluate its outputs carefully.