TeichAI/Qwen3-8B-Kimi-K2-Thinking-Distill
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Nov 12, 2025License:apache-2.0Architecture:Transformer0.0K Open Weights Warm
TeichAI/Qwen3-8B-Kimi-K2-Thinking-Distill is an 8 billion parameter Qwen3-based language model developed by TeichAI. It is fine-tuned from unsloth/Qwen3-8B-unsloth-bnb-4bit and trained on 1000 examples from MoonshotAI's Kimi k2 thinking dataset, optimized for specific reasoning patterns. This model leverages Unsloth and Huggingface's TRL library for faster training, offering a 32768 token context length.
Loading preview...
Model Overview
TeichAI/Qwen3-8B-Kimi-K2-Thinking-Distill is an 8 billion parameter language model built upon the Qwen3 architecture. Developed by TeichAI, this model is a fine-tuned version of unsloth/Qwen3-8B-unsloth-bnb-4bit.
Key Characteristics
- Training Data: The model was specifically trained on 1000 examples derived from MoonshotAI's Kimi k2 thinking dataset, suggesting an optimization for particular reasoning or thought processes.
- Efficient Training: It utilizes Unsloth and Huggingface's TRL library, enabling a reported 2x faster training process.
- Context Length: Supports a context window of 32768 tokens.
- License: Distributed under the Apache-2.0 license.
Potential Use Cases
Given its specialized training on 'Kimi k2 thinking' examples, this model is likely well-suited for:
- Tasks requiring specific reasoning or problem-solving approaches similar to those found in the Kimi k2 thinking dataset.
- Applications where efficient inference from a Qwen3-8B base is desired.
- Scenarios benefiting from a model trained with Unsloth's speed optimizations.