TeichAI/Qwen3-8B-Kimi-K2-Thinking-Distill

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:32kPublished:Nov 12, 2025License:apache-2.0Architecture:Transformer0.0K Open Weights Warm

TeichAI/Qwen3-8B-Kimi-K2-Thinking-Distill is an 8 billion parameter Qwen3-based language model developed by TeichAI. It is fine-tuned from unsloth/Qwen3-8B-unsloth-bnb-4bit and trained on 1000 examples from MoonshotAI's Kimi k2 thinking dataset, optimized for specific reasoning patterns. This model leverages Unsloth and Huggingface's TRL library for faster training, offering a 32768 token context length.

Loading preview...

Model Overview

TeichAI/Qwen3-8B-Kimi-K2-Thinking-Distill is an 8 billion parameter language model built upon the Qwen3 architecture. Developed by TeichAI, this model is a fine-tuned version of unsloth/Qwen3-8B-unsloth-bnb-4bit.

Key Characteristics

  • Training Data: The model was specifically trained on 1000 examples derived from MoonshotAI's Kimi k2 thinking dataset, suggesting an optimization for particular reasoning or thought processes.
  • Efficient Training: It utilizes Unsloth and Huggingface's TRL library, enabling a reported 2x faster training process.
  • Context Length: Supports a context window of 32768 tokens.
  • License: Distributed under the Apache-2.0 license.

Potential Use Cases

Given its specialized training on 'Kimi k2 thinking' examples, this model is likely well-suited for:

  • Tasks requiring specific reasoning or problem-solving approaches similar to those found in the Kimi k2 thinking dataset.
  • Applications where efficient inference from a Qwen3-8B base is desired.
  • Scenarios benefiting from a model trained with Unsloth's speed optimizations.