wan-wan/test15-dpo

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Feb 28, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

The wan-wan/test15-dpo is a 4 billion parameter Qwen3 model developed by wan-wan, fine-tuned using Unsloth and Huggingface's TRL library. This model was trained with a 32768 token context length and is optimized for efficient training, achieving 2x faster finetuning. It is suitable for tasks requiring a performant yet efficiently trained language model.

Loading preview...

Model Overview

wan-wan/test15-dpo is a 4 billion parameter Qwen3 model developed by wan-wan. It was finetuned from wan-wan/test08-checkpoint-266 and utilizes a substantial 32768 token context length, making it capable of processing extensive inputs.

Key Capabilities

  • Efficient Training: This model was finetuned 2x faster using Unsloth and Huggingface's TRL library, highlighting its optimized training methodology.
  • Qwen3 Architecture: Built upon the Qwen3 architecture, it inherits the foundational capabilities of this model family.
  • Extended Context Window: With a 32768 token context length, it can handle complex and lengthy prompts, maintaining coherence over extended interactions.

Good For

  • Applications requiring efficient finetuning: Developers looking for a model that can be quickly adapted to specific tasks.
  • Tasks benefiting from a large context window: Use cases where understanding long-form text or maintaining conversational history is crucial.
  • General language generation: Suitable for a variety of natural language processing tasks due to its base Qwen3 architecture.