wan-wan/test11-dpo

Hugging Face
Text generation · Concurrency cost: 1 · Model size: 4B · Quant: BF16 · Context length: 32k · Published: Feb 27, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

wan-wan/test11-dpo is a 4-billion-parameter Qwen3-based language model developed by wan-wan, finetuned using Unsloth and Hugging Face's TRL library. The model has a 32,768-token context length, and its training pipeline is reported to run 2x faster thanks to Unsloth's optimizations. It is designed for general language tasks, building on the Qwen3 architecture.


Model Overview

wan-wan/test11-dpo is a 4-billion-parameter language model, finetuned by wan-wan from the wan-wan/test08-checkpoint-266 base. It uses the Qwen3 architecture and offers a 32,768-token context length, making it suitable for processing longer sequences of text.
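A practical consequence of the 32,768-token window is that prompt and generation tokens share one budget. A minimal sketch of that arithmetic (token counts here are illustrative; a real application would count tokens with the model's own tokenizer):

```python
# Context-window budgeting sketch for a 32,768-token model.
# CONTEXT_LENGTH matches the card; the prompt sizes are made-up examples.
CONTEXT_LENGTH = 32_768

def max_new_tokens(prompt_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many tokens remain for generation after the prompt."""
    return max(context_length - prompt_tokens, 0)

# A 30,000-token prompt leaves 2,768 tokens for the model to generate.
print(max_new_tokens(30_000))
```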

Key Differentiators

  • Efficient Finetuning: A notable characteristic of this model is its development process, which leveraged Unsloth and Hugging Face's TRL library. This combination is reported to make training 2x faster, reflecting an optimization in the finetuning methodology.
  • Qwen3 Architecture: Built upon the Qwen3 model family, it inherits the foundational capabilities and performance characteristics associated with this architecture.
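The "dpo" suffix in the model name suggests the TRL finetuning stage used Direct Preference Optimization, though the card does not state this explicitly. As a hedged illustration of what that objective computes, here is a self-contained sketch of the per-pair DPO loss; all log-probabilities below are invented numbers, not values from this model:

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (policy margin - reference margin))."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)

# Invented example: the policy prefers the chosen response more strongly
# than the reference model does, so the loss drops below log(2) ≈ 0.693.
loss = dpo_loss(-12.0, -15.0, -13.0, -14.0)
print(round(loss, 4))
```

In practice TRL's DPOTrainer averages this objective over batches of preference pairs; the sketch only shows the scalar loss for a single pair.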

Potential Use Cases

Given its parameter count and context length, wan-wan/test11-dpo is well-suited for a variety of general-purpose language tasks, particularly where efficient training and a robust base model are beneficial. Its finetuning approach suggests it could be a good candidate for applications requiring custom adaptations with reduced training times.