wan-wan/test10-dpo

Text Generation | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Published: Feb 27, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm

wan-wan/test10-dpo is a 4-billion-parameter Qwen3 model developed by wan-wan and fine-tuned from wan-wan/test08-checkpoint-266. It was trained with Unsloth and Hugging Face's TRL library, which made fine-tuning 2x faster. It supports a 32,768-token context length, making it suitable for tasks that require processing long inputs.


Model Overview

wan-wan/test10-dpo is a 4-billion-parameter Qwen3 model developed by wan-wan. It was fine-tuned from the wan-wan/test08-checkpoint-266 base model and is released under the Apache-2.0 license. It was trained with Unsloth and Hugging Face's TRL library, which made training 2x faster.

Key Characteristics

  • Architecture: Qwen3
  • Parameter Count: 4 billion
  • Context Length: 32768 tokens
  • Training Efficiency: Fine-tuned 2x faster using Unsloth and Hugging Face's TRL library.
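As a rough sizing aid, the following sketch estimates serving memory from the card's figures (4B parameters, BF16, 32k context). It is a back-of-envelope estimate, not a measured number: it treats the parameter count as exactly 4e9 and assumes a Qwen3-4B-style attention layout (36 layers, 8 key/value heads, head dimension 128), which should be checked against the model's own config.json. Activations and framework overhead are ignored.

```python
# Back-of-envelope memory estimate for serving the model in BF16.
# ASSUMPTIONS (not stated in the model card): exactly 4e9 parameters, and a
# Qwen3-4B-style layout of 36 layers, 8 key/value heads, head_dim 128.
# Verify these against the model's config.json before relying on the numbers.

BF16_BYTES = 2  # bfloat16 stores each value in 2 bytes
GIB = 1024 ** 3


def weight_bytes(num_params: int, bytes_per_param: int = BF16_BYTES) -> int:
    """Memory needed only to hold the model weights."""
    return num_params * bytes_per_param


def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = BF16_BYTES,
                   batch_size: int = 1) -> int:
    """KV cache size: one key and one value vector per layer, per KV head, per token."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem * batch_size


weights = weight_bytes(4_000_000_000)
kv_full_context = kv_cache_bytes(num_layers=36, num_kv_heads=8,
                                 head_dim=128, seq_len=32_768)

print(f"weights:  {weights / GIB:.2f} GiB")
print(f"KV cache: {kv_full_context / GIB:.2f} GiB at 32768 tokens")
```

Under these assumptions it reports about 7.45 GiB for the weights and 4.50 GiB of KV cache for a single 32,768-token sequence, so a full-context request needs noticeably more memory than the weights alone.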

Potential Use Cases

Given its 32,768-token context window and its faster fine-tuning setup, this model is well suited for applications that benefit from:

  • Processing long documents or conversations.
  • Tasks that depend on information spread across long inputs.
  • Projects that iterate quickly on fine-tuned variants, since the Unsloth and TRL setup shortens fine-tuning time.
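For documents longer than the context window, one common approach is to split the token sequence into overlapping windows. The following is a minimal sketch, assuming the input has already been tokenized with the model's tokenizer; the budgets for generated tokens (`max_new_tokens`), the prompt template (`prompt_overhead`) and the `overlap` are hypothetical defaults, not values from the model card.

```python
from typing import List

CONTEXT_LENGTH = 32_768  # context length stated on the model card


def chunk_token_ids(token_ids: List[int], max_new_tokens: int = 1_024,
                    prompt_overhead: int = 256, overlap: int = 512,
                    context_length: int = CONTEXT_LENGTH) -> List[List[int]]:
    """Split a long token sequence into overlapping windows that fit the context.

    Each window leaves room for the prompt template (prompt_overhead, a
    hypothetical budget) and for the generated tokens (max_new_tokens).
    Consecutive windows share `overlap` tokens.
    """
    window = context_length - max_new_tokens - prompt_overhead
    if window <= overlap:
        raise ValueError("window must be larger than the overlap")
    step = window - overlap  # how far each new window advances

    chunks: List[List[int]] = []
    start = 0
    while start < len(token_ids):
        chunks.append(token_ids[start:start + window])
        if start + window >= len(token_ids):
            break  # this window already reaches the end of the input
        start += step
    return chunks


# Usage with stand-in token IDs (real IDs would come from the tokenizer).
chunks = chunk_token_ids(list(range(70_000)))
print([len(c) for c in chunks])
```

For a 70,000-token input with the defaults, this yields three windows of 31,488, 31,488 and 8,048 tokens, each sharing 512 tokens with its predecessor. The overlap trades some redundant computation for less risk of cutting a relevant passage at a window boundary.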