wan-wan/test18-dpo

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Mar 1, 2026License:apache-2.0Architecture:Transformer Open Weights Warm

wan-wan/test18-dpo is a 4 billion parameter Qwen3 model developed by wan-wan, fine-tuned from wan-wan/test08-checkpoint-266. This model was trained using Unsloth and Huggingface's TRL library, achieving a 2x speed improvement during finetuning. With a 32768 token context length, it is optimized for efficient processing of long sequences.

Loading preview...

Model Overview

wan-wan/test18-dpo is a 4 billion parameter Qwen3 model developed by wan-wan. It was fine-tuned from the wan-wan/test08-checkpoint-266 model and utilizes a substantial 32768 token context length, making it suitable for tasks requiring extensive contextual understanding.

Key Capabilities

  • Efficient Finetuning: This model was finetuned with Unsloth and Huggingface's TRL library, resulting in a 2x speed improvement during the training process.
  • Qwen3 Architecture: Based on the Qwen3 architecture, it inherits its foundational language understanding and generation capabilities.
  • Extended Context Window: Features a 32768 token context length, allowing it to process and generate longer texts while maintaining coherence.

Good For

  • Applications requiring a Qwen3-based model with a large context window.
  • Developers looking for models that have undergone efficient finetuning processes.
  • Tasks benefiting from a 4 billion parameter model with optimized training characteristics.