wan-wan/test18-dpo
TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kPublished:Mar 1, 2026License:apache-2.0Architecture:Transformer Open Weights Warm
wan-wan/test18-dpo is a 4 billion parameter Qwen3 model developed by wan-wan, fine-tuned from wan-wan/test08-checkpoint-266. This model was trained using Unsloth and Huggingface's TRL library, achieving a 2x speed improvement during finetuning. With a 32768 token context length, it is optimized for efficient processing of long sequences.
Loading preview...
Model Overview
wan-wan/test18-dpo is a 4 billion parameter Qwen3 model developed by wan-wan. It was fine-tuned from the wan-wan/test08-checkpoint-266 model and utilizes a substantial 32768 token context length, making it suitable for tasks requiring extensive contextual understanding.
Key Capabilities
- Efficient Finetuning: This model was finetuned with Unsloth and Huggingface's TRL library, resulting in a 2x speed improvement during the training process.
- Qwen3 Architecture: Based on the Qwen3 architecture, it inherits its foundational language understanding and generation capabilities.
- Extended Context Window: Features a 32768 token context length, allowing it to process and generate longer texts while maintaining coherence.
Good For
- Applications requiring a Qwen3-based model with a large context window.
- Developers looking for models that have undergone efficient finetuning processes.
- Tasks benefiting from a 4 billion parameter model with optimized training characteristics.