daviddavidlu/DAPO-with-prompt-augmentation-step2480
Text Generation | Concurrency Cost: 1 | Model Size: 1.5B | Quant: BF16 | Ctx Length: 32k | Published: Feb 5, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm

daviddavidlu/DAPO-with-prompt-augmentation-step2480 is a 1.5-billion-parameter Qwen2.5-Math model fine-tuned with DAPO and prompt augmentation. Developed by Wenquan Lu, Hai Huang, and Randall Balestriero, it is optimized for mathematical reasoning tasks. Prompt augmentation is used to increase rollout diversity and improve stability during reinforcement learning training, and the model scores above 80 on the MATH500 benchmark.
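As a rough illustration of how a model like this might be queried, here is a minimal sketch using the Hugging Face `transformers` library. It assumes the repository ID above resolves on the Hugging Face Hub and that a plain "reason step by step" instruction with a `\boxed{}` final answer is an acceptable prompt. Neither assumption is confirmed by this page, and the helper names `build_prompt` and `generate` are hypothetical.

```python
# Assumption: this ID resolves on the Hugging Face Hub; the page does not confirm it.
MODEL_ID = "daviddavidlu/DAPO-with-prompt-augmentation-step2480"


def build_prompt(question: str) -> str:
    """Wrap a math question in a plain instruction.

    The wording is an assumed prompt format, not one documented for this model.
    """
    return (
        f"{question.strip()}\n"
        "Please reason step by step, and put your final answer within \\boxed{}."
    )


def generate(question: str, max_new_tokens: int = 512) -> str:
    """Load the model in BF16 and greedily decode an answer to one question."""
    # Imported here so build_prompt stays usable without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Drop the prompt tokens and decode only the generated continuation.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("What is the sum of the first 10 positive integers?")` downloads the weights on first use and returns the decoded continuation. Greedy decoding is chosen here only to keep the sketch deterministic; it is not a recommendation from the model's authors.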
