daviddavidlu/DAPO-with-prompt-augmentation-step2820
The daviddavidlu/DAPO-with-prompt-augmentation-step2820 model is a Qwen2.5-Math-1.5B checkpoint trained using DAPO (no dynamic sampling) with prompt augmentation on the MATH Level-3-to-5 Dataset. Developed by daviddavidlu, this model is specifically designed for mathematical reasoning tasks. It leverages prompt augmentation to enhance rollout diversity and stability during reinforcement learning training, making it suitable for complex mathematical problem-solving.
Model Overview
This model, daviddavidlu/DAPO-with-prompt-augmentation-step2820, is a specific checkpoint from the training of a Qwen2.5-Math-1.5B model. It was developed by daviddavidlu and is based on the research outlined in the paper "Prompt Augmentation Scales up GRPO Training on Mathematical Reasoning". The model's core innovation lies in its use of Prompt Augmentation during DAPO (no dynamic sampling) training.
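A minimal loading sketch follows, assuming the checkpoint is published on the Hugging Face Hub under the ID above in a format the `transformers` library can load. The prompt wording in `build_prompt` is an assumption for illustration; this card does not state the template used during training.

```python
MODEL_ID = "daviddavidlu/DAPO-with-prompt-augmentation-step2820"


def build_prompt(problem: str) -> str:
    # Assumed prompt format; the card does not specify the training template.
    return (
        f"{problem}\n"
        "Please reason step by step, and put your final answer within \\boxed{}."
    )


def solve(problem: str, max_new_tokens: int = 512) -> str:
    # Imported lazily so build_prompt works without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(build_prompt(problem), return_tensors="pt")
    # Greedy decoding keeps the output deterministic for a given prompt.
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # Drop the prompt tokens and decode only the generated continuation.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `solve("What is the sum of the first 10 positive integers?")` downloads the weights on first use and returns the generated reasoning as a string.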
Key Capabilities
- Mathematical Reasoning: Primarily intended for solving mathematical problems, specifically trained on the MATH Level-3-to-5 Dataset.
- Enhanced RL Training: Uses prompt augmentation to generate diverse reasoning traces, which increases rollout diversity and stability during reinforcement learning (RL) training.
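The idea of prompt augmentation can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: the templates and the `augmented_prompts` helper are hypothetical, and the card does not list the augmentations actually used. The sketch wraps one problem in several instruction templates so that the rollouts in a group start from differently worded prompts.

```python
import random

# Hypothetical templates for illustration only; the real augmentations
# are not given in this card.
TEMPLATES = [
    "{problem}\nPlease reason step by step, and put your final answer within \\boxed{{}}.",
    "Solve the following problem. Show your work.\n\nProblem: {problem}",
    "Question: {problem}\nThink through the solution carefully before answering.",
]


def augmented_prompts(problem, group_size, rng):
    """Return one prompt per rollout, cycling through shuffled templates."""
    order = TEMPLATES[:]
    rng.shuffle(order)  # randomize which template each rollout slot gets
    # Round-robin assignment gives every template near-equal coverage.
    return [
        order[i % len(order)].format(problem=problem)
        for i in range(group_size)
    ]


prompts = augmented_prompts("Find x if 2x = 6.", 6, random.Random(0))
```

Cycling through a shuffled list, rather than sampling independently, is a design choice of this sketch: it guarantees that each template appears in any group with at least as many rollouts as templates.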
Training Methodology
The model was trained with DAPO, without its dynamic sampling step, combined with prompt augmentation. Prompt augmentation diversifies the rollouts generated for each problem, which stabilizes RL training on mathematical reasoning tasks. For details on the training procedure and the underlying research, refer to the paper "Prompt Augmentation Scales up GRPO Training on Mathematical Reasoning".
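For orientation, the core of a DAPO-style update can be sketched in plain Python. This is a simplified reading of the published DAPO objective (group-normalized advantages, decoupled lower and upper clip ranges, token-level averaging), not the training code for this checkpoint. The clip values `eps_low=0.2` and `eps_high=0.28` are illustrative assumptions, and the function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of rollouts for the same problem."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Without dynamic sampling, a group whose rewards are all equal is kept
    # and simply receives zero advantage for every rollout.
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_token_loss(logp_new, logp_old, advantages,
                       eps_low=0.2, eps_high=0.28):
    """Token-level clipped surrogate loss with decoupled clip ranges.

    logp_new and logp_old hold one list of per-token log-probabilities
    per response; advantages holds one scalar per response.
    """
    total, n_tokens = 0.0, 0
    for new_seq, old_seq, adv in zip(logp_new, logp_old, advantages):
        for lp_new, lp_old in zip(new_seq, old_seq):
            ratio = math.exp(lp_new - lp_old)
            clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
            total += min(ratio * adv, clipped * adv)
            n_tokens += 1
    # Average over all tokens in the batch, not per response.
    return -total / n_tokens
```

A group with rewards `[1.0, 0.0, 0.0, 1.0]` yields positive advantages for the correct rollouts and negative ones for the rest. Dynamic sampling would instead resample groups whose rewards are all identical; the "no dynamic sampling" variant described here skips that filter.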
Important Note
This checkpoint (step 2820) is marked "Outdated". Users seeking better performance are advised to use one of the recommended checkpoints instead, such as DAPO w/ Prompt Augmentation (step 2720) or DAPO w/ Prompt Augmentation (step 2480).