Model Overview
This model, daviddavidlu/DAPO-with-prompt-augmentation-step2820, is a checkpoint (step 2820) from training Qwen2.5-Math-1.5B. It was developed by daviddavidlu and is based on the research described in the paper "Prompt Augmentation Scales up GRPO Training on Mathematical Reasoning". Its core innovation is the use of Prompt Augmentation during DAPO training without dynamic sampling.
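As a minimal sketch of how the checkpoint might be queried, the snippet below loads it with the Hugging Face `transformers` library. It assumes the repository loads through `AutoModelForCausalLM` and `AutoTokenizer`, as Qwen2.5 models usually do. The prompt wording in `build_prompt` is a hypothetical placeholder, not the format used in training, and the helper names are illustrative only.

```python
MODEL_ID = "daviddavidlu/DAPO-with-prompt-augmentation-step2820"


def build_prompt(problem: str) -> str:
    # Hypothetical prompt wording; the template used during training
    # is not stated in this card.
    return (
        f"{problem.strip()}\n"
        "Please reason step by step, and put your final answer within \\boxed{}."
    )


def generate_solution(problem: str, max_new_tokens: int = 512) -> str:
    # Imported lazily so build_prompt stays usable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(build_prompt(problem), return_tensors="pt")
    # Greedy decoding keeps the example deterministic.
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # Drop the prompt tokens and decode only the newly generated ones.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    print(generate_solution("What is the sum of the first 10 positive integers?"))
```

Running the script downloads the checkpoint weights on first use; `max_new_tokens` is an arbitrary default and may need raising for longer solutions.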
Key Capabilities
- Mathematical Reasoning: Primarily intended for solving mathematical problems, specifically trained on the MATH Level-3-to-5 Dataset.
- Enhanced RL Training: Uses prompt augmentation to generate diverse reasoning traces, increasing rollout diversity and stability during reinforcement learning (RL) training.
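The idea of prompt augmentation mentioned in the list above can be illustrated with a small sketch: each rollout for a problem is rendered through a randomly chosen instruction template, so the policy sees varied phrasings of the same question. The templates, the function name `augment_prompts`, and the uniform sampling are all assumptions made for illustration; the actual templates and sampling scheme are described in the paper, not here.

```python
import random

# Hypothetical instruction templates; the ones used in training are not listed here.
TEMPLATES = [
    "{problem}\nLet's think step by step and put the final answer in \\boxed{{}}.",
    "Solve the following problem. Show your reasoning, then give the answer in \\boxed{{}}.\n{problem}",
    "Question: {problem}\nAnswer with a detailed derivation ending in \\boxed{{}}.",
]


def augment_prompts(problem: str, num_rollouts: int, rng: random.Random) -> list[str]:
    """Return one prompt per rollout, each built from a randomly chosen template."""
    return [rng.choice(TEMPLATES).format(problem=problem) for _ in range(num_rollouts)]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so the choice of templates is reproducible
    for prompt in augment_prompts("Compute 2 + 3.", num_rollouts=4, rng=rng):
        print(prompt)
        print("---")
```

In an RL loop, each returned prompt would be fed to the policy to sample one rollout, so a group of rollouts for the same problem is conditioned on different instructions.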
Training Methodology
The model was trained with the DAPO method combined with prompt augmentation, which is intended to make RL training on mathematical reasoning tasks more robust. For details on the training procedure and the underlying research, refer to the paper "Prompt Augmentation Scales up GRPO Training on Mathematical Reasoning".
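For orientation, the DAPO objective as it is commonly stated (token-level loss with decoupled clipping) is sketched below. This is a reference form under assumptions, not a confirmed description of this checkpoint's training: the symbols $G$ (group size), $\varepsilon_{\text{low}}$, $\varepsilon_{\text{high}}$ (clipping bounds), and $R_i$ (reward of rollout $o_i$) are standard notation introduced here, and the dynamic-sampling constraint of the original DAPO formulation is omitted because this model was trained without it. With prompt augmentation, the question $q$ would presumably be rendered through a sampled template before rollouts are drawn.

```latex
\mathcal{J}(\theta) =
\mathbb{E}_{(q,a)\sim\mathcal{D},\ \{o_i\}_{i=1}^{G}\sim\pi_{\theta_{\text{old}}}(\cdot\mid q)}
\left[
  \frac{1}{\sum_{i=1}^{G}|o_i|}
  \sum_{i=1}^{G}\sum_{t=1}^{|o_i|}
  \min\!\Big(
    r_{i,t}(\theta)\,\hat{A}_{i,t},\
    \operatorname{clip}\!\big(r_{i,t}(\theta),\,1-\varepsilon_{\text{low}},\,1+\varepsilon_{\text{high}}\big)\,\hat{A}_{i,t}
  \Big)
\right]

r_{i,t}(\theta) =
\frac{\pi_{\theta}(o_{i,t}\mid q,\,o_{i,<t})}{\pi_{\theta_{\text{old}}}(o_{i,t}\mid q,\,o_{i,<t})},
\qquad
\hat{A}_{i,t} =
\frac{R_i - \operatorname{mean}\big(\{R_j\}_{j=1}^{G}\big)}{\operatorname{std}\big(\{R_j\}_{j=1}^{G}\big)}
```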
Important Note
This checkpoint (step 2820) is marked as an "Outdated" version. Users seeking better performance are advised to consider other checkpoints, such as DAPO w/ Prompt Augmentation (step 2720) or DAPO w/ Prompt Augmentation (step 2480).