Name: daviddavidlu/DAPO-with-prompt-augmentation-step2720 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: daviddavidlu

Overview

This model, daviddavidlu/DAPO-with-prompt-augmentation-step2720, is a specific checkpoint (step 2720) of the Qwen2.5-Math-1.5B model. It was developed by Wenquan Lu and his team as part of the PrAg-PO (Prompt Augmented Policy Optimization) project. The core innovation lies in its training methodology, which involves prompt augmentation to generate diverse reasoning traces, thereby improving rollout diversity and stability during reinforcement learning.

Key Capabilities

Mathematical Reasoning: Specifically trained and optimized for solving mathematical problems, particularly from the MATH Level-3-to-5 Dataset.
Robustness and Diversity: Leverages prompt augmentation to create varied reasoning paths, enhancing the model's ability to handle diverse problem structures and improve solution robustness.
Reinforcement Learning Integration: Utilizes a policy optimization approach with augmented prompts to refine its mathematical problem-solving strategies.

Good for

Researchers and developers focused on advanced mathematical reasoning tasks.
Applications requiring robust and diverse problem-solving approaches in mathematics.
Exploring the impact of prompt augmentation in reinforcement learning for language models.

For more details, refer to the PrAg-PO GitHub repository and the associated research paper: PrAg-PO: Prompt Augmented Policy Optimization for Robust and Diverse Mathematical Reasoning.

Overview

Overview

Key Capabilities

Good for

Full Model Card (README)