daviddavidlu/DAPO-with-prompt-augmentation-step2480
daviddavidlu/DAPO-with-prompt-augmentation-step2480 is a 1.5-billion-parameter Qwen2.5-Math model fine-tuned with DAPO and prompt augmentation. Developed by Wenquan Lu, Hai Huang, and Randall Balestriero, it is optimized for mathematical reasoning: prompt augmentation increases rollout diversity and stability during reinforcement learning training, and the model scores above 80 on the MATH500 benchmark.
Model Overview
daviddavidlu/DAPO-with-prompt-augmentation-step2480 is the step-2480 checkpoint of a 1.5-billion-parameter Qwen2.5-Math model, developed by Wenquan Lu, Hai Huang, and Randall Balestriero. It was trained with DAPO (without dynamic sampling) combined with prompt augmentation on the MATH Level-3-to-5 dataset. The core idea is to use prompt augmentation to generate reasoning traces under a variety of templates, which significantly increases rollout diversity and stability during reinforcement learning (RL) training.
Key Capabilities
- Mathematical Reasoning: Primarily designed and optimized for complex mathematical problem-solving.
- Enhanced RL Training: Utilizes prompt augmentation to improve the stability and diversity of reasoning traces during RL training.
- Performance: Achieves a score of over 80 on the challenging MATH500 benchmark, indicating strong capabilities in mathematical tasks.
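The augmentation idea above can be sketched in a few lines: render the same question under several instruction templates so that sampled rollouts differ. This is an illustrative sketch only; the templates below are assumptions, not the authors' actual training set.

```python
# Illustrative sketch of prompt augmentation for RL rollouts.
# The templates are hypothetical examples, not the authors' exact templates.
import random

TEMPLATES = [
    "Problem: {q}\nSolve step by step.",
    "{q}\nShow your reasoning, then state the final answer.",
    "You are a careful mathematician. {q}",
]

def augment(question: str) -> list[str]:
    """Return one rollout prompt per template for a single question."""
    return [t.format(q=question) for t in TEMPLATES]

def sample_rollout_prompt(question: str, rng: random.Random) -> str:
    """Pick a random template for one rollout, diversifying reasoning traces."""
    return rng.choice(TEMPLATES).format(q=question)

prompts = augment("Find the sum of the first 10 positive integers.")
print(len(prompts))  # one prompt per template
```

During RL training, each sampled rollout would use one of these renderings, so the policy sees (and is rewarded on) the same problem phrased in multiple ways.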
Use Cases
This model is ideal for applications requiring robust mathematical reasoning, particularly in scenarios where generating diverse and stable reasoning paths is crucial. It is well-suited for research and development in advanced mathematical AI, educational tools, or any system needing to solve complex math problems.
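As a minimal usage sketch, the checkpoint can be loaded like any causal LM from the Hugging Face Hub. The instruction template below is an assumption for illustration; the model card does not specify the prompt format used in training.

```python
# Minimal usage sketch. Assumes the Hugging Face `transformers` library;
# the prompt template is a hypothetical wrapper, not a confirmed format.
MODEL_ID = "daviddavidlu/DAPO-with-prompt-augmentation-step2480"

def build_prompt(problem: str) -> str:
    """Wrap a math problem in a simple instruction template (an assumption)."""
    return f"Problem: {problem}\nReason step by step and give the final answer.\n"

def solve(problem: str, max_new_tokens: int = 512) -> str:
    """Download the checkpoint and generate a solution (needs network access)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # heavy import kept local
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(build_prompt(problem), return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(build_prompt("What is 7 * 8?"))
```

For math benchmarks such as MATH500, answers are typically extracted from the generated reasoning trace, so downstream code should parse the model's final-answer line rather than the full output.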