Name: daviddavidlu/DAPO-with-prompt-augmentation-step2820 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: daviddavidlu

Model Overview

daviddavidlu/DAPO-with-prompt-augmentation-step2820 is the step-2820 checkpoint from PrAg-PO training of the Qwen2.5-Math-1.5B base model. Developed by Wenquan Lu et al., it targets mathematical reasoning. The training method is described in the paper "PrAg-PO: Prompt Augmented Policy Optimization for Robust and Diverse Mathematical Reasoning."

Key Capabilities

Mathematical Reasoning: Specialized in problems from the MATH Level-3-to-5 dataset.
Prompt Augmentation: Trained with Prompt Augmented Policy Optimization (PrAg-PO), which encourages varied reasoning traces.
Robustness and Diversity: Prompt augmentation increases rollout diversity and stability during reinforcement learning (RL) training, which leads to more robust problem-solving.

When to Use This Model

This model is particularly suitable for:

Solving challenging mathematical problems: Specifically those at the MATH Level-3-to-5 difficulty.
Research in mathematical reasoning: Ideal for exploring the impact of prompt augmentation and policy optimization in LLMs.
Applications requiring robust and diverse reasoning paths: Where a single, fixed reasoning approach might be insufficient.

Other checkpoints from the same training run, such as step 2720 and step 2480, may perform better and are worth comparing.
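As a rough illustration of how the checkpoint might be queried for a math problem, here is a minimal Python sketch. It assumes the checkpoint loads through the Hugging Face transformers `AutoTokenizer` and `AutoModelForCausalLM` classes. The instruction string is an assumption borrowed from common Qwen2.5-Math usage, not something this page specifies, and the helper names `build_prompt`, `extract_boxed`, and `generate` are hypothetical.

```python
MODEL_ID = "daviddavidlu/DAPO-with-prompt-augmentation-step2820"

# Assumption: an instruction commonly used with Qwen2.5-Math style models.
# The page does not state which prompt template was used in training.
INSTRUCTION = "Please reason step by step, and put your final answer within \\boxed{}."


def build_prompt(problem: str) -> str:
    """Append the (assumed) instruction to a whitespace-trimmed problem."""
    return f"{problem.strip()}\n{INSTRUCTION}"


def extract_boxed(text: str):
    """Return the contents of the last \\boxed{...} in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1  # we are inside the opening brace of \boxed{
    collected = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(collected)
        collected.append(ch)
    return None  # braces never balanced


def generate(problem: str, max_new_tokens: int = 1024) -> str:
    """Greedy-decode a solution; downloads the model weights on first use."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(build_prompt(problem), return_tensors="pt")
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # Drop the prompt tokens and decode only the newly generated ones.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `extract_boxed(generate("..."))` with a problem statement would return the model's final boxed answer, or `None` if the output contains no balanced `\boxed{...}`. Greedy decoding is used here only for reproducibility; sampling several completions may be more appropriate when diverse reasoning paths are the goal.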
