Name: daviddavidlu/PrAg-PO-Qwen3-1.7b-step720 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: daviddavidlu

Model Overview

daviddavidlu/PrAg-PO-Qwen3-1.7b-step720 is a 1.7-billion-parameter model based on the Qwen3 architecture. It is a specific checkpoint (step 720) from the training run described in the paper "PrAg-PO: Prompt Augmented Policy Optimization for Robust and Diverse Mathematical Reasoning" by Wenquan Lu, Hai Huang, Enqi Liu, and Randall Balestriero.

Key Capabilities

Mathematical Reasoning: The model is trained for advanced mathematical reasoning, particularly on problems from the MATH Level-3-to-5 dataset.
Prompt Augmented Policy Optimization (PrAg-PO): The model is trained with PrAg-PO, a procedure that uses prompt augmentation to generate diverse reasoning traces. The technique aims to increase rollout diversity and stability during reinforcement learning (RL) training.
Robust Problem Solving: By using diverse templates for reasoning traces, the model is designed to achieve more robust and varied approaches to solving mathematical problems.
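
The prompt-augmentation idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the templates, the `augment_prompts` helper, and the round-robin assignment are all hypothetical, chosen only to show how one problem might be wrapped in several instruction templates so that each RL rollout starts from a differently phrased prompt.

```python
import random

# Hypothetical templates for illustration only; the templates actually used
# in PrAg-PO are not given on this page.
TEMPLATES = [
    "Solve the following problem step by step, then give the final answer in \\boxed{{}}.\n\n{problem}",
    "{problem}\n\nThink carefully, show your reasoning, and put the final answer in \\boxed{{}}.",
    "Problem: {problem}\n\nFirst outline a plan, then carry it out. Final answer in \\boxed{{}}.",
]


def augment_prompts(problem, n_rollouts, seed=0):
    """Return n_rollouts prompts for one problem, spread across the templates."""
    rng = random.Random(seed)
    # Assign templates round-robin so each one is used before any repeats.
    prompts = [
        TEMPLATES[i % len(TEMPLATES)].format(problem=problem)
        for i in range(n_rollouts)
    ]
    # Shuffle so the template order does not correlate with rollout index.
    rng.shuffle(prompts)
    return prompts


if __name__ == "__main__":
    for prompt in augment_prompts("What is 2 + 2?", n_rollouts=3):
        print(prompt)
        print("-" * 40)
```

In an actual training loop, each returned prompt would be sent to the policy model to sample one rollout; the fixed seed here only keeps the sketch reproducible.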

Good For

Research in Mathematical AI: Ideal for researchers exploring advanced techniques in mathematical reasoning and reinforcement learning for language models.
Benchmarking Mathematical Performance: Can be used to evaluate and compare the performance of LLMs on complex mathematical datasets like MATH Level-3-to-5.
Developing Robust Reasoning Systems: Suitable for applications requiring models that can generate diverse and stable reasoning paths for problem-solving.
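
For the benchmarking use case, a scoring step might look like the sketch below. It assumes the model writes its final answer inside `\boxed{...}`, as MATH reference solutions do, and that completions have already been generated. The `extract_boxed` and `accuracy` helpers are hypothetical names, and exact string matching is a simplification: real evaluations usually normalize mathematically equivalent answers before comparing.

```python
def extract_boxed(text):
    # Return the contents of the last \boxed{...} in text, or None if absent.
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    # Walk forward, tracking brace depth so nested braces such as
    # \frac{1}{2} inside the box are kept intact.
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
    return None  # unbalanced braces: treat as no answer


def accuracy(completions, references):
    # Fraction of completions whose boxed answer exactly matches the reference.
    if len(completions) != len(references):
        raise ValueError("completions and references must have the same length")
    if not completions:
        return 0.0
    correct = sum(
        extract_boxed(c) == r.strip() for c, r in zip(completions, references)
    )
    return correct / len(completions)


if __name__ == "__main__":
    outputs = ["So the answer is \\boxed{4}.", "I am not sure."]
    print(accuracy(outputs, ["4", "7"]))
```

Completions with no boxed answer count as incorrect, which keeps the metric conservative when generation is truncated.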
