daviddavidlu/PrAg-PO-Qwen3-1.7b-step720

Text Generation | Concurrency Cost: 1 | Model Size: 2B | Quant: BF16 | Ctx Length: 32k | Published: May 12, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm

daviddavidlu/PrAg-PO-Qwen3-1.7b-step720 is a 1.7-billion-parameter language model based on Qwen3, released as a checkpoint from training on the MATH Level-3-to-5 Dataset. Developed by daviddavidlu, it was trained with Prompt Augmented Policy Optimization (PrAg-PO), which generates reasoning traces under varied prompt templates, with the aim of more robust and diverse mathematical problem solving.


Model Overview

daviddavidlu/PrAg-PO-Qwen3-1.7b-step720 is a 1.7-billion-parameter model based on the Qwen3 architecture, saved at step 720 of its training run. It accompanies the paper "PrAg-PO: Prompt Augmented Policy Optimization for Robust and Diverse Mathematical Reasoning" by Wenquan Lu, Hai Huang, Enqi Liu, and Randall Balestriero.

Key Capabilities

  • Mathematical Reasoning: The model is trained and intended for mathematical reasoning tasks, particularly problems from the MATH Level-3-to-5 Dataset.
  • Prompt Augmented Policy Optimization (PrAg-PO): The training procedure augments prompts so that the model generates diverse reasoning traces. The technique aims to increase rollout diversity and stability during reinforcement learning (RL) training.
  • Robust Problem Solving: Because reasoning traces are generated under diverse templates, the model is designed to solve mathematical problems with more robust and varied approaches.
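
The prompt augmentation idea described above can be illustrated with a minimal sketch: each rollout for a question is built from a randomly chosen prompt template, so the policy sees the same problem under varied instructions. The templates, the function name `augment_prompts`, and the sampling scheme are assumptions made for illustration only; this card does not list the templates or the exact procedure used in the paper.

```python
import random

# Hypothetical templates for illustration; the actual PrAg-PO templates
# are not given in this card.
TEMPLATES = [
    "Solve the following problem step by step. "
    "Put the final answer in \\boxed{{}}.\n\n{question}",
    "{question}\n\nThink carefully, show your reasoning, "
    "and give the final answer in \\boxed{{}}.",
    "Problem: {question}\n\nFirst outline a plan, then carry it out. "
    "Final answer in \\boxed{{}}.",
]


def augment_prompts(question, num_rollouts, rng):
    """Return one prompt per rollout, each built from a randomly chosen template."""
    return [
        rng.choice(TEMPLATES).format(question=question)
        for _ in range(num_rollouts)
    ]


# Seeded generator so the sampled templates are reproducible.
rng = random.Random(0)
prompts = augment_prompts("What is 7 * 8?", num_rollouts=4, rng=rng)
for prompt in prompts:
    print(prompt)
    print("---")
```

In an RL loop, each of these prompts would be sent to the policy to produce one rollout; how rewards are computed and combined across templates is outside the scope of this sketch.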

Good For

  • Research in Mathematical AI: Suited to researchers studying mathematical reasoning and reinforcement learning for language models.
  • Benchmarking Mathematical Performance: Can be used to evaluate and compare LLM performance on mathematical datasets such as MATH Level-3-to-5.
  • Developing Robust Reasoning Systems: Suitable for applications that need a model able to generate diverse and stable reasoning paths when solving problems.
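
For the benchmarking use case, a minimal scoring sketch might look like the following. It assumes completions end with a final answer wrapped in `\boxed{...}`, as in MATH-style solutions, and it compares answers by exact string match. The helper names `extract_boxed` and `accuracy` are hypothetical, and a real evaluation would normally check mathematical equivalence rather than string equality.

```python
def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1  # we are inside the opening brace of \boxed{
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
        i += 1
    return None  # braces never balanced


def accuracy(completions, references):
    """Fraction of completions whose boxed answer exactly matches the reference."""
    correct = sum(
        extract_boxed(c) == r.strip() for c, r in zip(completions, references)
    )
    return correct / len(references)


completions = [
    "7 * 8 = 56, so the answer is \\boxed{56}",
    "Half of one is \\boxed{\\frac{1}{2}}",
    "I am not sure.",
]
references = ["56", "\\frac{1}{2}", "3"]
print(accuracy(completions, references))
```

Exact matching undercounts correct answers written in a different but equivalent form (for example `0.5` versus `\frac{1}{2}`), so scores from this sketch are a lower bound at best.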