jaygala24/Qwen3-1.7B-DAPO-math-reasoning

Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32K · Published: Apr 23, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

The jaygala24/Qwen3-1.7B-DAPO-math-reasoning model is a 1.7 billion parameter Qwen3-based language model, fine-tuned by jaygala24 using DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) without a KL penalty. It is optimized specifically for mathematical reasoning and performs strongly on benchmarks such as GSM8K and MATH-500. The model supports a 32K-token context window and is intended for applications that require robust step-by-step mathematical problem-solving.


Model Overview

The jaygala24/Qwen3-1.7B-DAPO-math-reasoning model is a specialized 1.7 billion parameter language model built on the Qwen3 architecture. It was fine-tuned with DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) without a KL penalty to strengthen its mathematical reasoning abilities. Training used the PipelineRL framework and focused on the gsm8k and math_train datasets.
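
For reference, below is a minimal inference sketch using the Hugging Face transformers library. The prompt and sampling parameters are illustrative assumptions, not settings reported by the model author.

```python
# Minimal inference sketch (assumes a recent transformers release with Qwen3 support
# and a BF16-capable device).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jaygala24/Qwen3-1.7B-DAPO-math-reasoning"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are published in BF16
    device_map="auto",
)

messages = [{
    "role": "user",
    "content": "Natalia sold clips to 48 of her friends in April, and then half as many "
               "in May. How many clips did she sell altogether?",
}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Illustrative sampling settings; increase max_new_tokens for longer reasoning chains.
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True,
                         temperature=0.6, top_p=0.95)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```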

Key Capabilities & Performance

This model excels at mathematical problem-solving, as evidenced by its results on standard benchmarks (the pass@k metric is illustrated in the sketch after this list):

  • GSM8K (test): Achieves a pass@32 score of 97.57% and pass@1 of 80.97%.
  • MATH-500: Reaches a pass@32 score of 91.60% and pass@1 of 65.77%.
  • Overall: Demonstrates a combined pass@32 of 95.93% across 1819 problems.
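
The pass@k figures above follow the usual definition: the probability that at least one of k sampled completions solves a problem. The helper below is a hypothetical illustration of the standard unbiased pass@k estimator; the reported scores come from the model author's own evaluation, not from this snippet.

```python
# Standard unbiased pass@k estimator, shown for illustration only.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k sampled completions is correct,
    given c correct completions out of n samples for a problem."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 26 correct out of 32 samples for one problem.
print(pass_at_k(n=32, c=26, k=1))   # 0.8125
print(pass_at_k(n=32, c=26, k=32))  # 1.0
```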

The DAPO algorithm, which extends GRPO with asymmetric ("clip-higher") PPO clipping and dynamic sampling, was applied with a training sequence length of 8192 tokens; the clipping change is sketched below.
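
To make the clipping change concrete, here is a minimal sketch of a token-level clipped surrogate with decoupled clip ranges. The tensor names and epsilon values are illustrative assumptions; this is not the fine-tune's actual training code.

```python
# Sketch of the asymmetric ("clip-higher") clipped surrogate used by DAPO-style training.
import torch

def dapo_clip_loss(logp_new, logp_old, advantages, eps_low=0.2, eps_high=0.28):
    """Token-level clipped policy-gradient surrogate with decoupled clip ranges.

    The importance ratio is clipped to [1 - eps_low, 1 + eps_high]; the larger upper
    bound leaves more headroom for up-weighting low-probability tokens, which helps
    preserve exploration during RL fine-tuning. No KL penalty term is added.
    """
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high) * advantages
    return -torch.minimum(unclipped, clipped).mean()
```

Dynamic sampling, the other DAPO ingredient, filters out prompts whose sampled completions are all correct or all incorrect before computing group-normalized advantages, so every batch contributes useful gradient signal.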

Good For

  • Mathematical Reasoning: Ideal for tasks requiring step-by-step mathematical problem-solving.
  • Academic Research: Useful for researchers exploring advanced reinforcement learning techniques like DAPO for fine-tuning LLMs.
  • Educational Tools: Can be integrated into applications designed to assist with or evaluate mathematical understanding.