jaygala24/Qwen2.5-3B-DAPO-math-reasoning
jaygala24/Qwen2.5-3B-DAPO-math-reasoning is a 3.1-billion-parameter Qwen2.5-based causal language model fine-tuned by jaygala24 and optimized for mathematical reasoning using DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) without a KL penalty. It performs strongly on benchmarks such as GSM8K and MATH-500, reaching an overall pass@1 of 82.16% and pass@32 of 95.99%, and is well suited to applications that require accurate step-by-step mathematical problem solving.
Model Overview
This model, jaygala24/Qwen2.5-3B-DAPO-math-reasoning, is a 3.1-billion-parameter Qwen2.5 variant fine-tuned for mathematical reasoning. It was trained with the PipelineRL framework using DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) without a KL penalty to strengthen its problem-solving ability. A minimal inference sketch follows.
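The card does not include a usage snippet, so here is a minimal sketch using the standard Hugging Face transformers API, assuming the checkpoint loads like any other Qwen2.5 causal LM; the prompt and generation settings are illustrative, not taken from the model card.

```python
# Minimal inference sketch (assumes standard transformers Qwen2.5 loading;
# the prompt and generation settings are illustrative).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jaygala24/Qwen2.5-3B-DAPO-math-reasoning"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user", "content": "Natalia sold clips to 48 friends in April, "
     "and half as many in May. How many clips did she sell altogether?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding keeps the step-by-step solution deterministic; sampling also works.
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```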
Key Capabilities
- Advanced Mathematical Reasoning: Optimized for multi-step arithmetic and algebraic problem solving, as evidenced by its GSM8K and MATH-500 results.
- DAPO Fine-tuning: Uses a reinforcement learning algorithm that extends GRPO with clip-higher (asymmetric PPO clipping) and dynamic sampling for improved policy optimization; a minimal sketch of the clipped objective follows this list.
- Strong Benchmark Performance: Achieves notable results on mathematical benchmarks:
  - GSM8K (test): 86.52% pass@1, 97.50% pass@32
  - MATH-500: 70.66% pass@1, 92.00% pass@32
  - Overall: 82.16% pass@1, 95.99% pass@32
- Efficient Training: Trained with a sequence length of 8192 and an effective batch size of 256, using DeepSpeed ZeRO Stage 3 for memory efficiency (an illustrative config sketch also follows this list).
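To make the clip-higher idea concrete, here is a minimal PyTorch sketch of an asymmetric PPO-style clipped objective with no KL term, in the spirit of DAPO. The epsilon values and tensor names are assumptions for illustration and are not taken from this model's training code.

```python
# Illustrative sketch of a DAPO-style asymmetric ("clip-higher") objective
# without a KL penalty. Epsilon values and names are assumptions, not the
# actual training code for this model.
import torch

def dapo_clip_loss(logp_new, logp_old, advantages, eps_low=0.2, eps_high=0.28):
    """Token-level clipped policy loss with a wider upper clip range.

    logp_new, logp_old: per-token log-probs under the current / behavior policy.
    advantages: per-token advantage estimates (e.g., group-normalized as in GRPO).
    """
    ratio = torch.exp(logp_new - logp_old)
    # Clip-higher: the ratio may rise further (1 + eps_high) than it may
    # fall (1 - eps_low), which keeps low-probability tokens explorable.
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high)
    loss = -torch.minimum(ratio * advantages, clipped * advantages)
    return loss.mean()  # average over tokens rather than per sequence
```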
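Since the card mentions DeepSpeed ZeRO Stage 3, a hypothetical minimal config dict is sketched below to show what the stated setup might look like; the actual training options are not published here.

```python
# Hypothetical minimal DeepSpeed ZeRO Stage 3 config matching the stated
# effective batch size of 256; actual training options are not published here.
ds_config = {
    "train_batch_size": 256,             # effective batch size from the card
    "zero_optimization": {"stage": 3},   # shard params, grads, optimizer state
    "bf16": {"enabled": True},           # assumption: bf16 mixed precision
}
```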
Good For
- Applications requiring high accuracy in mathematical problem-solving.
- Educational tools or systems that need to generate step-by-step mathematical reasoning.
- Research into reinforcement learning for language models, particularly DAPO and its effectiveness in specialized domains.