Name: jaygala24/Qwen2.5-1.5B-DAPO-math-reasoning API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: jaygala24

jaygala24/Qwen2.5-1.5B-DAPO-math-reasoning Overview

This model is a fine-tuned version of the Qwen2.5-1.5B base model, developed by jaygala24. It was trained with DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) without a KL penalty. DAPO is a reinforcement learning method that extends GRPO with asymmetric PPO clipping, dynamic sampling, token-level loss aggregation, and overlong reward shaping. Training used only mathematical reasoning datasets, including gsm8k_train and math_train.
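The DAPO ingredients named above can be illustrated with a minimal pure-Python sketch. It is not the author's training code. The function names are hypothetical, and the upper clip bound `eps_high=0.28` is an assumption taken from the DAPO paper's default (this page reports only a single clip epsilon of 0.2). Overlong reward shaping is omitted.

```python
import math

def group_advantages(rewards):
    """Group-relative advantages for the sampled answers to one prompt.

    Returns None when every reward is identical; dynamic sampling drops
    such groups because they carry no learning signal.
    """
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    if var == 0.0:
        return None
    std = math.sqrt(var)
    return [(r - mean) / std for r in rewards]

def dapo_token_loss(logp_new, logp_old, advantages, eps_low=0.2, eps_high=0.28):
    """Clipped surrogate loss averaged over all tokens in the batch.

    logp_new / logp_old: one list of token log-probs per sequence.
    advantages: one scalar per sequence, shared by all of its tokens.
    eps_high is an assumed value, not one confirmed for this model.
    """
    total, n_tokens = 0.0, 0
    for seq_new, seq_old, adv in zip(logp_new, logp_old, advantages):
        for lp_new, lp_old in zip(seq_new, seq_old):
            ratio = math.exp(lp_new - lp_old)
            # Asymmetric clipping: separate lower and upper bounds.
            clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
            total += min(ratio * adv, clipped * adv)
            n_tokens += 1
    # Token-level aggregation: divide by the total token count,
    # not by the number of sequences.
    return -total / n_tokens
```

Dividing by the total token count means long responses contribute in proportion to their length, which is the difference between token-level aggregation and a per-sequence average.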

Key Capabilities & Performance

Mathematical Reasoning: Specifically optimized for mathematical problem solving.
DAPO Fine-tuning: Trained with the DAPO reinforcement learning algorithm to improve performance in its target domain.
Benchmark Results: Reported pass@k scores on mathematical benchmarks:
- GSM8K (test): 78.78% pass@1, 95.98% pass@32
- MATH-500: 60.22% pass@1, 88.40% pass@32
- Overall: 73.68% pass@1, 93.90% pass@32
Context Length: Supports a context length of 32768 tokens.
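For readers unfamiliar with pass@k, the sketch below shows the commonly used unbiased estimator (n samples per problem, c of them correct). Whether this exact estimator produced the scores above is an assumption. The second helper checks one plausible reading of the "Overall" row: a problem-count-weighted average, assuming 1319 GSM8K test problems and 500 MATH-500 problems. Under that assumption the listed overall figures are reproduced to two decimals.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem: n samples, c correct."""
    if n - c < k:
        # Fewer than k incorrect samples: any draw of k contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def weighted_overall(scores, sizes):
    """Average of per-benchmark scores, weighted by problem count."""
    return sum(s * n for s, n in zip(scores, sizes)) / sum(sizes)

# Assumed benchmark sizes: GSM8K test = 1319 problems, MATH-500 = 500.
sizes = [1319, 500]
overall_p1 = weighted_overall([78.78, 60.22], sizes)
overall_p32 = weighted_overall([95.98, 88.40], sizes)
print(round(overall_p1, 2), round(overall_p32, 2))
```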

Training Details

The model was trained for 1500 steps with a learning rate of 1e-6 and an effective batch size of 256, using DeepSpeed ZeRO Stage 3 to shard parameters, gradients, and optimizer state across devices. The RL hyperparameters include a clip epsilon of 0.2 and a discount factor of 1.0.
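The reported hyperparameters can be collected in a small configuration object, as in the sketch below. The class and field names are hypothetical, and the split of the effective batch size into per-device batch, gradient accumulation steps, and device count is purely illustrative, since the source gives only the product.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingConfig:
    # Values as reported on this page; field names are hypothetical.
    max_steps: int = 1500
    learning_rate: float = 1e-6
    effective_batch_size: int = 256
    clip_epsilon: float = 0.2
    discount_factor: float = 1.0
    zero_stage: int = 3

def effective_batch(per_device, grad_accum, num_devices):
    """Effective batch size = per-device batch x accumulation steps x devices."""
    return per_device * grad_accum * num_devices

cfg = TrainingConfig()

# Hypothetical decomposition (4 x 8 x 8); many other splits also give 256.
assert effective_batch(4, 8, 8) == cfg.effective_batch_size

# Items consumed over training, if each step uses one effective batch.
total_items = cfg.max_steps * cfg.effective_batch_size
```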

Ideal Use Cases

This model is particularly well-suited for applications requiring:

Automated Mathematical Problem Solving
Educational Tools that need to generate step-by-step mathematical reasoning.
Research and Development in AI for mathematics and reasoning tasks.
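As a starting point for the use cases above, the sketch below builds a request body for a step-by-step solution. It assumes an OpenAI-style text completions schema (`model`, `prompt`, `max_tokens`, `temperature`). The prompt template is hypothetical and not necessarily the one used in training, and the endpoint and authentication are left out because they are not described here.

```python
MODEL_ID = "jaygala24/Qwen2.5-1.5B-DAPO-math-reasoning"

def build_request(problem, max_tokens=1024, temperature=0.0):
    """Build a completions-style payload asking for a worked solution."""
    # Hypothetical prompt template; adjust to match the model's training format.
    prompt = (
        "Solve the following problem step by step, "
        "then state the final answer on its own line.\n\n"
        f"Problem: {problem}\nSolution:"
    )
    return {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

payload = build_request("A train travels 120 km in 1.5 hours. What is its average speed?")
```

A temperature of 0.0 suits single-answer use, while sampling at a higher temperature and keeping the majority answer is closer to the pass@32 setting reported above.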
