jaygala24/Qwen2.5-0.5B-DAPO-math-reasoning
The jaygala24/Qwen2.5-0.5B-DAPO-math-reasoning model is a 0.5-billion-parameter language model based on the Qwen2.5 architecture, fine-tuned specifically for mathematical reasoning. It was trained on the GSM8K and MATH datasets using DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) without a KL penalty. The model excels at solving arithmetic and mathematical problems, demonstrating strong performance on benchmarks such as GSM8K and MATH-500, and supports a 32768-token context length.
Model Overview
This model, jaygala24/Qwen2.5-0.5B-DAPO-math-reasoning, is a specialized fine-tuned version of the Qwen2.5-0.5B base model. Its primary focus is on enhancing mathematical reasoning capabilities through advanced reinforcement learning techniques.
Key Capabilities & Training
- Mathematical Reasoning: Specifically optimized for solving mathematical problems, as evidenced by its training on the `gsm8k_train` and `math_train` datasets.
- DAPO Algorithm: Fine-tuned using DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization) without a KL penalty, an extension of GRPO that incorporates asymmetric PPO clipping, dynamic sampling, token-level loss aggregation, and overlong reward shaping.
- Performance: Achieves notable pass@32 scores of 91.36% on GSM8K (test) and 75.00% on MATH-500, with an overall pass@32 of 86.86% across 1819 problems.
- Context Length: Trained with a sequence length of 8192 tokens, while the model's full context window is 32768 tokens, giving it room for complex, multi-step reasoning traces.
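The DAPO objective described above can be sketched in PyTorch. This is a minimal illustration, not this repository's training code: the epsilon defaults follow the DAPO paper's "clip-higher" settings, and all tensor shapes and function names are assumptions for clarity.

```python
# Hedged sketch of DAPO's asymmetric-clip, token-level policy loss
# (no KL penalty) plus its dynamic-sampling filter. Illustrative only;
# variable names and shapes are assumptions, not this repo's code.
import torch

def dapo_loss(logprobs, old_logprobs, advantages, mask,
              eps_low=0.2, eps_high=0.28):
    """Asymmetric-clip policy loss with token-level aggregation.

    logprobs, old_logprobs, advantages, mask: (batch, seq_len) tensors;
    `mask` is 1 for response tokens and 0 for padding.
    """
    ratio = torch.exp(logprobs - old_logprobs)
    unclipped = ratio * advantages
    # "Clip-higher": the upper bound (1 + eps_high) is wider than the
    # lower bound, leaving room for low-probability tokens to grow.
    clipped = torch.clamp(ratio, 1 - eps_low, 1 + eps_high) * advantages
    per_token = -torch.minimum(unclipped, clipped)
    # Token-level aggregation: average over every valid token in the
    # batch, so long responses weigh in proportionally to their length.
    return (per_token * mask).sum() / mask.sum()

def keep_prompt(group_rewards):
    """Dynamic sampling: drop prompt groups whose rollout rewards are
    all identical (all-correct or all-wrong), since group-normalized
    advantages are then zero and contribute no gradient signal."""
    return bool(group_rewards.max() > group_rewards.min())
```

Note the absence of any KL term: DAPO removes the reference-policy penalty entirely, relying on clipping alone to constrain the update.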
Ideal Use Cases
- Mathematical Problem Solving: Excellent for applications requiring accurate step-by-step mathematical reasoning and final answer derivation.
- Educational Tools: Can be integrated into platforms for teaching or assisting with math homework.
- Research in RL for Reasoning: Provides a strong baseline for further research into reinforcement learning applications for complex reasoning tasks.
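For the problem-solving and educational settings above, the model can be queried with Hugging Face `transformers`. This is a minimal inference sketch: the example question, chat-template usage, and generation settings are illustrative assumptions, not documented defaults for this checkpoint.

```python
# Minimal inference sketch; downloads the checkpoint from the Hub.
# The prompt and generation settings are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jaygala24/Qwen2.5-0.5B-DAPO-math-reasoning"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

question = ("A store sells pencils at 3 for $1. "
            "How much do 24 pencils cost?")
messages = [{"role": "user", "content": question}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False)
# Print only the newly generated tokens (the model's reasoning + answer).
print(tokenizer.decode(outputs[0][inputs.shape[-1]:],
                       skip_special_tokens=True))
```

Greedy decoding (`do_sample=False`) is shown for reproducibility; the reported pass@32 numbers imply sampling multiple completions per problem instead.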