Name: jaygala24/Qwen2.5-3B-GRPO-math-reasoning API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: jaygala24

Model Overview

This model, jaygala24/Qwen2.5-3B-GRPO-math-reasoning, is a specialized fine-tune of the Qwen2.5-3B base model. Its primary distinction lies in its training methodology: it leverages Group Relative Policy Optimization (GRPO) without a KL penalty, a reinforcement learning technique designed to enhance mathematical reasoning capabilities.
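The core idea of GRPO without a KL penalty can be illustrated with a small sketch. It is not the PipelineRL implementation: the function names, the clipping range of 0.2, and the binary rewards below are illustrative assumptions. The sketch shows how each sampled answer's reward is normalized against the other answers to the same problem, and how a KL coefficient of 0.0 removes the penalty term from the per-sample objective.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Score each sampled answer relative to its own group (same prompt)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every answer gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


def per_sample_objective(advantage, ratio, kl_estimate, clip_eps=0.2, kl_coef=0.0):
    """PPO-style clipped surrogate minus an optional KL penalty.

    clip_eps is an assumed value; kl_coef=0.0 mirrors the setting the
    model description reports, so kl_estimate has no effect here.
    """
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    surrogate = min(ratio * advantage, clipped_ratio * advantage)
    return surrogate - kl_coef * kl_estimate


# Hypothetical group of four sampled answers: 1.0 = correct, 0.0 = incorrect
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.1f} advantage={advantage:+.3f}")
```

Correct answers receive positive advantages and incorrect ones negative, so no separate value model is needed to produce a baseline.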

Key Capabilities & Training

Mathematical Reasoning: Optimized for solving mathematical problems; the GSM8K and MATH-500 results under Evaluation quantify this.
GRPO Fine-tuning: Fine-tuned with the GRPO reinforcement learning algorithm with the KL coefficient set to 0.0, using PipelineRL for training.
Dataset Focus: Trained on a combination of gsm8k_train and math_train datasets, ensuring exposure to diverse mathematical problems.
Evaluation: Reports the following pass@1 scores on mathematical reasoning benchmarks:
- GSM8K (test): 84.45% pass@1
- MATH-500: 64.48% pass@1
- Overall: 78.96% pass@1 across 1819 problems.
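The overall figure is consistent with a problem-weighted average of the two benchmark scores. The short check below assumes the standard split sizes of 1319 problems for the GSM8K test set and 500 for MATH-500, which sum to the stated 1819; the split sizes and the function name are assumptions, not details given in the model description.

```python
# Assumed split sizes (not stated in the listing): GSM8K test = 1319, MATH-500 = 500
RESULTS = [
    (84.45, 1319),  # GSM8K (test): pass@1 in percent, number of problems
    (64.48, 500),   # MATH-500: pass@1 in percent, number of problems
]


def combined_pass_at_1(scores_and_sizes):
    """Weight each benchmark's pass@1 by its number of problems."""
    total = sum(size for _, size in scores_and_sizes)
    solved = sum(score / 100.0 * size for score, size in scores_and_sizes)
    return 100.0 * solved / total, total


overall, total = combined_pass_at_1(RESULTS)
print(f"{overall:.2f}% pass@1 across {total} problems")
# prints "78.96% pass@1 across 1819 problems"
```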

When to Use This Model

This model is particularly well-suited for applications requiring:

Accurate Mathematical Problem Solving: Ideal for tasks that demand step-by-step reasoning to arrive at a numerical or logical mathematical answer.
Educational Tools: Can be integrated into systems for generating solutions or explanations for math problems.
Research in RL for Reasoning: Provides a strong baseline for exploring the impact of GRPO on complex reasoning tasks.

Its specialized training makes it a reasonable choice for focused mathematical applications at the 3B-parameter scale.
