jaygala24/Qwen2.5-0.5B-GRPO-math-reasoning

Text Generation · Concurrency Cost: 1 · Model Size: 0.5B · Quant: BF16 · Ctx Length: 32k · Published: Apr 13, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

jaygala24/Qwen2.5-0.5B-GRPO-math-reasoning is a 0.5 billion parameter Qwen2.5 model fine-tuned by jaygala24 using Group Relative Policy Optimization (GRPO) without a KL penalty. The model is optimized for mathematical reasoning: it was trained on GSM8K and MATH problems and evaluated on the GSM8K test set and MATH-500. It demonstrates strong performance on math reasoning benchmarks, making it suitable for applications that require step-by-step numerical problem solving.


Overview

This model, jaygala24/Qwen2.5-0.5B-GRPO-math-reasoning, is a specialized fine-tune of the Qwen2.5-0.5B base model. Developed by jaygala24, its core differentiator is the application of Group Relative Policy Optimization (GRPO) without a KL penalty for enhanced mathematical reasoning. The training utilized the PipelineRL framework and focused on datasets such as gsm8k_train and math_train.
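
As a quick illustration, the model can be loaded like any other Qwen2.5 checkpoint with the Hugging Face transformers library. The snippet below is a minimal usage sketch: the chat-style prompt, dtype, and generation settings are assumptions for illustration, not documented defaults for this fine-tune.

```python
# Minimal usage sketch (prompt wording and generation settings are illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jaygala24/Qwen2.5-0.5B-GRPO-math-reasoning"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# A GSM8K-style word problem as an example prompt.
question = (
    "Natalia sold clips to 48 of her friends in April, and then half as many "
    "clips in May. How many clips did Natalia sell altogether?"
)
messages = [{"role": "user", "content": question}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```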

Key Capabilities

  • Mathematical Reasoning: Specifically fine-tuned to excel at solving mathematical problems, trained on the GSM8K and MATH problem sets and evaluated on GSM8K (test) and MATH-500.
  • GRPO Optimization: Leverages a reinforcement learning approach (GRPO with the group mean reward as the baseline) to improve policy performance on reasoning tasks; see the sketch after this list.
  • Compact Size: At 0.5 billion parameters, it offers a relatively small footprint while delivering competitive performance in its specialized domain.
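
For readers curious about the GRPO objective mentioned above, the snippet below sketches the group-relative advantage computation: several completions are sampled per prompt, and each completion's reward is baselined against the mean reward of its group. This is a simplified illustration under assumptions; the function name is hypothetical, and whether rewards are also divided by the group standard deviation is an implementation detail not stated in this card. Because no KL penalty is used here, these advantages would feed directly into the policy-gradient loss.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, normalize_std: bool = False) -> torch.Tensor:
    """Compute group-relative advantages for one prompt.

    rewards: shape (G,), the scalar reward of each of the G completions sampled
    for the same prompt. The group mean serves as the baseline; dividing by the
    group std is optional (not specified in this card).
    """
    baseline = rewards.mean()
    advantages = rewards - baseline
    if normalize_std:
        advantages = advantages / (rewards.std() + 1e-8)
    return advantages

# Example: 4 sampled answers to the same math problem, reward 1 if correct else 0.
rewards = torch.tensor([1.0, 0.0, 0.0, 1.0])
print(group_relative_advantages(rewards))  # tensor([ 0.5, -0.5, -0.5,  0.5])
```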

Evaluation Highlights

The model achieved notable pass@k scores on mathematical benchmarks (see the note after this list for how pass@k is typically estimated):

  • GSM8K (test): 51.77% pass@1, 89.76% pass@32
  • MATH-500: 31.18% pass@1, 73.00% pass@32
  • Overall: 46.11% pass@1, 85.16% pass@32
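
For context, pass@k is the probability that at least one of k sampled answers to a problem is correct. A common way to compute it is the unbiased estimator from the Codex paper, sketched below under the assumption that n ≥ k samples are drawn per problem; the exact sampling setup behind the numbers above is not documented in this card.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples drawn per problem, c of them correct.

    pass@k = 1 - C(n - c, k) / C(n, k), i.e. one minus the probability that a
    randomly chosen size-k subset of the n samples contains no correct answer.
    """
    if n - c < k:
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Example: 32 samples per problem, 10 of them correct.
print(pass_at_k(32, 10, 1))   # 0.3125
print(pass_at_k(32, 10, 32))  # 1.0 (every size-32 subset contains a correct sample)
```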

Good for

  • Mathematical Problem Solving: Ideal for applications requiring accurate step-by-step mathematical reasoning.
  • Educational Tools: Can be integrated into systems designed to assist with or evaluate math homework and exercises.
  • Research in RL for Reasoning: Provides a practical example of GRPO's application in fine-tuning LLMs for specific cognitive tasks.