jaygala24/Qwen2.5-0.5B-GRPO-KL-math-reasoning

Text generation · Model size: 0.5B · Quant: BF16 · Context length: 32k · Published: Apr 13, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

jaygala24/Qwen2.5-0.5B-GRPO-KL-math-reasoning is a 0.5-billion-parameter language model based on Qwen2.5, fine-tuned by jaygala24 using Group Relative Policy Optimization (GRPO) with a KL penalty to strengthen mathematical reasoning. It is optimized for solving mathematical problems, as evidenced by its results on the GSM8K and MATH-500 benchmarks, and supports a context length of 32,768 tokens.


Overview

This model, jaygala24/Qwen2.5-0.5B-GRPO-KL-math-reasoning, is a specialized fine-tuned version of the Qwen2.5-0.5B base model. Its primary focus is mathematical reasoning, achieved through reinforcement learning fine-tuning with GRPO.

Key Capabilities & Training

  • Mathematical Reasoning: The model is specifically fine-tuned for mathematical problem-solving.
  • GRPO with KL Penalty: It is trained with Group Relative Policy Optimization (GRPO), a reinforcement learning algorithm that samples a group of completions per prompt and uses the group mean reward as the baseline for computing relative advantages, combined with a KL penalty that keeps the policy close to the reference model.
  • Targeted Datasets: Training involved gsm8k_train and math_train datasets, with evaluation on gsm8k_test and math_500.
  • Performance: Achieves an overall pass@1 score of 43.62% and pass@32 of 83.01% across the GSM8K and MATH-500 benchmarks; the gap between the two indicates the model frequently produces a correct solution when given multiple attempts.
  • Efficient Training: Trained using bf16 precision and DeepSpeed ZeRO Stage 3 for optimized resource utilization.
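The group-relative baseline and KL penalty described above can be sketched in a few lines. This is a minimal illustration, not the actual training code: the function names are invented, and normalizing advantages by the group standard deviation follows the original GRPO formulation rather than anything stated on this card.

```python
import math

def group_relative_advantages(rewards, eps=1e-6):
    # Use the group mean reward as the baseline for each completion;
    # dividing by the group std (an assumption, per the original GRPO
    # paper) rescales advantages to a comparable range across groups.
    g = len(rewards)
    mean = sum(rewards) / g
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / g)
    return [(r - mean) / (std + eps) for r in rewards]

def kl_penalty(logp_policy, logp_ref):
    # Per-token k3 estimator of KL(policy || reference), commonly
    # paired with GRPO: exp(d) - d - 1 with d = logp_ref - logp_policy.
    # It is always non-negative and zero when the two logprobs match.
    d = logp_ref - logp_policy
    return math.exp(d) - d - 1
```

For a group of completions scored 0/1 by a math-answer checker, correct completions receive positive advantages and incorrect ones negative, so the policy gradient shifts probability mass toward the correct solutions while the KL term discourages drifting far from the reference model.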

Good For

  • Applications requiring robust mathematical problem-solving.
  • Research into reinforcement learning techniques for language models, particularly GRPO.
  • Developing agents that need to reason step-by-step through mathematical challenges.