Name: cheongmyeong17/Qwen2.5-MATH-1.5B-GRPO-Best API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: cheongmyeong17

Model Overview

cheongmyeong17/Qwen2.5-MATH-1.5B-GRPO-Best is a 1.5-billion-parameter language model fine-tuned from Qwen/Qwen2.5-Math-1.5B-Instruct to strengthen its mathematical reasoning.

Key Capabilities

Mathematical Reasoning: Trained on the hendrycks-math-with-answers dataset to solve mathematical problems.
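GRPO Training: Uses GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath paper, to improve performance on mathematical problems. GRPO scores each sampled completion relative to the other completions generated for the same prompt, which removes the need for a separate value (critic) model.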
Instruction Following: Inherits instruction-following capabilities from its base model, Qwen2.5-Math-1.5B-Instruct.
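GRPO's central step, turning raw rewards into group-relative advantages, can be sketched in a few lines. This is a minimal illustration, not this model's training code. It assumes one scalar reward per sampled completion (for example 1.0 for a correct final answer and 0.0 otherwise) and normalizes each reward by the mean and standard deviation of its group, as in the outcome-supervision variant described in DeepSeekMath. The function name `group_relative_advantages` and the `eps` guard are illustrative choices; the sketch uses the population standard deviation, and implementations differ on that detail.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list, eps: float = 1e-8) -> list:
    """Normalize the rewards of one group of completions sampled for the same prompt.

    Each advantage is (r_i - mean(r)) / (std(r) + eps), so completions that beat
    their siblings get positive advantages and worse ones get negative advantages.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps avoids division by zero when every completion earned the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled solutions to one problem: two correct (1.0), two wrong (0.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advantages])  # [1.0, -1.0, -1.0, 1.0]
```

When every completion in a group earns the same reward, all advantages are zero, so that prompt contributes no policy-gradient signal for that step.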

Training Details

The model was trained with the TRL (Transformer Reinforcement Learning) library, using GRPO for fine-tuning. GRPO was introduced in the DeepSeekMath paper ("Pushing the Limits of Mathematical Reasoning in Open Language Models") specifically to improve mathematical reasoning in language models, and this training targets mathematical queries and step-by-step problem solving.
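The card does not state which reward function was used during GRPO training. As a hypothetical sketch, a rule-based correctness reward for a dataset with reference answers could compare the last `\boxed{...}` expression in each completion against the reference string. The function below is shaped after TRL's convention for custom reward functions (a list of `completions` plus dataset columns passed as keyword arguments, returning one float per completion); the column name `answer`, the helper names, and the whitespace-only normalization are assumptions, and practical math graders usually normalize expressions far more aggressively.

```python
from typing import List, Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, handling nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    out = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:  # matching close brace of \boxed{
                return "".join(out).strip()
        out.append(ch)
        i += 1
    return None  # unbalanced braces: treat as no answer


def _normalize(s: str) -> str:
    # Hypothetical minimal normalization: drop all whitespace.
    return "".join(s.split())


def correctness_reward(completions: List[str], answer: List[str], **kwargs) -> List[float]:
    """Hypothetical reward: 1.0 if the last boxed answer matches the reference, else 0.0."""
    rewards = []
    for completion, ref in zip(completions, answer):
        pred = extract_boxed(completion)
        ok = pred is not None and _normalize(pred) == _normalize(ref)
        rewards.append(1.0 if ok else 0.0)
    return rewards


completions = [
    "So the result is \\boxed{\\frac{1}{2}}.",
    "The answer is \\boxed{3}.",
    "I think it is 2.",
]
print(correctness_reward(completions, answer=["\\frac{1}{2}", "4", "2"]))  # [1.0, 0.0, 0.0]
```

Binary rewards like these are exactly what the group-relative normalization in GRPO then compares across sibling completions.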
