od2961/Qwen2.5-1.5B-Open-R1-GRPO-math-v1
Overview
od2961/Qwen2.5-1.5B-Open-R1-GRPO-math-v1 is a 1.5-billion-parameter language model fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. Its primary distinction is its training with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper. The model was fine-tuned on the OpenR1-Math-220k dataset, optimizing it for mathematical reasoning tasks.
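As a Qwen2.5-Instruct derivative, the model can be loaded with the standard Hugging Face `transformers` chat workflow. The sketch below is illustrative, not an official usage snippet from this card: the repo id comes from the card itself, while the generation settings and helper names (`build_messages`, `solve`) are assumptions.

```python
# Hedged sketch: running the model for inference with Hugging Face
# transformers. Generation settings are illustrative, not tuned values.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "od2961/Qwen2.5-1.5B-Open-R1-GRPO-math-v1"


def build_messages(question: str) -> list[dict]:
    """Wrap a math question in the chat format Qwen2.5-Instruct expects."""
    return [{"role": "user", "content": question}]


def solve(question: str, max_new_tokens: int = 512) -> str:
    """Load the model and generate an answer.

    Requires the weights locally or network access to the Hub.
    """
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
    prompt = tokenizer.apply_chat_template(
        build_messages(question), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )


# Example usage (downloads ~1.5B parameters on first call):
# answer = solve("What is 17 * 24?")
```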
Key Capabilities
- Enhanced Mathematical Reasoning: Specialized training on a dedicated math dataset significantly improves its ability to understand and solve mathematical problems.
- GRPO Training: Utilizes the GRPO method, a technique designed to push the limits of mathematical reasoning in open language models, as detailed in the DeepSeekMath paper.
- Qwen2.5 Architecture: Benefits from the robust architecture of the Qwen2.5 series, providing a strong foundation for its specialized capabilities.
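The core idea behind GRPO, as described in the DeepSeekMath paper, is that it drops the learned value-function baseline of PPO: several completions are sampled per prompt, and each completion's advantage is its reward normalized against the group's mean and standard deviation. A minimal numerical sketch of that normalization step, with illustrative function names not taken from any specific library:

```python
# Hedged sketch of GRPO's group-relative advantage: each sampled
# completion is scored, then normalized against the other completions
# drawn for the same prompt (zero mean, unit std within the group).
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize one prompt's per-completion rewards within the group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Example: four sampled answers to one math problem, scored 1.0 if the
# final answer is correct and 0.0 otherwise.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# Correct completions receive positive advantage, incorrect negative,
# so the policy update reinforces the better answers in each group.
```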
Good For
- Applications requiring strong mathematical problem-solving abilities.
- Research and development in improving LLM performance on quantitative tasks.
- Scenarios where a smaller, specialized model for math is preferred over larger, general-purpose models.