Thomas-Chou/Qwen2.5-1.5B-Open-R1-GRPO

Hugging Face · Text Generation
Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Feb 10, 2025 · Architecture: Transformer · Concurrency Cost: 1

Thomas-Chou/Qwen2.5-1.5B-Open-R1-GRPO is a 1.5 billion parameter Qwen2.5-Instruct model fine-tuned by Thomas-Chou. It specializes in mathematical reasoning, having been trained on the OpenR1-Math-220k dataset using the GRPO method. The model is optimized for tasks requiring strong mathematical problem-solving capabilities, and its 131,072-token context length supports long, multi-step derivations.


Model Overview

Thomas-Chou/Qwen2.5-1.5B-Open-R1-GRPO is a 1.5 billion parameter language model derived from the Qwen/Qwen2.5-1.5B-Instruct architecture. Its primary distinction lies in its specialized fine-tuning for mathematical reasoning tasks.

Key Capabilities

  • Enhanced Mathematical Reasoning: The model has been fine-tuned on the OpenR1-Math-220k dataset, specifically targeting mathematical problem-solving.
  • GRPO Training Method: It was trained with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), to improve its mathematical capabilities.
  • Large Context Window: Inherits a substantial context length of 131,072 tokens, beneficial for complex multi-step reasoning.
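The core idea of GRPO can be illustrated with a short sketch: for each prompt, a group of completions is sampled and scored, and each completion's advantage is its reward normalized against the group's statistics rather than a learned value baseline. This is a simplified illustration, not the actual training code used for this model.

```python
# Group-relative advantage computation, the central step of GRPO
# (Group Relative Policy Optimization). Illustrative sketch only.
from statistics import mean, pstdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each completion's reward against its group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:  # all completions scored equally: no learning signal
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Example: four completions sampled for one math prompt, scored 1.0 if
# the final answer is correct and 0.0 otherwise (a common GRPO reward).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)  # → [1.0, -1.0, -1.0, 1.0]
```

Correct completions receive positive advantages and incorrect ones negative, so the policy update pushes probability mass toward answers that beat the group average.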

Good For

  • Mathematical Problem Solving: Ideal for applications requiring robust mathematical reasoning, calculations, and understanding of mathematical concepts.
  • Research and Development: Useful for researchers exploring advanced fine-tuning techniques like GRPO for domain-specific performance enhancement.
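Qwen2.5-Instruct models expect prompts in the ChatML format. The sketch below builds such a prompt by hand for a math question; in practice you would call `tokenizer.apply_chat_template` from the transformers library, which produces the same layout. The helper name and system message here are illustrative, not part of the model card.

```python
# Hand-rolled ChatML prompt for a Qwen2.5-Instruct-style model.
# Illustrative only; use tokenizer.apply_chat_template in real code.
def build_chatml_prompt(question: str,
                        system: str = "You are a helpful assistant.") -> str:
    """Format a single-turn math query in ChatML, ready for generation."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt("Solve for x: 2x + 6 = 14.")
print(prompt)
```

The trailing `<|im_start|>assistant\n` leaves the prompt open for the model to continue with its answer.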

This model was developed using the TRL framework (version 0.18.0) and is a focused adaptation of the base Qwen2.5-1.5B-Instruct model.