Model Overview
Yukang/Qwen2.5-7B-Open-R1-GRPO is a 7.6-billion-parameter language model derived from the Qwen/Qwen2.5-7B-Instruct base model. It was fine-tuned on the open-r1/OpenR1-Math-220k dataset to strengthen mathematical reasoning.
Key Capabilities
- Enhanced Mathematical Reasoning: The model's primary strength lies in its ability to tackle complex mathematical problems, a direct result of its fine-tuning on a specialized math dataset.
- GRPO Training Method: It was trained with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath paper, to further refine its reasoning abilities.
- Large Context Window: With a context length of 131,072 tokens, the model can process and understand extensive problem descriptions and complex mathematical contexts.
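For the Qwen2.5 family, the full 131,072-token window is typically enabled through YaRN rope scaling declared in the model's config.json. The fragment below is a sketch based on the Qwen2.5-7B-Instruct documentation; it is an assumption that the same settings carry over to this fine-tune:

```json
{
  "rope_scaling": {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
```

With this scaling, the base 32,768-token position range is stretched by a factor of 4 to reach the advertised context length.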
Training Details
The model was trained using the TRL library, a framework for Transformer Reinforcement Learning. The GRPO method, which is central to its training, is detailed in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models."
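TRL exposes GRPO through its GRPOTrainer class, which optimizes a policy against one or more reward functions scored over groups of sampled completions. The sketch below is illustrative, not the released training recipe: the reward function, hyperparameters, and dataset column name (`answer`) are assumptions.

```python
import re


def accuracy_reward(completions, answer, **kwargs):
    """Hypothetical reward: 1.0 when the last number in a completion
    matches the reference answer, 0.0 otherwise."""
    rewards = []
    for completion, ref in zip(completions, answer):
        numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
        rewards.append(1.0 if numbers and numbers[-1] == str(ref) else 0.0)
    return rewards


def train():
    # Imports deferred so the reward function stays usable without TRL installed.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    dataset = load_dataset("open-r1/OpenR1-Math-220k", split="train")
    config = GRPOConfig(
        output_dir="qwen2.5-7b-grpo",
        num_generations=8,          # completions sampled per prompt (group size)
        max_completion_length=1024,  # illustrative value, not the released setting
    )
    trainer = GRPOTrainer(
        model="Qwen/Qwen2.5-7B-Instruct",
        reward_funcs=accuracy_reward,
        args=config,
        train_dataset=dataset,
    )
    trainer.train()
```

GRPO scores each group of `num_generations` completions relative to one another, which removes the need for a separate value model.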
Ideal Use Cases
This model is well suited to applications that demand robust mathematical problem-solving and step-by-step reasoning, including arithmetic, algebra, geometry, and other quantitative domains.
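For inference, the model can be loaded with the standard Hugging Face transformers chat workflow. This is a minimal sketch, assuming the Qwen2.5 Instruct chat template applies to this fine-tune; the system prompt and example problem are illustrative.

```python
MODEL_ID = "Yukang/Qwen2.5-7B-Open-R1-GRPO"


def build_messages(problem: str) -> list:
    """Wrap a math problem in the chat format the Qwen2.5 Instruct family expects."""
    return [
        {"role": "system", "content": "You are a helpful math assistant. Reason step by step."},
        {"role": "user", "content": problem},
    ]


def solve(problem: str, max_new_tokens: int = 512) -> str:
    # Imports deferred so build_messages stays usable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    prompt = tokenizer.apply_chat_template(
        build_messages(problem), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )


if __name__ == "__main__":
    print(solve("What is the sum of the first 100 positive integers?"))
```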