Yukang/Qwen2.5-7B-Open-R1-GRPO

TEXT GENERATION · Concurrency cost: 1 · Model size: 7.6B · Quant: FP8 · Context length: 32k · Published: Jun 19, 2025 · Architecture: Transformer

Yukang/Qwen2.5-7B-Open-R1-GRPO is a 7.6 billion parameter language model fine-tuned from Qwen/Qwen2.5-7B-Instruct using the GRPO method and the open-r1/OpenR1-Math-220k dataset, optimizing it for advanced mathematical reasoning and complex problem-solving.


Model Overview

Yukang/Qwen2.5-7B-Open-R1-GRPO is a 7.6 billion parameter language model derived from the Qwen/Qwen2.5-7B-Instruct base model. It has been specifically fine-tuned using the open-r1/OpenR1-Math-220k dataset, focusing on mathematical reasoning.

Key Capabilities

  • Enhanced Mathematical Reasoning: The model's primary strength lies in its ability to tackle complex mathematical problems, a direct result of its fine-tuning on a specialized math dataset.
  • GRPO Training Method: It incorporates the GRPO (Group Relative Policy Optimization) training method, as introduced in the DeepSeekMath paper, to further refine its reasoning abilities.
  • Large Context Window: The underlying Qwen2.5 architecture supports a context length of up to 131,072 tokens (served here at 32k), letting the model process extensive problem descriptions and long reasoning chains.
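The group-relative idea behind GRPO can be sketched in a few lines: rewards for a group of completions sampled from the same prompt are normalized against the group's own mean and standard deviation, which yields per-sample advantages without a learned critic model. This is a minimal illustration of that normalization step, not the card's actual training code; the sample rewards are made up.

```python
import statistics

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against its group's mean and (population) std,
    as in the group-relative advantage used by GRPO (DeepSeekMath)."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Completions that score above the group average get positive advantages,
# those below get negative ones; the advantages sum to (approximately) zero.
adv = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```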

Training Details

The model was trained using the TRL library, a framework for Transformer Reinforcement Learning. The GRPO method, which is central to its training, is detailed in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models."
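The exact training recipe is not published on the card, but a minimal GRPO fine-tuning loop with TRL's `GRPOConfig`/`GRPOTrainer` looks roughly like the sketch below. The reward function, the assumed `answer` dataset column, and the hyperparameters are illustrative choices, not the author's actual configuration.

```python
# Hedged sketch of GRPO fine-tuning with TRL; reward function and
# hyperparameters are illustrative assumptions, not the card's recipe.
def accuracy_reward(completions, answer, **kwargs):
    """Reward 1.0 when the reference answer appears in the completion.
    Assumes plain-text completions and an `answer` column in the dataset."""
    return [1.0 if a.strip() in c else 0.0 for c, a in zip(completions, answer)]

def main():
    # Heavy imports kept local: training needs GPUs and large downloads.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    dataset = load_dataset("open-r1/OpenR1-Math-220k", split="train")
    args = GRPOConfig(
        output_dir="Qwen2.5-7B-Open-R1-GRPO",
        num_generations=8,          # completions sampled per prompt (the "group")
        max_completion_length=1024,
    )
    trainer = GRPOTrainer(
        model="Qwen/Qwen2.5-7B-Instruct",
        reward_funcs=accuracy_reward,
        args=args,
        train_dataset=dataset,
    )
    trainer.train()

# Call main() to launch training; it requires GPUs and network access.
```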

Ideal Use Cases

This model is particularly well-suited for applications requiring robust mathematical problem-solving and reasoning. Developers looking for a model with strong capabilities in areas such as advanced arithmetic, algebra, geometry, and other mathematical domains would find this model beneficial.
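As a concrete starting point, here is a minimal inference sketch using the `transformers` library. The model id is taken from this card; the system prompt and generation settings are illustrative assumptions rather than recommended values.

```python
# Hedged inference sketch; system prompt and generation settings are
# illustrative choices, not documented recommendations for this model.
MODEL_ID = "Yukang/Qwen2.5-7B-Open-R1-GRPO"

def build_messages(problem: str) -> list[dict]:
    """Wrap a math problem in the chat-format message list Qwen2.5 expects."""
    return [
        {"role": "system", "content": "You are a helpful math assistant. Reason step by step."},
        {"role": "user", "content": problem},
    ]

def solve(problem: str, max_new_tokens: int = 1024) -> str:
    # Heavy imports kept local: running this needs a GPU and a model download.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    text = tokenizer.apply_chat_template(
        build_messages(problem), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

# Example: solve("What is the sum of the first 100 positive integers?")
```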