xiwenc1/OpenRS-DR_GRPO_dra-qwen2

Hugging Face · Text Generation
Model Size: 3.1B · Quant: BF16 · Context Length: 32k · Published: Dec 26, 2025 · Architecture: Transformer

The xiwenc1/OpenRS-DR_GRPO_dra-qwen2 model is a 3.1-billion-parameter instruction-tuned language model fine-tuned from Qwen2.5-3B-Instruct. It was trained with the GRPO method on the knoveleng/open-rs dataset, which specializes it for mathematical reasoning. The model targets tasks that require robust problem-solving, particularly in mathematical contexts, and supports a 32,768-token context length.


Model Overview

xiwenc1/OpenRS-DR_GRPO_dra-qwen2 is a 3.1-billion-parameter language model fine-tuned from the Qwen2.5-3B-Instruct base model. It leverages the GRPO (Group Relative Policy Optimization) method, introduced in the "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" paper, to enhance its reasoning capabilities.

Key Capabilities

  • Mathematical Reasoning: Specialized through GRPO training, making it suitable for tasks requiring logical and mathematical problem-solving.
  • Instruction Following: Inherits strong instruction-following abilities from its Qwen2.5-3B-Instruct base.
  • Extended Context: Supports a substantial context length of 32,768 tokens, allowing it to process longer inputs and more complex problem descriptions.
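The capabilities above can be exercised through a standard Hugging Face `transformers` chat workflow. The sketch below is illustrative, not the card authors' reference code: the system prompt wording and generation settings are assumptions, and only the model ID comes from the card.

```python
# Minimal inference sketch for the model, assuming `pip install transformers torch`.
MODEL_ID = "xiwenc1/OpenRS-DR_GRPO_dra-qwen2"


def build_messages(question: str) -> list[dict]:
    """Wrap a math question in a chat-format message list (prompt wording is an assumption)."""
    return [
        {"role": "system", "content": "You are a helpful math assistant. Reason step by step."},
        {"role": "user", "content": question},
    ]


def solve(question: str, max_new_tokens: int = 512) -> str:
    """Generate an answer; heavy imports are kept inside so the sketch loads without them."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="bfloat16")
    # Render the messages with the model's own chat template before tokenizing.
    prompt = tokenizer.apply_chat_template(
        build_messages(question), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)


if __name__ == "__main__":
    print(solve("What is the sum of the first 100 positive integers?"))
```

Because the model was tuned for step-by-step reasoning, leaving generous headroom in `max_new_tokens` lets it complete its derivation before emitting the final answer.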

Training Details

The model was fine-tuned on the knoveleng/open-rs dataset using the TRL (Transformer Reinforcement Learning) framework. The application of the GRPO method specifically targets improvements in mathematical reasoning, differentiating it from general-purpose instruction-tuned models.
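A training setup like the one described can be sketched with TRL's `GRPOTrainer`. This is a hedged approximation under stated assumptions: the toy reward function and hyperparameters are illustrative, not the recipe actually used for this model; only the base model and dataset IDs come from the card.

```python
# Sketch of GRPO fine-tuning with TRL, assuming `pip install trl datasets`.
# The reward function and hyperparameters below are illustrative assumptions.


def format_reward(completions: list[str], **kwargs) -> list[float]:
    """Toy reward: 1.0 if the completion contains a \\boxed{...} final answer, else 0.0."""
    return [1.0 if "\\boxed{" in c else 0.0 for c in completions]


def main() -> None:
    # Heavy dependencies are imported here to keep the sketch importable without them.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    dataset = load_dataset("knoveleng/open-rs", split="train")
    config = GRPOConfig(
        output_dir="openrs-grpo-qwen2.5-3b",  # hypothetical output path
        num_generations=8,            # completions sampled per prompt (the "group")
        max_completion_length=1024,   # room for multi-step reasoning traces
        bf16=True,
    )
    trainer = GRPOTrainer(
        model="Qwen/Qwen2.5-3B-Instruct",  # base model named on the card
        reward_funcs=format_reward,
        args=config,
        train_dataset=dataset,
    )
    trainer.train()


if __name__ == "__main__":
    main()
```

GRPO scores each group of sampled completions with the reward function and normalizes rewards within the group, which is why `num_generations` (the group size) is the central knob in this setup.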

Good For

  • Applications requiring advanced mathematical problem-solving.
  • Tasks where robust reasoning and logical deduction are critical.
  • Developers looking for a compact yet capable model for specialized reasoning tasks.