sonicdog00/OpenRS-GRPO

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:Mar 5, 2026Architecture:Transformer Warm

OpenRS-GRPO is a fine-tuned language model developed by sonicdog00, based on the Qwen2.5-3B-Instruct architecture. It was trained using the TRL framework and the knoveleng/open-rs dataset, specifically incorporating the GRPO method from the DeepSeekMath paper. This model is optimized for mathematical reasoning and complex problem-solving, making it suitable for tasks requiring advanced logical deduction.

Loading preview...

OpenRS-GRPO: Enhanced Mathematical Reasoning

OpenRS-GRPO is a specialized language model developed by sonicdog00, fine-tuned from the Qwen2.5-3B-Instruct base model. It leverages the TRL (Transformer Reinforcement Learning) framework and was trained on the knoveleng/open-rs dataset.

Key Capabilities

  • Advanced Mathematical Reasoning: Integrates the GRPO (Gradient-based Reward Policy Optimization) method, as introduced in the DeepSeekMath paper, to enhance its ability to handle complex mathematical problems and logical deductions.
  • Instruction Following: Inherits strong instruction-following capabilities from its Qwen2.5-3B-Instruct base.

Good for

  • Applications requiring robust mathematical problem-solving.
  • Tasks involving logical reasoning and complex question answering.
  • Research and development in improving LLM performance on quantitative tasks.