GyunYeop/OpenRS-GRPO

TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:May 31, 2026Architecture:Transformer Cold

GyunYeop/OpenRS-GRPO is a 1.5 billion parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, utilizing the GRPO (Generative Reinforcement learning with Policy Optimization) method. This model is specifically optimized for mathematical reasoning tasks, leveraging techniques introduced in the DeepSeekMath research. With a 32768-token context length, it is designed for applications requiring robust mathematical problem-solving capabilities.

Loading preview...

OpenRS-GRPO: Mathematical Reasoning with GRPO

OpenRS-GRPO is a 1.5 billion parameter language model developed by GyunYeop, fine-tuned from the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B base model. This model distinguishes itself by its training methodology, which incorporates GRPO (Generative Reinforcement learning with Policy Optimization).

Key Capabilities & Differentiators

  • Mathematical Reasoning: The core strength of OpenRS-GRPO lies in its optimization for mathematical reasoning tasks, directly applying the GRPO method detailed in the DeepSeekMath paper.
  • Reinforcement Learning Fine-tuning: Trained using the TRL library, it leverages reinforcement learning techniques to enhance performance in specific domains.
  • Extended Context Window: Features a substantial context length of 32768 tokens, allowing for processing longer and more complex problem descriptions.

When to Use This Model

  • Mathematical Problem Solving: Ideal for applications requiring advanced mathematical reasoning, calculations, and problem-solving.
  • Research in RLHF: Useful for researchers exploring the impact of GRPO and similar reinforcement learning techniques on language model capabilities.
  • Resource-Efficient Math AI: Offers specialized mathematical capabilities within a 1.5B parameter footprint, making it suitable for scenarios where larger models might be overkill or overkill.