mimoidochi/OpenRS-GRPO-1

Text generation · Concurrency cost: 1 · Model size: 1.5B · Quantization: BF16 · Context length: 32k · Published: Mar 16, 2026 · Architecture: Transformer

mimoidochi/OpenRS-GRPO-1 is a 1.5 billion parameter language model fine-tuned from DeepSeek-R1-Distill-Qwen-1.5B. This model was trained using the GRPO method, which is designed to enhance mathematical reasoning capabilities. It is specifically optimized for tasks requiring robust reasoning, leveraging its training on the open-rs dataset. With a 32768 token context length, it is suitable for applications demanding detailed contextual understanding and logical inference.


Overview

mimoidochi/OpenRS-GRPO-1 is a 1.5 billion parameter language model, fine-tuned from the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B base model. Its development utilized the knoveleng/open-rs dataset and the TRL framework.

Key Capabilities

  • Enhanced Reasoning: This model was trained using the GRPO (Group Relative Policy Optimization) method, introduced in the DeepSeekMath paper, which focuses on improving mathematical and general reasoning abilities.
  • Contextual Understanding: Supports a substantial context length of 32768 tokens, allowing for processing and generating longer, more coherent texts.
  • Fine-tuned Performance: Leverages a strong base model and specialized fine-tuning to deliver focused performance on reasoning-intensive tasks.
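The core idea behind GRPO is to score a group of sampled completions for the same prompt and normalize each reward against the group's mean and standard deviation, removing the need for a separate value network. A minimal sketch of that normalization step (illustrative only, not the model authors' code; the `group_relative_advantages` helper is a name chosen here):

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: A_i = (r_i - mean(r)) / (std(r) + eps),
    computed within one group of completions for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Example: rewards for 4 sampled completions of one prompt
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Completions that score above their group's mean get a positive advantage and are reinforced; below-average ones are penalized, with the advantages summing to roughly zero within each group.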

Training Details

The model's training procedure involved the GRPO method, detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). The training was conducted using the TRL library.
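A GRPO fine-tuning run of this kind can be sketched with TRL's `GRPOTrainer`. This is a hedged configuration sketch, not the actual training recipe: the reward function, `num_generations`, and `max_completion_length` values below are illustrative assumptions, and running it requires a GPU and network access (so it is not meant to be executed as-is).

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Illustrative reward: favors completions that end with a boxed final answer.
def reward_boxed(completions, **kwargs):
    return [1.0 if "\\boxed" in c else 0.0 for c in completions]

dataset = load_dataset("knoveleng/open-rs", split="train")

training_args = GRPOConfig(
    output_dir="OpenRS-GRPO-1",
    num_generations=8,            # completions sampled per prompt (assumed value)
    max_completion_length=1024,   # assumed value
)

trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    reward_funcs=reward_boxed,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```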

When to Use This Model

This model is particularly well-suited for applications requiring strong logical inference and mathematical reasoning, benefiting from its GRPO-based training. It can be a good choice for tasks where understanding complex relationships and generating reasoned responses are critical.
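For such tasks, the model can be loaded through the standard `transformers` API. A minimal inference sketch (the prompt is an example; downloading the 1.5B checkpoint requires network access and adequate memory):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mimoidochi/OpenRS-GRPO-1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

messages = [{"role": "user", "content": "What is 17 * 24? Reason step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Since the model is a reasoning fine-tune, leave generous headroom in `max_new_tokens` so the chain-of-thought is not truncated before the final answer.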