mimoidochi/OpenRS-GRPO-S-2

Text generation · Model size: 1.5B · Quantization: BF16 · Context length: 32K · Published: Mar 16, 2026 · Architecture: Transformer

mimoidochi/OpenRS-GRPO-S-2 is a 1.5-billion-parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B with a 32K context length. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method designed to enhance mathematical reasoning, on the knoveleng/open-rs dataset. As a result, it is optimized for tasks that demand robust reasoning, particularly in mathematical contexts.


Model Overview

mimoidochi/OpenRS-GRPO-S-2 is a 1.5-billion-parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. It was further trained on the knoveleng/open-rs dataset, which is likely geared towards reasoning tasks.

Key Capabilities

  • Enhanced Reasoning: The model was trained using GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper, indicating a focus on improving reasoning abilities.
  • Mathematical Proficiency: Given its training with GRPO, this model is particularly suited for tasks that involve mathematical reasoning and problem-solving.
  • Extended Context: It supports a context length of 32,768 tokens, allowing it to process and generate longer sequences of text.
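
To try the model on a reasoning task, it can be loaded with the Hugging Face `transformers` library. The sketch below is a minimal, hedged example: the prompt template and generation settings are illustrative assumptions, not part of the model card.

```python
# Minimal inference sketch using the `transformers` library.
# The prompt template and generation settings here are assumptions,
# not documented behavior of this model.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "mimoidochi/OpenRS-GRPO-S-2"

def build_prompt(question: str) -> str:
    # A simple step-by-step prompt; adjust to your task.
    return f"Solve the following problem step by step.\n\nProblem: {question}\nAnswer:"

def generate(question: str, max_new_tokens: int = 512) -> str:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

Calling `generate("What is 17 * 24?")` downloads the checkpoint on first use; with a 32K context window, long multi-step problems fit comfortably in a single prompt.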

Training Details

The model's fine-tuning process utilized the TRL library. The GRPO method, a key component of its training, is detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300).
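
The core idea behind GRPO is to replace a learned value model with a group-relative baseline: several completions are sampled per prompt, each is scored, and rewards are standardized within the group. The sketch below illustrates that advantage computation under the formulation in the DeepSeekMath paper; whether the standard deviation is the population or sample estimate varies across implementations, so this is an assumption.

```python
# A minimal sketch of GRPO's group-relative advantage: rewards for a
# group of completions sampled from the same prompt are standardized
# against the group mean and standard deviation (no learned critic).
from statistics import mean, pstdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # guard against uniform-reward groups
    return [(r - mu) / sigma for r in rewards]

# Example: binary correctness rewards for 4 sampled answers to one problem.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# Correct answers get positive advantage, incorrect ones negative.
```

In TRL, this logic is handled internally by its GRPO training support; the snippet only illustrates why sampling multiple completions per prompt is central to the method.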

Ideal Use Cases

This model is a strong candidate for applications requiring:

  • Complex reasoning tasks.
  • Mathematical problem-solving and generation.
  • Processing long documents or conversations where extended context is beneficial.