mimoidochi/OpenRS-GRPO-S

Text generation · Model size: 1.5B · Quantization: BF16 · Context length: 32k · Published: Mar 16, 2026 · Architecture: Transformer

mimoidochi/OpenRS-GRPO-S is a 1.5 billion parameter language model, fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B with a 32768-token context length. The model was trained with GRPO (Group Relative Policy Optimization) on the knoveleng/open-rs dataset, and is optimized for mathematical reasoning tasks, making it suitable for applications requiring robust reasoning capabilities.


Model Overview

mimoidochi/OpenRS-GRPO-S is a 1.5 billion parameter language model, fine-tuned by mimoidochi from the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B base model. It features a substantial context length of 32768 tokens, enabling it to process and generate long sequences of text. Training uses GRPO (Group Relative Policy Optimization), a reinforcement learning technique introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". This specialized training, conducted with the TRL framework on the knoveleng/open-rs dataset, aims to enhance the model's capabilities in complex reasoning.
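
The core idea of GRPO, as described in the DeepSeekMath paper, is to score each sampled completion relative to the other completions in its group, standardizing rewards within the group instead of learning a separate value model. A minimal sketch of that advantage computation (the function name and the binary correctness rewards are illustrative, not from this model card):

```python
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-6):
    """Standardize each reward against its group's mean and std.

    In GRPO, several completions are sampled per prompt; each
    completion's advantage is its reward normalized within that
    group, removing the need for a learned value function.
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four sampled answers to one math problem,
# scored 1.0 if the final answer is correct, else 0.0.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Completions that beat their group's average get positive advantages and are reinforced; the rest are pushed down, which is why the method pairs well with automatically checkable rewards such as math-answer correctness.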

Key Capabilities

  • Mathematical Reasoning: Leverages the GRPO training method, which is designed to improve performance on mathematical and logical reasoning tasks.
  • Extended Context: Supports a 32768 token context window, beneficial for understanding and generating coherent responses over long inputs.
  • Fine-tuned on Curated Data: Trained on the knoveleng/open-rs dataset, suggesting particular strengths in the reasoning-oriented problems that dataset covers.
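
Because the base model is a DeepSeek-R1 distill, generations typically wrap the chain-of-thought in `<think>...</think>` tags before the final answer. A small post-processing sketch for separating the two (the tag convention comes from the base model family; verify it against this fine-tune's actual outputs):

```python
import re

def split_reasoning(text):
    """Split an R1-style output into (reasoning trace, final answer).

    Returns an empty reasoning string if no <think> block is found.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

raw = "<think>2 + 2 equals 4.</think>The answer is 4."
reasoning, answer = split_reasoning(raw)
print(answer)  # -> The answer is 4.
```

With the 32k context window, the reasoning trace can be long, so extracting only the final answer like this is often useful for evaluation harnesses.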

Good For

  • Applications requiring strong mathematical or logical reasoning.
  • Tasks benefiting from a large context window for detailed analysis or generation.
  • Research and development in reinforcement learning and policy optimization methods for language models.