mimoidochi/OpenRS-GRPO

Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Context Length: 32k · Published: Mar 10, 2026 · Architecture: Transformer

mimoidochi/OpenRS-GRPO is a 1.5-billion-parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, with a 32,768-token context length. It was trained using GRPO (Group Relative Policy Optimization), a reinforcement learning method originally introduced for mathematical reasoning tasks. The model is fine-tuned for response generation on the knoveleng/open-rs dataset, making it suitable for conversational AI and question-answering applications.


Model Overview

mimoidochi/OpenRS-GRPO is a 1.5 billion parameter language model built upon the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B architecture. It has been fine-tuned on the knoveleng/open-rs dataset using the TRL library.
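As a standard Transformers checkpoint, the model can be loaded with the Hugging Face `transformers` library. The sketch below is illustrative rather than taken from the model card: the `truncate_to_context` helper is a hypothetical utility reflecting the 32,768-token context limit, and the dtype and generation settings are assumptions.

```python
# Illustrative sketch: loading OpenRS-GRPO for inference with Transformers.
# Assumes ~3 GB of memory for the BF16 weights; settings are not official.

MODEL_ID = "mimoidochi/OpenRS-GRPO"
CONTEXT_LENGTH = 32_768  # the model's 32k context window


def truncate_to_context(token_ids, max_len=CONTEXT_LENGTH):
    """Keep only the most recent tokens so the input fits the context window."""
    return token_ids[-max_len:]


def main():
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="bfloat16", device_map="auto"
    )
    prompt = "Solve step by step: what is 12 * 34?"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
    print(tokenizer.decode(out[0], skip_special_tokens=True))


# To run generation (downloads the weights on first use): main()
```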

Key Training Methodology

A distinguishing feature of this model is its training with GRPO (Group Relative Policy Optimization). This method, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), improves reasoning capabilities without requiring a separately learned value function. The training run used TRL 0.14.0, Transformers 4.49.0, PyTorch 2.5.1, Datasets 4.5.0, and Tokenizers 0.21.4.
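The core idea of GRPO is to sample a group of completions per prompt and normalize each completion's reward against the group's mean and standard deviation, using the result as the advantage signal. A minimal, dependency-free sketch of that normalization step (illustrative only, not the actual training code used for this model):

```python
# Sketch of GRPO's group-relative advantage computation: each reward is
# normalized against the statistics of its own group of sampled completions.

def group_relative_advantages(rewards, eps=1e-8):
    """Return (r - group_mean) / (group_std + eps) for each reward."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]


# Example: 4 completions sampled for one prompt, scored 1.0 (correct) or 0.0.
adv = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
# Above-average completions get positive advantage, below-average negative.
```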

Use Cases

Given its fine-tuning on the open-rs dataset, OpenRS-GRPO is particularly well-suited for:

  • Conversational AI: Generating coherent and contextually relevant responses in dialogue systems.
  • Question Answering: Providing detailed answers to user queries.
  • General Text Generation: Creating human-like text based on prompts, leveraging its reasoning-oriented training.
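For the conversational and question-answering use cases above, prompts are typically built with the tokenizer's chat template (inherited from the DeepSeek-R1-Distill-Qwen-1.5B base). The sketch below is an assumption-laden illustration: `ask` is a hypothetical helper, and the sampling parameters are not official recommendations.

```python
# Illustrative chat-style usage sketch; `ask` is a hypothetical helper.

def make_messages(question):
    """Wrap a user question in the message format used by apply_chat_template."""
    return [{"role": "user", "content": question}]


def ask(question, model, tokenizer, max_new_tokens=512):
    # Format the question with the chat template, generate, and decode the reply.
    prompt_ids = tokenizer.apply_chat_template(
        make_messages(question), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(
        prompt_ids, max_new_tokens=max_new_tokens, do_sample=True, temperature=0.6
    )
    # Return only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(out[0][prompt_ids.shape[-1]:], skip_special_tokens=True)
```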