SomayJalan/OpenRS-GRPO

Hugging Face
Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Nov 10, 2025 · Architecture: Transformer

SomayJalan/OpenRS-GRPO is a 1.5 billion parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B with a 32768-token context length. It was trained using the GRPO method on the knoveleng/open-rs dataset, specializing in mathematical reasoning and complex problem-solving. This model is optimized for tasks requiring advanced logical deduction and numerical understanding.


Model Overview

SomayJalan/OpenRS-GRPO is a 1.5 billion parameter language model derived from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. It was fine-tuned with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath research, on the knoveleng/open-rs dataset. This training approach focuses on enhancing the model's mathematical reasoning and complex problem-solving capabilities.
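The distinguishing idea in GRPO is that it needs no separate value model: for each prompt, a group of completions is sampled, and each completion's advantage is its reward normalized against the group's mean and standard deviation. The sketch below illustrates that group-relative normalization in plain Python; the function name and the binary correctness reward are illustrative assumptions, not the actual training code behind this model.

```python
from statistics import mean, stdev

def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each completion's reward against
    the mean and standard deviation of its own sampling group.
    (Hypothetical helper for illustration, not the model's training code.)"""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    if sigma == 0.0:
        # All completions scored the same, so there is no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Example: 4 sampled answers to one math prompt, reward 1.0 if correct.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Correct completions get positive advantages and incorrect ones negative, so the policy update pushes probability toward the better answers within each group.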

Key Capabilities

  • Mathematical Reasoning: Leverages the GRPO training method to improve performance on tasks requiring logical and mathematical deduction.
  • Fine-tuned Performance: Built upon a robust base model and further optimized for specific reasoning challenges.
  • Context Length: Supports a 32768-token context window, allowing it to process long inputs and complex problem descriptions.
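A minimal way to exercise these capabilities is to load the checkpoint with the Hugging Face transformers library. In the sketch below, the model ID comes from this card, but the prompt template and generation settings are illustrative assumptions rather than values prescribed by the authors.

```python
def build_math_prompt(question: str) -> str:
    """Wrap a math question in a simple instruction prompt
    (hypothetical template, not prescribed by the model card)."""
    return (
        "Solve the following problem step by step.\n\n"
        f"Problem: {question}\nSolution:"
    )

def generate(question: str, max_new_tokens: int = 512) -> str:
    # Imports are deferred so the prompt helper works without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "SomayJalan/OpenRS-GRPO"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # BF16 matches the quantization listed in the card's metadata.
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

    inputs = tokenizer(build_math_prompt(question), return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

Calling, say, `generate("What is the sum of the first 100 positive integers?")` would return the model's step-by-step solution text; a generous `max_new_tokens` budget leaves room for long reasoning chains within the 32k context.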

Good For

  • Applications requiring strong mathematical and logical reasoning.
  • Tasks involving complex problem-solving where detailed understanding and deduction are crucial.
  • Research and development in advanced language model fine-tuning techniques, particularly GRPO.