Overview
mimoidochi/OpenRS-GRPO-1 is a 1.5-billion-parameter language model fine-tuned from the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B base model. It was trained on the knoveleng/open-rs dataset using the TRL framework.
Key Capabilities
- Enhanced Reasoning: This model was trained using the GRPO (Group Relative Policy Optimization) method, as introduced in the DeepSeekMath paper, which focuses on improving mathematical and general reasoning abilities.
- Contextual Understanding: Supports a context length of 32,768 tokens, allowing it to process and generate longer, more coherent texts.
- Fine-tuned Performance: Leverages a strong base model and specialized fine-tuning to deliver focused performance on reasoning-intensive tasks.
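The group-relative scoring at the heart of GRPO can be illustrated with a short sketch. For each prompt, several completions are sampled and rewarded, and each reward is normalized against the group's mean and standard deviation, so no separate value model is needed. This is an illustrative simplification (the function name is ours; the full algorithm adds policy-ratio clipping and a KL penalty):

```python
import statistics

def group_relative_advantages(rewards):
    """Normalize one group's rewards into GRPO-style advantages.

    Each completion's advantage is its reward minus the group mean,
    divided by the group standard deviation. Illustrative sketch only;
    production implementations also clip ratios and add a KL penalty.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]
```

For example, a group with rewards [1.0, 0.0, 1.0, 0.0] yields advantages [1.0, -1.0, 1.0, -1.0]: above-average completions are reinforced, below-average ones are penalized.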
Training Details
The model was trained with the GRPO method, described in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). Training was conducted with the TRL library.
When to Use This Model
This model is particularly well-suited for applications requiring strong logical inference and mathematical reasoning, benefiting from its GRPO-based training. It can be a good choice for tasks where understanding complex relationships and generating reasoned responses are critical.