mimoidochi/OpenRS-GRPO-1
mimoidochi/OpenRS-GRPO-1 is a 1.5-billion-parameter language model fine-tuned from DeepSeek-R1-Distill-Qwen-1.5B. It was trained with the GRPO method on the open-rs dataset to strengthen mathematical reasoning, and its 32768-token context length makes it suitable for applications demanding detailed contextual understanding and logical inference.
Overview
mimoidochi/OpenRS-GRPO-1 is a 1.5-billion-parameter language model fine-tuned from the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B base model on the knoveleng/open-rs dataset using the TRL framework.
Key Capabilities
- Enhanced Reasoning: This model was trained using the GRPO (Group Relative Policy Optimization) method, introduced in the DeepSeekMath paper, which focuses on improving mathematical and general reasoning abilities.
- Contextual Understanding: Supports a substantial context length of 32768 tokens, allowing for processing and generating longer, more coherent texts.
- Fine-tuned Performance: Leverages a strong base model and specialized fine-tuning to deliver focused performance on reasoning-intensive tasks.
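The core idea behind GRPO is to score each sampled completion relative to the other completions drawn for the same prompt, using a group statistic in place of a learned value baseline. A minimal sketch of that advantage computation (illustrative only; the actual trainer applies these advantages inside a clipped policy-gradient objective):

```python
import statistics

def group_relative_advantages(rewards):
    """Normalize each completion's reward against its group:
    advantage_i = (r_i - mean(group)) / std(group).
    GRPO uses this group-relative score instead of a value network."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All completions scored equally; no preference signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

# Rewards for four completions sampled for the same prompt:
# two correct (1.0) and two incorrect (0.0).
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
# → [1.0, -1.0, 1.0, -1.0]
```

Completions that beat their group's average get positive advantages, those below it get negative ones, so the policy is nudged toward the better responses within each sampled group.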
Training Details
The model's training procedure involved the GRPO method, detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). The training was conducted using the TRL library.
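For reference, GRPO fine-tuning of this kind can be set up with TRL's `GRPOTrainer`. The sketch below is an illustration, not the author's actual training script: the length-based reward is a toy stand-in for whatever reward the real run used, and hyperparameters are omitted.

```python
def reward_concise(completions, **kwargs):
    """Toy reward: prefer completions near 200 characters.
    A real reasoning run would score mathematical correctness instead."""
    return [-abs(len(c) - 200) / 200.0 for c in completions]

def train():
    # Heavy imports are kept inside the function so the reward above
    # can be exercised without TRL installed.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    dataset = load_dataset("knoveleng/open-rs", split="train")
    args = GRPOConfig(output_dir="OpenRS-GRPO-1", num_generations=8)
    trainer = GRPOTrainer(
        model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
        reward_funcs=reward_concise,
        args=args,
        train_dataset=dataset,
    )
    trainer.train()
```

`num_generations` controls the group size: how many completions are sampled per prompt and scored against each other, as in the advantage computation GRPO is built on.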
When to Use This Model
This model is particularly well-suited for applications requiring strong logical inference and mathematical reasoning, benefiting from its GRPO-based training. It can be a good choice for tasks where understanding complex relationships and generating reasoned responses are critical.
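A minimal inference sketch using the standard Hugging Face transformers API (the prompt and generation settings below are illustrative assumptions, not the author's recommended configuration):

```python
MODEL_ID = "mimoidochi/OpenRS-GRPO-1"

def generate(prompt: str, max_new_tokens: int = 512) -> str:
    # Imported here so the helper can be defined without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )

# Example (downloads the model weights on first call):
# print(generate("What is the sum of the first 100 positive integers?"))
```

Because the model inherits DeepSeek-R1-Distill-Qwen-1.5B's chat template, `apply_chat_template` formats the conversation as the model expects; leave `max_new_tokens` generous, since reasoning models often produce long intermediate derivations before the final answer.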