mimoidochi/OpenRS-GRPO-S-2
mimoidochi/OpenRS-GRPO-S-2 is a 1.5 billion parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, with a 32K context length. It was trained on the knoveleng/open-rs dataset using the GRPO method, which is designed to enhance mathematical reasoning, making it well suited for tasks that require robust reasoning, particularly in mathematical contexts.
Model Overview
mimoidochi/OpenRS-GRPO-S-2 is a 1.5 billion parameter language model built upon deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. It has been fine-tuned on the knoveleng/open-rs dataset, which is geared towards mathematical reasoning tasks.
Key Capabilities
- Enhanced Reasoning: The model was trained using the GRPO (Group Relative Policy Optimization) method, as introduced in the DeepSeekMath paper, indicating a focus on improving reasoning abilities.
- Mathematical Proficiency: Given its training with GRPO, this model is particularly suited for tasks that involve mathematical reasoning and problem-solving.
- Extended Context: It supports a context length of 32,768 tokens, allowing it to process and generate longer sequences of text.
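The capabilities above can be exercised with a standard `transformers` inference loop. The sketch below is illustrative, not from the model card: the prompt template, sampling settings, and generation length are assumptions, and only the model id comes from this page.

```python
# Hypothetical inference sketch for OpenRS-GRPO-S-2 using the Hugging Face
# `transformers` library. The prompt format and sampling parameters below
# are assumptions for illustration, not documented settings.

def build_prompt(question: str) -> str:
    """Wrap a math question in a simple step-by-step instruction.
    The exact prompt template is an assumption, not from the card."""
    return f"Please reason step by step.\nQuestion: {question}\nAnswer:"

if __name__ == "__main__":
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "mimoidochi/OpenRS-GRPO-S-2"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    inputs = tokenizer(build_prompt("What is 12 * 13?"), return_tensors="pt")
    inputs = inputs.to(model.device)
    out = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.6)
    # Decode only the newly generated tokens, not the prompt.
    new_tokens = out[0][inputs["input_ids"].shape[1]:]
    print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

The long 32K context means much larger prompts (full problem sets, long documents) fit without truncation, at the cost of memory during generation.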
Training Details
The model's fine-tuning process utilized the TRL library. The GRPO method, a key component of its training, is detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300).
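A run of this kind could be reproduced along the lines of the sketch below, assuming TRL's `GRPOTrainer`/`GRPOConfig` API. The toy reward function is a placeholder for illustration; the reward actually used to train this model is not documented on this page.

```python
# Minimal sketch of GRPO fine-tuning with the TRL library. Base model and
# dataset ids come from the card; the reward function and config values
# are illustrative assumptions.

def format_reward(completions, **kwargs):
    """Toy reward: 1.0 if the completion contains a boxed final answer,
    else 0.0. A placeholder, not the reward used for this model."""
    return [1.0 if "\\boxed{" in c else 0.0 for c in completions]

if __name__ == "__main__":
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    dataset = load_dataset("knoveleng/open-rs", split="train")

    # GRPO samples a group of completions per prompt and optimizes the
    # policy against group-relative advantages (arXiv:2402.03300).
    args = GRPOConfig(output_dir="openrs-grpo", num_generations=8)
    trainer = GRPOTrainer(
        model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
        reward_funcs=format_reward,
        args=args,
        train_dataset=dataset,
    )
    trainer.train()
```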
Ideal Use Cases
This model is a strong candidate for applications requiring:
- Complex reasoning tasks.
- Mathematical problem-solving and generation.
- Processing long documents or conversations where extended context is beneficial.