SomayJalan/OpenRS-GRPO
SomayJalan/OpenRS-GRPO is a 1.5 billion parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B with a 32768-token context length. It was trained using the GRPO method on the knoveleng/open-rs dataset, specializing in mathematical reasoning and complex problem-solving. This model is optimized for tasks requiring advanced logical deduction and numerical understanding.
Model Overview
SomayJalan/OpenRS-GRPO is a 1.5 billion parameter language model derived from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. It was fine-tuned with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath work, on the knoveleng/open-rs dataset. This training approach targets the model's capabilities in mathematical reasoning and complex problem-solving.
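To illustrate the core idea behind GRPO, here is a minimal sketch of group-relative advantage estimation: rewards for a group of completions sampled from the same prompt are normalized against the group's own mean and standard deviation, so no separate critic network is needed. This is a conceptual sketch, not the model's actual training code; the function name and the reward values are illustrative.

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each completion's reward against its group's statistics.

    GRPO uses these group-relative scores as advantages, in place of a
    learned value function (a hypothetical helper for illustration).
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four sampled answers to one math problem, reward 1.0 if correct.
# Correct answers get positive advantages, incorrect ones negative.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Because the advantages are centered within each group, completions are rewarded only for being better than their peers on the same prompt, which is what makes the method well suited to verifiable tasks like math.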
Key Capabilities
- Mathematical Reasoning: Leverages the GRPO training method to improve performance on tasks requiring logical and mathematical deduction.
- Fine-tuned Performance: Built upon a robust base model and further optimized for specific reasoning challenges.
- Context Length: Supports a substantial context window of 32768 tokens, allowing for processing longer inputs and complex problem descriptions.
Good For
- Applications requiring strong mathematical and logical reasoning.
- Tasks involving complex problem-solving where detailed understanding and deduction are crucial.
- Research and development in advanced language model fine-tuning techniques, particularly GRPO.
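A minimal usage sketch with Hugging Face `transformers` is shown below. The generation settings and the prompt are illustrative assumptions, not tuned values from the model card; adjust them for your hardware and task.

```python
def build_messages(problem: str) -> list[dict]:
    # Wrap a math problem as a single-turn chat message.
    return [{"role": "user", "content": problem}]

def generate_solution(problem: str, max_new_tokens: int = 512) -> str:
    # Imported here so build_messages stays dependency-free.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "SomayJalan/OpenRS-GRPO"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )

    inputs = tokenizer.apply_chat_template(
        build_messages(problem), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

# Example (downloads the model weights on first run):
# print(generate_solution("What is the sum of the first 100 positive integers?"))
```

Given the 32768-token context window, long multi-step problem statements can be passed in a single prompt without truncation.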