mimoidochi/OpenRS-GRPO-S
TEXT GENERATION | Concurrency Cost: 1 | Model Size: 1.5B | Quant: BF16 | Ctx Length: 32k | Published: Mar 16, 2026 | Architecture: Transformer | Warm

mimoidochi/OpenRS-GRPO-S is a 1.5 billion parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, with a context length of 32,768 tokens. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method, on the knoveleng/open-rs dataset to improve mathematical reasoning. This makes it best suited to math problem solving and similar step-by-step reasoning tasks.
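
For readers unfamiliar with GRPO, the core idea is that several completions are sampled for the same prompt and each one is scored relative to the others in its group, so no separate value model is needed. The sketch below illustrates only that group-relative advantage step. It is not this model's training code: the function name, the binary reward values, the `eps` constant, and the use of the population standard deviation are all assumptions made for illustration, and real implementations may differ in these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    Using the population std and this eps value is an assumption of the sketch.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one math problem:
# 1.0 if the final answer was judged correct, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# Correct answers receive positive advantages, incorrect ones negative.
for r, a in zip(rewards, advantages):
    print(f"reward={r:.1f} advantage={a:+.3f}")
```

If every completion in a group receives the same reward, all advantages are zero and that group contributes no learning signal, which is one reason reward design matters for math datasets.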
