mimoidochi/OpenRS-GRPO-S-2
Text generation · Concurrency cost: 1 · Model size: 1.5B · Quant: BF16 · Ctx length: 32k · Published: Mar 16, 2026 · Architecture: Transformer

mimoidochi/OpenRS-GRPO-S-2 is a 1.5 billion parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, with a 32K context length. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method designed to enhance mathematical reasoning, using the open-rs dataset. The model is optimized for tasks that require robust reasoning, particularly in mathematical contexts.
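A minimal usage sketch with the Hugging Face `transformers` library, assuming the checkpoint is published on the Hub under this model ID. The `build_prompt` helper and its instruction wording are hypothetical; adapt them to the model's actual chat template.

```python
MODEL_ID = "mimoidochi/OpenRS-GRPO-S-2"

def build_prompt(question: str) -> str:
    """Wrap a math question in a simple step-by-step instruction.
    (Hypothetical prompt format -- check the model card for the real template.)"""
    return f"Please reason step by step.\nQuestion: {question}\nAnswer:"

def generate_answer(question: str, max_new_tokens: int = 256) -> str:
    """Download the checkpoint and generate a completion.
    Requires the `transformers` and `torch` packages plus network access."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # BF16 matches the quantization listed in the model metadata above.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="bfloat16")
    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For example, `generate_answer("What is 17 * 23?")` would return the full decoded sequence, including the prompt and the model's reasoning.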
