mimoidochi/OpenRS-GRPO
Text generation · Concurrency cost: 1 · Model size: 1.5B · Quant: BF16 · Context length: 32k · Published: Mar 10, 2026 · Architecture: Transformer · Warm

mimoidochi/OpenRS-GRPO is a 1.5 billion parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, with a 32,768 token context length. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method originally introduced for mathematical reasoning tasks. The model was fine-tuned on the knoveleng/open-rs dataset and is optimized for response generation, making it suitable for conversational AI and question-answering applications.
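A minimal sketch of querying the model with Hugging Face Transformers. The prompt template and generation parameters below are illustrative assumptions, not values published with the model; in practice, the tokenizer's own chat template (`tokenizer.apply_chat_template`) should be preferred if one is defined.

```python
MODEL_ID = "mimoidochi/OpenRS-GRPO"


def build_prompt(question: str) -> str:
    """Wrap a user question in a simple chat-style prompt.

    This plain template is a placeholder assumption; use the
    tokenizer's built-in chat template when available.
    """
    return f"User: {question}\nAssistant:"


def generate(question: str, max_new_tokens: int = 512) -> str:
    """Load the model and generate an answer to a single question."""
    # Lazy import so the prompt helper can be used without
    # transformers/torch installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # BF16 matches the quantization listed for this model.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="bfloat16")

    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )


if __name__ == "__main__":
    print(generate("What is 7 * 8?"))
```

Note that loading the full model requires downloading roughly 3 GB of BF16 weights; the prompt helper works standalone.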
