Overview
knoveleng's Open-RS3 is a 1.5-billion-parameter language model that uses reinforcement learning (RL) to improve the reasoning capabilities of the DeepSeek-R1-Distill-Qwen-1.5B base model. Developed by Quy-Anh Dang and Chris Ngo, it targets mathematical and general reasoning in small LLMs.
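For context, RL fine-tuning of small reasoning models of this kind is typically done with a GRPO-style trainer and rule-based rewards on math prompts. The sketch below shows that general pattern using Hugging Face TRL's GRPOTrainer; the dataset, reward function, and hyperparameters are illustrative placeholders, not the authors' exact configuration.

```python
# Illustrative GRPO-style RL fine-tuning sketch (not the authors' exact recipe).
# Assumes a recent TRL release with GRPOTrainer and a dataset with a "prompt" column.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Placeholder prompt dataset; swap in the actual math training set.
dataset = load_dataset("trl-lib/tldr", split="train")

def accuracy_reward(completions, **kwargs):
    """Toy rule-based reward: +1 if the completion contains a boxed answer."""
    return [1.0 if "\\boxed" in c else 0.0 for c in completions]

training_args = GRPOConfig(
    output_dir="open-rs-grpo-sketch",
    num_generations=4,              # completions sampled per prompt
    per_device_train_batch_size=4,  # must be divisible by num_generations
    max_completion_length=512,      # cap generated reasoning length
)

trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",  # the base model named above
    reward_funcs=accuracy_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```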
Key Capabilities & Performance
- Enhanced Reasoning: Significantly improves analytical and mathematical reasoning, as evidenced by benchmark scores.
- Benchmark Performance: Achieves 46.7% on AIME24 and 80.0% on AMC23, outperforming o1-preview (44.6%) on AIME24.
- Cost-Efficient Training: Trained on 4 A40 GPUs in under 24 hours, costing approximately $42 for 7,000 samples (42,000 total outputs), demonstrating a highly economical approach compared to larger baseline models.
- Small Footprint: At 1.5B parameters, it offers strong reasoning performance in a resource-constrained package.
Use Cases
Open-RS3 is particularly well-suited for applications requiring robust reasoning and mathematical problem-solving where computational resources are limited. Its efficient training and strong performance in analytical tasks make it a viable option for integrating advanced reasoning into smaller-scale deployments.
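Because Open-RS3 is a standard causal language model, it can be loaded with the usual Transformers API. The sketch below assumes the Hugging Face repo id knoveleng/Open-RS3 and the chat template inherited from the R1-distilled base; the prompt and sampling settings are illustrative.

```python
# Minimal inference sketch with Hugging Face Transformers.
# The repo id "knoveleng/Open-RS3" is assumed from the model name above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "knoveleng/Open-RS3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "What is the sum of the first 100 positive integers?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generous token budget: R1-distilled models emit a long chain of thought before the answer.
outputs = model.generate(inputs, max_new_tokens=2048, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```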
For more details, refer to the associated research paper: Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't.