Open-RS3: Enhanced Reasoning for Small LLMs
Open-RS3 is a 1.5 billion parameter language model from knoveleng, specifically designed to boost the reasoning capabilities of the DeepSeek-R1-Distill-Qwen-1.5B architecture. It leverages reinforcement learning (RL) to achieve substantial improvements in complex problem-solving, particularly in mathematical domains.
Key Capabilities & Performance
- Reinforcement Learning for Reasoning: Utilizes an efficient RL approach to enhance reasoning in smaller models.
- Mathematical Reasoning: Achieves 80.0% on AMC23 and 46.7% on AIME24, surpassing
o1-preview's 44.6% on AIME24. - Cost-Effective Training: Trained on 4 A40 GPUs in under 24 hours, costing approximately $42 for 7,000 samples (42,000 total outputs), demonstrating high efficiency compared to larger baseline models.
Use Cases
- Resource-Constrained Environments: Ideal for applications where computational resources are limited but strong reasoning is required.
- Mathematical Problem Solving: Excels in tasks requiring logical and mathematical deduction.
- Research in RL for LLMs: Provides a strong baseline and methodology for further research into improving small LLM capabilities through RL.
This model showcases the potential of targeted RL training to unlock advanced reasoning in compact language models, offering a powerful and economical solution for specific analytical tasks. For more details, refer to the GitHub repository and the associated research paper.