knoveleng/Open-RS1
Open-RS1 is a 1.5 billion parameter language model developed by knoveleng, based on DeepSeek-R1-Distill-Qwen-1.5B, with a 32768 token context length. It is specifically fine-tuned using reinforcement learning (GRPO algorithm) on a compact mathematical reasoning dataset to significantly enhance reasoning capabilities in small LLMs. This model demonstrates strong performance on mathematical benchmarks like AMC23 and AIME24, offering a cost-effective solution for reasoning tasks in resource-constrained environments.
Loading preview...
Open-RS1: Enhanced Reasoning for Small LLMs
Open-RS1 is a 1.5 billion parameter model developed by knoveleng, focusing on improving reasoning capabilities in small language models through reinforcement learning (RL). It is based on DeepSeek-R1-Distill-Qwen-1.5B and utilizes an adapted Group Relative Policy Optimization (GRPO) algorithm.
Key Capabilities & Performance
- Significant Reasoning Improvements: Achieves 80% accuracy on AMC23 and 46.7% on AIME24, outperforming larger models like
o1-preview. - Cost-Efficient Training: Trained with only 7,000 samples at an estimated cost of $42 on 4x NVIDIA A40 GPUs within 24 hours, demonstrating substantial cost savings compared to other 1.5B and 7B models.
- Resource-Constrained Optimization: Designed to make advanced reasoning accessible in environments with limited computational resources.
Ideal Use Cases
- Mathematical Reasoning: Excels in tasks requiring logical and mathematical problem-solving.
- Cost-Sensitive Applications: Suitable for projects where training and inference costs are a primary concern.
- Research in RL for LLMs: Provides an open-source foundation for further exploration into reinforcement learning techniques for language models.