knoveleng/Open-RS2

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:Mar 18, 2025License:mitArchitecture:Transformer0.0K Open Weights Warm

The knoveleng/Open-RS2 is a 1.5 billion parameter language model, based on DeepSeek-R1-Distill-Qwen-1.5B, developed by Quy-Anh Dang and Chris Ngo. It is specifically fine-tuned using Reinforcement Learning (RL) to significantly enhance mathematical reasoning capabilities in small LLMs, achieving 80% on AMC23 and 46.7% on AIME24. This model demonstrates a cost-effective approach for improving reasoning in resource-constrained environments, offering competitive performance against larger models. It is optimized for complex reasoning tasks, particularly in mathematics.

Loading preview...

Open-RS2: Enhanced Reasoning in Small LLMs

The knoveleng/Open-RS2 is a 1.5 billion parameter model, part of the Open RS project by Quy-Anh Dang and Chris Ngo, focusing on improving reasoning in small LLMs using Reinforcement Learning (RL). Based on DeepSeek-R1-Distill-Qwen-1.5B, it was trained efficiently on 4 NVIDIA A40 GPUs within 24 hours.

Key Capabilities & Performance

  • Significant Reasoning Improvements: Achieves 80.0% on AMC23 and 46.7% on AIME24, outperforming o1-preview (44.6%) on AIME24.
  • Cost-Efficient Training: RL-based fine-tuning required only 7,000 samples, costing approximately $42, a fraction of the cost for comparable 1.5B and 7B baseline models.
  • Competitive Benchmarks: Scores 55.7% average, with strong performance on mathematical reasoning tasks.

What Makes This Different?

Open-RS2 stands out by demonstrating that advanced reasoning capabilities can be effectively and affordably integrated into small LLMs through targeted RL fine-tuning. It provides a practical solution for deploying powerful reasoning models in resource-limited settings, challenging the notion that only very large models can excel at complex mathematical problems. The project open-sources its code, models, and datasets to foster further research in this area. For more details, refer to the GitHub repository.