knoveleng/Open-RS3

Hugging Face
Text generation | Model size: 1.5B | Quant: BF16 | Context length: 32k | Published: Mar 18, 2025 | License: MIT | Architecture: Transformer | Open weights | Concurrency cost: 1

Open-RS3 is a 1.5-billion-parameter language model from knoveleng. It is built on DeepSeek-R1-Distill-Qwen-1.5B and further trained with reinforcement learning to strengthen its reasoning. On mathematical reasoning benchmarks it reaches 46.7% on AIME24, outperforming larger models such as o1-preview (44.6%). It targets cost-effective reasoning in resource-constrained environments and is suited to complex problem-solving tasks.


Open-RS3: Enhanced Reasoning for Small LLMs

Open-RS3 is a 1.5-billion-parameter language model from knoveleng that improves the reasoning of DeepSeek-R1-Distill-Qwen-1.5B, the model it is trained from. It uses reinforcement learning (RL) to achieve substantial gains in complex problem-solving, particularly in mathematics.
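The card describes the RL method only at a high level. The associated paper reports adapting Group Relative Policy Optimization (GRPO); assuming that, the core idea is that each prompt gets a group of sampled answers whose rewards are normalized against one another, so no separate value model is needed. The sketch below shows only that advantage step. The helper name `group_relative_advantages`, the binary rewards, and the group size of 6 (chosen because 42,000 outputs / 7,000 samples = 6) are illustrative assumptions, not the authors' code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Score each sampled answer relative to the other answers for the same prompt.

    GRPO-style advantage: A_i = (r_i - mean(r)) / (std(r) + eps).
    This sketch uses the population standard deviation; implementations differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for one prompt and a group of 6 sampled answers
# (1.0 = correct final answer, 0.0 = wrong); illustrative, not from the paper.
rewards = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])  # [1.0, -1.0, -1.0, 1.0, 1.0, -1.0]

# If every answer in the group gets the same reward, all advantages are zero,
# so that prompt contributes no learning signal in this step.
print(group_relative_advantages([1.0] * 6))
```

Correct answers are pushed up and wrong ones pushed down relative to their own group, which is what lets this family of methods skip a learned critic.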

Key Capabilities & Performance

  • Reinforcement Learning for Reasoning: Uses a compute-efficient RL training recipe to improve reasoning in a small model.
  • Mathematical Reasoning: Achieves 80.0% on AMC23 and 46.7% on AIME24, surpassing o1-preview's 44.6% on AIME24.
  • Cost-Effective Training: Trained on 4 A40 GPUs in under 24 hours on 7,000 samples (42,000 generated outputs in total) for approximately $42, far less than the training cost reported for larger baseline models.
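The figures in the list above can be unpacked with a little arithmetic. This is a sketch under stated assumptions: AIME24 is taken as 30 problems (AIME 2024 I and II, 15 each) and AMC23 as the 40-problem set commonly used in these evaluations (an assumption). If scores are averaged over several samples per problem, the "problems solved" counts below are only an interpretation of the percentages, not reported results.

```python
# Benchmark sizes (AMC23 size is an assumption, see lead-in).
AIME24_PROBLEMS = 30
AMC23_PROBLEMS = 40
O1_PREVIEW_AIME24 = 44.6


def pct(correct: int, total: int) -> float:
    """Accuracy as a percentage rounded to one decimal, as on the card."""
    return round(100 * correct / total, 1)


aime24 = pct(14, AIME24_PROBLEMS)               # 46.7
amc23 = pct(32, AMC23_PROBLEMS)                 # 80.0
margin = round(aime24 - O1_PREVIEW_AIME24, 1)   # 2.1 points over o1-preview

# Training-cost figures quoted on the card.
prompts, outputs, cost_usd = 7_000, 42_000, 42.0
gpus, max_hours = 4, 24

outputs_per_prompt = outputs // prompts            # 6 sampled outputs per prompt
cost_per_prompt = cost_usd / prompts               # $0.006
cost_per_1k_outputs = 1_000 * cost_usd / outputs   # $1.00
max_gpu_hours = gpus * max_hours                   # 96, an upper bound ("under 24 hours")

print(aime24, amc23, margin)
print(outputs_per_prompt, cost_per_prompt, cost_per_1k_outputs, max_gpu_hours)
```

Read this way, the $42 budget works out to roughly a tenth of a cent per generated output and at most 96 A40 GPU-hours.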

Use Cases

  • Resource-Constrained Environments: Ideal for applications where computational resources are limited but strong reasoning is required.
  • Mathematical Problem Solving: Excels in tasks requiring logical and mathematical deduction.
  • Research in RL for LLMs: Provides a strong baseline and methodology for further research into improving small LLM capabilities through RL.
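For the resource-constrained use case, a rough first check is weight memory, which scales as parameter count times bytes per parameter. The back-of-the-envelope sketch below assumes the nominal 1.5B parameter count and the BF16 format from the card's metadata; the exact checkpoint count may differ, the int8 and int4 rows are hypothetical quantized variants that this card does not describe, and `weight_memory_gb` is an illustrative helper. It counts weights only, so real usage will be higher, especially with long contexts (up to 32k tokens) because of the KV cache.

```python
# Bytes per parameter for a few common storage formats.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Approximate memory for the model weights alone, in GB (10**9 bytes).

    Ignores the KV cache, activations, and framework overhead.
    """
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


N_PARAMS = 1.5e9  # nominal size from the card; the exact checkpoint count may differ

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(N_PARAMS, dtype):.2f} GB")
```

At BF16 the weights come to roughly 3 GB; leave extra headroom for the KV cache when generating long reasoning traces.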

This model shows how targeted RL training can unlock advanced reasoning in compact language models, offering an economical option for math-heavy analytical tasks. For more details, refer to the GitHub repository and the associated research paper.