knoveleng/Open-RS3

  • Parameters: 1.5B
  • Precision: BF16
  • Context length: 131,072 tokens
  • License: MIT
Overview

knoveleng's Open-RS3 is a 1.5-billion-parameter language model that uses reinforcement learning (RL) to strengthen the reasoning capabilities of its base model, DeepSeek-R1-Distill-Qwen-1.5B. Developed by Quy-Anh Dang and Chris Ngo, it focuses on improving mathematical and general reasoning in small LLMs.

Key Capabilities & Performance

  • Enhanced Reasoning: Significantly improves analytical and mathematical reasoning, as evidenced by benchmark scores.
  • Benchmark Performance: Achieves 46.7% on AIME24 and 80.0% on AMC23, outperforming o1-preview (44.6%) on AIME24.
  • Cost-Efficient Training: Trained on 4 A40 GPUs in under 24 hours, costing approximately $42 for 7,000 samples (42,000 total outputs), demonstrating a highly economical approach compared to larger baseline models.
  • Small Footprint: At 1.5B parameters, it offers strong reasoning performance in a resource-constrained package.
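As a rough sanity check on the cost figures above (pure arithmetic; the per-GPU-hour rate is derived from the quoted totals rather than stated in the card):

```python
# Derived from the card's figures: $42, 4 A40 GPUs, <24 h,
# 7,000 samples with 42,000 total outputs.
samples = 7_000
total_outputs = 42_000
completions_per_sample = total_outputs // samples  # 6 RL rollouts per prompt

gpus = 4
hours = 24                      # "under 24 hours", so an upper bound
gpu_hours = gpus * hours        # at most 96 GPU-hours

total_cost_usd = 42
rate = total_cost_usd / gpu_hours  # implied cost per A40 GPU-hour

print(completions_per_sample, gpu_hours, round(rate, 2))
```

This works out to roughly six sampled outputs per training prompt and an implied rate of about $0.44 per A40 GPU-hour.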

Use Cases

Open-RS3 is particularly well-suited for applications requiring robust reasoning and mathematical problem-solving where computational resources are limited. Its efficient training and strong performance in analytical tasks make it a viable option for integrating advanced reasoning into smaller-scale deployments.
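A minimal inference sketch, assuming the Hugging Face `transformers` library with PyTorch installed and that the model ships a chat template (as DeepSeek-R1 distills do); the prompt, helper name, and generation settings are illustrative:

```python
def solve(problem: str, max_new_tokens: int = 512) -> str:
    """Generate a reasoning trace and answer with Open-RS3.

    Imports are deferred so merely defining this helper is cheap; the
    first call downloads the model weights (~3 GB) from the Hub.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "knoveleng/Open-RS3"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

    # Format the problem with the model's chat template.
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": problem}],
        add_generation_prompt=True,
        return_tensors="pt",
    )
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens; keep only the generated continuation.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

# Example call (triggers the model download; a GPU is recommended):
# print(solve("What is the sum of the first 100 positive integers?"))
```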

For more details, refer to the associated research paper: Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't.