knoveleng/Open-RS1

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:Mar 18, 2025License:mitArchitecture:Transformer0.0K Open Weights Warm

Open-RS1 is a 1.5 billion parameter language model developed by knoveleng, based on DeepSeek-R1-Distill-Qwen-1.5B, with a 32768 token context length. It is specifically fine-tuned using reinforcement learning (GRPO algorithm) on a compact mathematical reasoning dataset to significantly enhance reasoning capabilities in small LLMs. This model demonstrates strong performance on mathematical benchmarks like AMC23 and AIME24, offering a cost-effective solution for reasoning tasks in resource-constrained environments.

Loading preview...

Open-RS1: Enhanced Reasoning for Small LLMs

Open-RS1 is a 1.5 billion parameter model developed by knoveleng, focusing on improving reasoning capabilities in small language models through reinforcement learning (RL). It is based on DeepSeek-R1-Distill-Qwen-1.5B and utilizes an adapted Group Relative Policy Optimization (GRPO) algorithm.

Key Capabilities & Performance

  • Significant Reasoning Improvements: Achieves 80% accuracy on AMC23 and 46.7% on AIME24, outperforming larger models like o1-preview.
  • Cost-Efficient Training: Trained with only 7,000 samples at an estimated cost of $42 on 4x NVIDIA A40 GPUs within 24 hours, demonstrating substantial cost savings compared to other 1.5B and 7B models.
  • Resource-Constrained Optimization: Designed to make advanced reasoning accessible in environments with limited computational resources.

Ideal Use Cases

  • Mathematical Reasoning: Excels in tasks requiring logical and mathematical problem-solving.
  • Cost-Sensitive Applications: Suitable for projects where training and inference costs are a primary concern.
  • Research in RL for LLMs: Provides an open-source foundation for further exploration into reinforcement learning techniques for language models.