agentica-org/DeepScaleR-1.5B-Preview

  • Parameters: 1.5B
  • Precision: BF16
  • Context length: 131,072 tokens
  • License: MIT
  • Availability: Public on Hugging Face

DeepScaleR-1.5B-Preview Overview

DeepScaleR-1.5B-Preview is a 1.5-billion-parameter language model from agentica-org, fine-tuned from DeepSeek-R1-Distill-Qwen-1.5B. Its core innovation is the use of distributed reinforcement learning (RL) scaled to long context lengths, targeting mathematical reasoning tasks. The model achieves a notable 43.1% Pass@1 accuracy on AIME 2024, a roughly 15-percentage-point improvement over its base model's 28.8%, and outperforms OpenAI's O1-Preview.

Key Capabilities & Training

  • Mathematical Reasoning: Excels in solving complex mathematical problems, as evidenced by its strong performance on AIME, MATH, and AMC benchmarks.
  • Reinforcement Learning: Utilizes DeepSeek's Group Relative Policy Optimization (GRPO), a PPO variant that replaces the learned value function with advantages normalized within groups of sampled responses, paired with a simple binary reward (1 for a correct final answer, 0 otherwise).
  • Iterative Context Lengthening: Employs a cost-effective training strategy that progressively increases context length from 8K to 24K tokens as the model improves, optimizing compute resources.
  • Data: Trained on approximately 40,000 unique problem-answer pairs from AIME, AMC, Omni-MATH, and Still datasets.
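The binary reward and GRPO's group-relative advantage described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual training code: the function names and the 4-sample group are hypothetical, and real GRPO applies these advantages inside a PPO-style clipped policy-gradient update.

```python
import statistics

def binary_reward(predicted: str, reference: str) -> float:
    """DeepScaleR-style reward: 1.0 if the final answer matches, else 0.0."""
    return 1.0 if predicted.strip() == reference.strip() else 0.0

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO advantage: normalize each reward against its own sampled group,
    removing the need for a separate learned value function (critic)."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:  # all samples equally good or bad -> no learning signal
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

# Example: 4 sampled answers for one problem whose reference answer is "42"
rewards = [binary_reward(a, "42") for a in ["42", "41", "42", "7"]]
advs = group_relative_advantages(rewards)
# Correct samples get positive advantage, incorrect ones negative
```

The all-zero branch matters in practice: when every sample in a group is correct (or every one is wrong), the group carries no relative signal, so those samples contribute nothing to the update.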

Performance Highlights

DeepScaleR-1.5B-Preview demonstrates leading performance among 1.5B- and 7B-parameter models on several mathematical benchmarks:

  • AIME 2024: 43.1% Pass@1 (compared to 40.0% for O1-Preview and 28.8% for its base model).
  • MATH 500: 87.8% Pass@1.
  • AMC 2023: 73.6% Pass@1.
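For readers unfamiliar with the metric, the Pass@1 scores above can be computed with the standard unbiased pass@k estimator (from the HumanEval/Codex evaluation literature); whether DeepScaleR's evaluation uses exactly this sampling scheme is an assumption, and the per-problem sample counts below are illustrative.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples drawn, c of them correct."""
    if n - c < k:
        return 1.0  # too few incorrect samples to fill a k-subset
    return 1.0 - comb(n - c, k) / comb(n, k)

# Benchmark score = mean per-problem pass@1 over (samples, correct) pairs
results = [(16, 8), (16, 16), (16, 0)]  # hypothetical per-problem tallies
score = sum(pass_at_k(n, c, 1) for n, c in results) / len(results)
```

For k = 1 the estimator reduces to c / n per problem, i.e. the fraction of sampled answers that are correct, averaged over the benchmark.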

Good For

  • Applications requiring strong mathematical problem-solving capabilities.
  • Research into efficient reinforcement learning for language models.
  • Use cases where a smaller, highly specialized model for reasoning is preferred over larger, general-purpose LLMs.