RUC-AIBOX/STILL-3-1.5B-preview

Hugging Face · Text generation · Model size: 1.5B · Quant: BF16 · Ctx length: 32K · Published: Jan 25, 2025 · Architecture: Transformer

RUC-AIBOX/STILL-3-1.5B-preview is a 1.5-billion-parameter slow-thinking reasoning model from RUC-AIBOX, designed to strengthen mathematical and complex problem-solving abilities. It achieves 39.33% accuracy on the AIME benchmark, a 37.18% relative improvement over its backbone model. The model is optimized for tasks requiring deliberate reasoning, such as advanced mathematics, and uses reinforcement learning for continued performance gains. It supports a context length of 131,072 tokens, making it suitable for detailed analytical problems.

RUC-AIBOX/STILL-3-1.5B-preview: Enhanced Slow-Thinking Reasoning

RUC-AIBOX/STILL-3-1.5B-preview is a 1.5-billion-parameter model engineered to improve slow-thinking reasoning, particularly in mathematical and complex problem-solving domains. Developed by RUC-AIBOX, it leverages reinforcement learning to achieve continuous performance improvements.
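
The model is distributed as a standard Hugging Face checkpoint, so a plain `transformers` causal-LM setup should suffice. The sketch below is a minimal, unofficial example: it assumes the checkpoint ships a chat template and that BF16 weights fit on your device, so adjust as needed.

```python
# Minimal inference sketch (assumed transformers setup, not an official recipe).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RUC-AIBOX/STILL-3-1.5B-preview"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the listing reports BF16 weights
    device_map="auto",
)

# Slow-thinking models spend tokens on deliberate reasoning, so allow a
# generous generation budget for competition-style math problems.
messages = [{"role": "user", "content": "If 3x + 7 = 25, what is x^2?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=4096, do_sample=True, temperature=0.6)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```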

Key Capabilities & Performance

  • Mathematical Reasoning: Achieves a notable 39.33% accuracy on the AIME benchmark, a 37.18% relative improvement over its backbone model's 28.67%.
  • Benchmark Scores: Demonstrates strong gains across multiple mathematical and reasoning benchmarks (backbone scores in parentheses; the relative-gain arithmetic is worked through after this list):
    • MATH: 85.48% (backbone: 84.04%)
    • AIME: 39.33% (backbone: 28.67%)
    • OMNI: 33.00% (backbone: 25.60%)
    • LiveAOPS: 39.50% (backbone: 33.33%)
  • Long Context: Supports a context length of 131,072 tokens, enabling it to process and reason over extensive problem statements.
  • Training Methodology: Performance gains are attributed to an adaptive reinforcement learning approach, with observed improvements correlating with increased training steps.
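
The headline 37.18% figure is a relative gain, not an absolute accuracy delta. The short script below (illustrative only) reproduces the relative improvements from the backbone and STILL-3 scores listed above.

```python
# Relative improvement = (new - old) / old, from the scores reported above.
scores = {  # benchmark: (backbone %, STILL-3-1.5B-preview %)
    "MATH":     (84.04, 85.48),
    "AIME":     (28.67, 39.33),
    "OMNI":     (25.60, 33.00),
    "LiveAOPS": (33.33, 39.50),
}
for name, (old, new) in scores.items():
    print(f"{name:9s} {old:5.2f} -> {new:5.2f}  (+{(new - old) / old:.2%} relative)")
# AIME: (39.33 - 28.67) / 28.67 ≈ 37.18%
```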

Ideal Use Cases

  • Advanced Mathematics: Excels in tasks requiring precise mathematical reasoning, such as those found in the AIME and MATH benchmarks.
  • Complex Problem Solving: Suitable for scenarios where deliberate, step-by-step reasoning matters more than rapid, intuitive responses; a sketch for separating the reasoning trace from the final answer follows this list.
  • Research & Development: The model, code, and data are open-sourced to facilitate further research into enhancing slow-thinking abilities in smaller language models. More details can be found in the Slow Thinking with LLMs GitHub repository.
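
When using the model for step-by-step reasoning, it is often useful to separate the reasoning trace from the final answer. The helper below is a hypothetical sketch that assumes R1-style output conventions (a `</think>` delimiter and a final answer in LaTeX `\boxed{...}`); neither convention is documented on this card, so verify against real outputs first.

```python
import re

def split_reasoning(completion: str) -> tuple[str, str | None]:
    """Split a completion into (reasoning trace, final answer).

    ASSUMPTION: R1-style output, with reasoning before a </think> tag and
    the final answer wrapped in \\boxed{...}; adjust for actual outputs.
    """
    reasoning, _, tail = completion.partition("</think>")
    boxed = re.findall(r"\\boxed\{([^{}]*)\}", tail or completion)
    return reasoning.strip(), (boxed[-1] if boxed else None)

sample = "Try x = 6: 3*6 + 7 = 25, so x = 6. </think> Thus x^2 = \\boxed{36}."
trace, answer = split_reasoning(sample)
print(answer)  # -> 36
```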