simplescaling/s1-32B

  • Status: Warm
  • Visibility: Public
  • Parameters: 32B
  • Quantization: FP8
  • Context length: 32,768 tokens
  • Date: Jan 14, 2025
  • License: apache-2.0
  • Source: Hugging Face

Model Overview

simplescaling/s1-32B is a 32-billion-parameter language model developed by simplescaling and fine-tuned specifically for reasoning tasks. It is based on Qwen2.5-32B-Instruct and stands out for its sample-efficient training: it was fine-tuned on only 1,000 curated examples (the s1K dataset) yet achieves competitive performance.
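
As a minimal sketch of loading and querying the model with Hugging Face transformers (only the repository id comes from this page; the precision, device placement, and generation settings below are illustrative assumptions, not documented settings):

```python
# Minimal inference sketch for simplescaling/s1-32B with transformers.
# Only the repo id comes from this page; precision, device placement,
# and generation settings are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "simplescaling/s1-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed; a 32B model needs ~64 GB in bf16
    device_map="auto",
)

messages = [{"role": "user", "content": "How many positive divisors does 2024 have?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```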

Key Capabilities and Features

  • Reasoning Focus: The model is specifically designed and optimized for complex reasoning, as evidenced by its performance on mathematical and general problem-solving benchmarks.
  • Efficient Training: Achieves strong results with a remarkably small training dataset of just 1,000 examples.
  • Test-Time Scaling: Uses "budget forcing" at inference time, a decoding technique that extends reasoning by suppressing the end-of-thinking delimiter and appending "Wait", prompting the model to re-examine its work (sketched in code below).
  • Competitive Performance: With budget forcing applied, s1-32B matches or exceeds models like o1-preview on benchmarks such as AIME2024 and MATH500.

Evaluation Highlights

Evaluations (accuracy, %) show s1-32B's strong reasoning performance:

  • AIME2024: 56.7
  • MATH500: 93.0
  • GPQA-Diamond: 59.6

It's important to note that these benchmark results for s1-32B were obtained with budget forcing: when the model tries to stop reasoning, the end-of-thinking delimiter is suppressed and "Wait" is appended, up to four times, to extend the reasoning trace. For potentially better performance, consider the successor model, s1.1-32B.
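
A minimal sketch of that decoding loop follows. It is not the authors' evaluation code: the delimiter strings (`<|im_start|>think` to open the reasoning section, `<|im_start|>answer` to end it) are assumptions modeled on the s1 repository's Qwen-style chat template, and `model`/`tokenizer` are the objects from the loading sketch above.

```python
# Hedged sketch of budget forcing. Delimiter strings and the thinking
# budget are assumptions modeled on the s1 repo's Qwen-style template;
# `model` and `tokenizer` are the objects loaded in the sketch above.
THINK_OPEN = "<|im_start|>think"    # assumed start-of-thinking marker
THINK_CLOSE = "<|im_start|>answer"  # assumed end-of-thinking marker

def budget_forced_generate(question: str, num_waits: int = 4,
                           think_budget: int = 2048) -> str:
    # Open the thinking section after the standard chat prompt.
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": question}],
        tokenize=False, add_generation_prompt=True,
    ) + THINK_OPEN
    for _ in range(num_waits):
        ids = tokenizer(prompt, return_tensors="pt").to(model.device)
        out = model.generate(**ids, max_new_tokens=think_budget)
        new_text = tokenizer.decode(out[0][ids["input_ids"].shape[1]:],
                                    skip_special_tokens=False)
        if THINK_CLOSE not in new_text:
            prompt += new_text  # still thinking: keep the trace and stop early
            break
        # Budget forcing: drop the end-of-thinking marker and append
        # "Wait" so the model re-examines its reasoning so far.
        prompt += new_text.split(THINK_CLOSE)[0] + "Wait"
    # Force end-of-thinking and generate the final answer.
    ids = tokenizer(prompt + THINK_CLOSE, return_tensors="pt").to(model.device)
    out = model.generate(**ids, max_new_tokens=1024)
    return tokenizer.decode(out[0][ids["input_ids"].shape[1]:],
                            skip_special_tokens=True)
```

Each suppressed stop re-runs generation over the growing prompt, so this naive transformers loop is slow; the s1 repository implements the same idea on top of vLLM.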

Use Cases

This model is particularly well-suited for applications requiring advanced reasoning and problem-solving, especially in mathematics and complex question answering, where budget forcing can be used to trade additional inference-time compute for accuracy.