simplescaling/s1-32B
Available on Hugging Face

Text generation · 32B parameters · FP8 quantization · 32k context length · Published: Jan 14, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights

The simplescaling/s1-32B is a 32 billion parameter reasoning model, fine-tuned from Qwen2.5-32B-Instruct by simplescaling. It is notable for achieving strong reasoning performance, matching o1-preview, despite being trained on only 1,000 examples. This model demonstrates test-time scaling through a technique called budget forcing, making it suitable for complex problem-solving tasks.


Model Overview

simplescaling/s1-32B is a 32 billion parameter language model developed by simplescaling, specifically fine-tuned for reasoning tasks. It is based on the Qwen2.5-32B-Instruct architecture and stands out for its efficient training, utilizing only 1,000 examples to achieve competitive performance.

Key Capabilities and Features

  • Reasoning Focus: The model is specifically designed and optimized for complex reasoning, as evidenced by its performance on mathematical and general problem-solving benchmarks.
  • Efficient Training: Achieves strong results with a remarkably small training dataset of just 1,000 examples.
  • Test-Time Scaling: Incorporates "budget forcing" at inference time, a technique that improves reasoning performance by forcing the model to extend its chain of thought before answering.
  • Competitive Performance: Benchmarks indicate that s1-32B matches or exceeds the performance of models like o1-preview on metrics such as AIME2024 and MATH500, particularly when budget forcing is applied.

Evaluation Highlights

Evaluations show s1-32B's strong performance in reasoning:

  • AIME2024: 56.7
  • MATH500: 93.0
  • GPQA-Diamond: 59.6

Note that these benchmark results for s1-32B use budget forcing, which suppresses the model's end-of-thinking delimiter and appends "Wait" to the reasoning trace up to four times, extending the chain of thought before the final answer. For potentially better performance, users are recommended to consider its successor, s1.1-32B.
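The budget-forcing loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the released implementation: `generate` is a hypothetical stand-in for a real model call, and `end_think` stands in for the model's actual end-of-thinking delimiter.

```python
# Minimal sketch of budget forcing. `generate` is a hypothetical stand-in
# for a model call that returns a text completion; `end_think` stands in
# for the model's end-of-thinking delimiter (the real token differs).

def budget_force(generate, prompt, end_think, max_waits=4):
    """Suppress the end-of-thinking delimiter up to `max_waits` times,
    appending "Wait" so the model keeps reasoning before it answers."""
    text = prompt
    for _ in range(max_waits):
        out = generate(text)
        if end_think not in out:
            # Model kept reasoning on its own; no forcing needed.
            return text + out
        # Drop the delimiter, keep the reasoning, and force a continuation.
        text += out.split(end_think, 1)[0] + "Wait"
    # Thinking budget exhausted: let the model finish its answer.
    return text + generate(text)

# Toy model that always tries to stop thinking immediately.
def fake_generate(text):
    return " ...a reasoning step...</think>"

out = budget_force(fake_generate, "Q: 2 + 2?", end_think="</think>", max_waits=2)
# "Wait" is appended exactly twice before the final completion is allowed.
```

In practice this is done inside the decoding loop (e.g., via stop-token suppression) rather than by string post-processing, but the control flow is the same: each suppressed stop buys the model another round of reasoning.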

Use Cases

This model is particularly well-suited for applications requiring advanced reasoning and problem-solving, especially in domains like mathematics and complex question answering, where its budget forcing mechanism can be leveraged for improved accuracy.