SmallThinker-3B-Preview Overview

SmallThinker-3B-Preview is a 3.1-billion-parameter language model developed by PowerInfer and fine-tuned from Qwen2.5-3B-Instruct. It targets mathematical and reasoning tasks, where it scores above its base model on the benchmarks listed below. The model supports a context length of 32,768 tokens.

Key Capabilities & Performance

SmallThinker-3B-Preview scores higher than its base model, and in some cases higher than GPT-4o, on several mathematical and STEM benchmarks:

AIME24: Achieves 16.667, significantly higher than Qwen2.5-3B-Instruct (6.67) and GPT-4o (9.3).
GAOKAO2024: Scores 64.2 (Part I) and 57.1 (Part II), outperforming Qwen2.5-3B-Instruct.
MMLU_STEM: Reaches 68.2, surpassing both Qwen2.5-3B-Instruct (59.8) and GPT-4o (64.2).
AMPS_Hard: Scores 70, exceeding GPT-4o's 57.
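
To make the size of these gaps easier to compare, the short sketch below computes the absolute point difference for each comparison where both scores are given above. The names SCORES, GAPS, and point_gap are illustrative only. GAOKAO2024 is left out because no baseline score is listed for it.

```python
# Scores copied from the benchmark list above: (SmallThinker, baseline).
SCORES = {
    "AIME24 vs Qwen2.5-3B-Instruct": (16.667, 6.67),
    "AIME24 vs GPT-4o": (16.667, 9.3),
    "MMLU_STEM vs Qwen2.5-3B-Instruct": (68.2, 59.8),
    "MMLU_STEM vs GPT-4o": (68.2, 64.2),
    "AMPS_Hard vs GPT-4o": (70.0, 57.0),
}


def point_gap(model_score: float, baseline_score: float) -> float:
    """Absolute difference in benchmark points, rounded to 2 decimals."""
    return round(model_score - baseline_score, 2)


GAPS = {name: point_gap(m, b) for name, (m, b) in SCORES.items()}

if __name__ == "__main__":
    for name, gap in GAPS.items():
        print(f"{name}: +{gap} points")
```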

Intended Use Cases

This model is particularly well-suited for:

Edge Deployment: Its compact size suits deployment on devices with limited computational resources, including mobile phones, with support for PowerServe.
Drafting for Larger Models: SmallThinker can serve as a fast draft model for speculative decoding with larger language models such as QwQ-32B-Preview, with a reported 70% inference speedup in llama.cpp.
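
The draft-model idea can be sketched in a few lines. The example below is a simplified greedy variant of speculative decoding, written as an assumption-laden illustration and not as llama.cpp's implementation. The function speculative_decode and the two model callables are hypothetical stand-ins: each "model" is just a function that returns the next token for a token sequence.

```python
from typing import Callable, List

# A "model" here is any function mapping a token sequence to its next token
# under greedy decoding. Both callables are hypothetical stand-ins.
NextToken = Callable[[List[int]], int]


def speculative_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[int],
    max_new_tokens: int,
    draft_len: int = 4,
) -> List[int]:
    """Greedy speculative decoding: draft proposes, target verifies."""
    if draft_len < 1:
        raise ValueError("draft_len must be at least 1")
    tokens = list(prompt)
    limit = len(prompt) + max_new_tokens
    while len(tokens) < limit:
        # 1. The small draft model cheaply proposes a short continuation.
        proposal: List[int] = []
        for _ in range(draft_len):
            proposal.append(draft(tokens + proposal))
        # 2. The large target model checks each proposed token in order.
        #    A real engine scores all positions in one batched forward pass;
        #    this loop only mimics the acceptance rule.
        for guess in proposal:
            expected = target(tokens)
            tokens.append(expected)  # always keep the target's own choice
            if expected != guess or len(tokens) >= limit:
                break  # first mismatch (or length limit): draft again
    return tokens
```

Because every kept token is the target model's own greedy choice, the output matches what the target would produce alone. Any speedup comes from verifying several drafted tokens per target pass, so it depends on how often the draft model agrees with the target.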

Training Details

The model was trained with two-phase supervised fine-tuning (SFT) on 8 H100 GPUs. The first phase used the PowerInfer/QWQ-LONGCOT-500K dataset. The second phase used a combination of the PowerInfer/QWQ-LONGCOT-500K and PowerInfer/LONGCOT-Refine datasets.

Limitations

Users should be aware of the following limitations:

Language: Primarily trained on English datasets, limiting its capabilities in other languages.
Knowledge & Reasoning: Due to its size and limited SFT data, its knowledge base and reasoning capabilities are constrained.
Output Quality: May produce unpredictable or repetitive outputs, especially for high-difficulty questions. Adjusting repetition_penalty is recommended to mitigate repetition.
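
To show what adjusting repetition_penalty does, the sketch below applies the commonly used rescaling rule to a list of logits: tokens that were already generated have positive logits divided by the penalty and negative logits multiplied by it. This is a pure-Python illustration under that assumption, not the code of any particular inference library, and apply_repetition_penalty is a hypothetical helper name.

```python
from typing import List, Sequence


def apply_repetition_penalty(
    logits: Sequence[float],
    generated_ids: Sequence[int],
    penalty: float,
) -> List[float]:
    """Return a copy of `logits` with already-generated tokens made less likely."""
    if penalty <= 0:
        raise ValueError("penalty must be positive")
    adjusted = list(logits)
    for token_id in set(generated_ids):
        score = adjusted[token_id]
        # Dividing a positive logit or multiplying a negative one both
        # push the score down when penalty > 1.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


if __name__ == "__main__":
    # Tokens 0 and 1 were already generated; token 2 was not.
    print(apply_repetition_penalty([2.0, -1.0, 0.5], [0, 1, 1], penalty=2.0))
```

A penalty of 1.0 leaves the logits unchanged, and larger values discourage repeats more strongly. The value 2.0 above is chosen only to keep the arithmetic easy to follow and is not a recommended setting for this model.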