SmallThinker-3B-Preview Overview
SmallThinker-3B-Preview is a 3.1-billion-parameter language model developed by PowerInfer, fine-tuned from Qwen2.5-3B-Instruct. It is engineered specifically to improve mathematical and reasoning performance, as reflected in its benchmark scores, and supports a context length of 32,768 tokens.
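As a Qwen2.5 fine-tune, SmallThinker uses a ChatML-style chat template. The sketch below hand-builds such a prompt purely for illustration; in practice you would call `tokenizer.apply_chat_template` from `transformers` rather than assembling the string yourself, and the exact template shipped with the model checkpoint is authoritative.

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Assemble a ChatML-style prompt as used by Qwen2.5-family models.

    Illustrative only: real code should use
    tokenizer.apply_chat_template(...) so the model's own template is used.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "You are a helpful assistant.",
    "What is 12 * 7?",
)
print(prompt)
```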
Key Capabilities & Performance
SmallThinker-3B-Preview shows notable improvements over its base model and even GPT-4o in several mathematical and STEM-related benchmarks:
- AIME24: Achieves 16.667, significantly higher than Qwen2.5-3B-Instruct (6.67) and GPT-4o (9.3).
- GAOKAO2024: Scores 64.2 (Part I) and 57.1 (Part II), outperforming Qwen2.5-3B-Instruct.
- MMLU_STEM: Reaches 68.2, surpassing both Qwen2.5-3B-Instruct (59.8) and GPT-4o (64.2).
- AMPS_Hard: Scores 70, exceeding GPT-4o's 57.
Intended Use Cases
This model is particularly well-suited for:
- Edge Deployment: Its compact size makes it ideal for efficient deployment on devices with limited computational resources, including mobile phones, with support for PowerServe.
- Drafting for Larger Models: SmallThinker can function as a fast and efficient draft model for larger language models like QwQ-32B-Preview, offering significant speedups (e.g., 70% faster inference in llama.cpp).
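The draft-model speedup comes from speculative decoding: the small model proposes several tokens cheaply, and the large target model verifies them in one pass, committing every token on which the two agree. The toy sketch below shows the greedy variant of this accept/reject loop; the two lambda "models" are stand-ins invented for the example, not real model calls.

```python
from typing import Callable, List

def speculative_step(
    draft_next: Callable[[List[int]], int],
    target_next: Callable[[List[int]], int],
    context: List[int],
    k: int = 4,
) -> List[int]:
    """One round of greedy speculative decoding (illustrative toy).

    The draft model proposes k tokens autoregressively; the target model
    verifies them, accepting the agreeing prefix and substituting its own
    token at the first disagreement.
    """
    # Draft model proposes k tokens cheaply.
    proposal, ctx = [], list(context)
    for _ in range(k):
        t = draft_next(ctx)
        proposal.append(t)
        ctx.append(t)

    # Target model verifies the proposal.
    accepted, ctx = [], list(context)
    for t in proposal:
        t_target = target_next(ctx)
        if t_target == t:
            accepted.append(t)       # models agree: token accepted for free
            ctx.append(t)
        else:
            accepted.append(t_target)  # target's token replaces the miss
            break
    return accepted

# Toy stand-ins: target always emits last-token + 1; draft agrees until 3.
target = lambda ctx: ctx[-1] + 1
draft = lambda ctx: ctx[-1] + 1 if ctx[-1] < 3 else 0

print(speculative_step(draft, target, [0]))  # -> [1, 2, 3, 4]
```

Because verification batches several positions into one target forward pass, a well-matched draft model (like SmallThinker for QwQ-32B-Preview) can substantially raise tokens per second without changing the target model's outputs.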
Training Details
The model underwent a two-phase Supervised Fine-Tuning (SFT) process using 8 H100 GPUs. The training utilized the PowerInfer/QWQ-LONGCOT-500K dataset in the first phase, followed by a combination of PowerInfer/QWQ-LONGCOT-500K and PowerInfer/LONGCOT-Refine datasets in the second phase.
Limitations
Users should be aware of the following limitations:
- Language: Primarily trained on English datasets, limiting its capabilities in other languages.
- Knowledge & Reasoning: Due to its size and limited SFT data, its knowledge base and reasoning capabilities are constrained.
- Output Quality: May produce unpredictable or repetitive outputs, especially for high-difficulty questions. Adjusting `repetition_penalty` is recommended to mitigate repetition.
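The `repetition_penalty` setting exposed by common inference stacks rescales the logits of tokens that have already been generated, making them less likely to be sampled again. A minimal NumPy sketch of the standard CTRL-style formulation (positive logits are divided by the penalty, negative ones multiplied), written from scratch here for illustration:

```python
import numpy as np

def apply_repetition_penalty(logits: np.ndarray,
                             generated_ids: list,
                             penalty: float = 1.3) -> np.ndarray:
    """Illustrative CTRL-style repetition penalty.

    penalty > 1.0 discourages repeating previously emitted tokens;
    penalty == 1.0 is a no-op.
    """
    out = logits.copy()
    for tok in set(generated_ids):
        if out[tok] > 0:
            out[tok] /= penalty   # shrink positive scores
        else:
            out[tok] *= penalty   # push negative scores further down
    return out

logits = np.array([2.0, -1.0, 0.5])
penalized = apply_repetition_penalty(logits, generated_ids=[0, 1], penalty=2.0)
print(penalized)  # -> [ 1.  -2.   0.5]  (token 2 was never generated, so untouched)
```

Values slightly above 1.0 (e.g. 1.1 to 1.3) are a common starting point; overly large penalties can degrade fluency.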