Tiiny/SmallThinker-3B-Preview
SmallThinker-3B-Preview is a 3.1 billion parameter instruction-tuned causal language model developed by Tiiny, fine-tuned from Qwen2.5-3B-Instruct with a 32768 token context length. It demonstrates enhanced mathematical and reasoning capabilities, outperforming its base model and GPT-4o on several benchmarks like AIME24 and GAOKAO2024. This model is primarily optimized for efficient edge deployment on resource-constrained devices and can serve as a fast draft model for larger language models.
Loading preview...
SmallThinker-3B-Preview Overview
SmallThinker-3B-Preview is a 3.1 billion parameter language model, fine-tuned from the Qwen2.5-3B-Instruct architecture. It features a substantial 32768 token context length, making it suitable for processing longer inputs. The model was developed through a two-phase Supervised Fine-Tuning (SFT) process using 8 H100 GPUs, leveraging datasets like PowerInfer/QWQ-LONGCOT-500K and PowerInfer/LONGCOT-Refine.
Key Capabilities & Performance
SmallThinker-3B-Preview shows significant improvements in mathematical and reasoning tasks compared to its base model, Qwen2.5-3B-Instruct, and even surpasses GPT-4o on specific benchmarks. For instance, it achieves 16.667 on AIME24 (vs. 6.67 for Qwen2.5-3B-Instruct and 9.3 for GPT-4o) and 68.2 on MMLU_STEM (vs. 59.8 for Qwen2.5-3B-Instruct and 64.2 for GPT-4o). It also scores 70 on AMPS_Hard and 46.8 on math_comp, indicating strong performance in complex problem-solving.
Ideal Use Cases
- Edge Deployment: Its compact size makes it highly efficient for deployment on devices with limited computational resources.
- Draft Model: SmallThinker can function as a rapid and efficient draft model for larger language models, such as QwQ-32B-Preview, offering significant speedups (e.g., 70% faster in llama.cpp).
Limitations
Currently, SmallThinker-3B-Preview has limitations including English-only language support, constrained reasoning due to its size and SFT data, and potential for unpredictable or repetitive outputs, especially with high-difficulty questions. Users may need to adjust repetition_penalty to mitigate repetition issues.