WeiboAI/VibeThinker-1.5B

Public · 1.5B params · BF16 · 32768-token context · Nov 4, 2025 · License: MIT · Hosted on Hugging Face

VibeThinker-1.5B by WeiboAI is a 1.5-billion-parameter dense language model with a 131072-token context length, optimized for competition-style mathematical reasoning and algorithmic coding problems. It achieves reasoning performance comparable to much larger models, surpassing DeepSeek R1 on math benchmarks such as AIME24 and HMMT25 and edging out Magistral Medium on LiveCodeBench v6. The model is an experimental release focused on exploring advanced reasoning capabilities in small models.

Overview

VibeThinker-1.5B: A Small Model with Big Reasoning

VibeThinker-1.5B, developed by WeiboAI, is a 1.5-billion-parameter language model notable for its strong reasoning in competitive mathematics and algorithmic coding, achieved at a modest training cost of about US$7,800. On these reasoning tasks it matches or exceeds models more than 400 times its size, such as GPT OSS-20B Medium and DeepSeek R1.

Key Capabilities & Performance

  • Mathematical Reasoning: Achieves scores of 80.3 on AIME24, 74.4 on AIME25, and 50.4 on HMMT25, outperforming DeepSeek R1 (79.8, 70.0, 41.7 respectively) despite its significantly smaller size. This extends the Pareto frontier for reasoning accuracy versus model scale.
  • Code Generation: Scores 55.9 on LiveCodeBench v5 and 51.1 on v6, slightly leading Magistral Medium (50.3) on v6, indicating strong algorithmic reasoning.
  • Training Innovation: Utilizes the "Spectrum-to-Signal Principle" (SSP) training framework, which emphasizes solution diversity during Supervised Fine-Tuning (SFT) and reinforces correct signals in the Reinforcement Learning (RL) stage.
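
The two-stage SSP idea can be sketched in toy form. Everything below is illustrative: the one-solution-per-distinct-answer selection rule and the binary correctness reward are assumptions for exposition, not the actual training recipe.

```python
def sft_spectrum(problem_solutions, k):
    """Stage 1 ("spectrum", SFT): keep a *diverse* set of solution paths per
    problem instead of only the single best one. Hypothetical selection rule:
    at most one solution per distinct final answer, capped at k."""
    seen, kept = set(), []
    for solution, answer in problem_solutions:
        if answer not in seen and len(kept) < k:
            seen.add(answer)
            kept.append(solution)
    return kept

def rl_signal(samples, correct_answer):
    """Stage 2 ("signal", RL): reinforce only samples whose final answer is
    correct, sharpening the broad distribution learned in Stage 1. A binary
    0/1 reward is assumed here for simplicity."""
    return [(sol, 1.0 if ans == correct_answer else 0.0)
            for sol, ans in samples]
```

The point of the sketch is the division of labor: SFT broadens the space of candidate reasoning paths, and RL then concentrates probability mass on the paths that reach correct answers.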

Recommended Use Cases

  • Competitive Math Problems: Ideal for tasks found in competitions like AIME and HMMT.
  • Algorithm Coding Challenges: Highly effective for problems similar to those on platforms like LeetCode and Codeforces.

Note: This model is an experimental release focused on exploring reasoning in small models and is recommended primarily for the competitive math and coding tasks listed above. For best results, prompt in English and use the recommended generation parameters: temperature 0.6 or 1.0, max token length 40960, top_p 0.95, top_k -1 (disabled).
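
A minimal inference sketch with these settings, assuming the standard Hugging Face transformers API and the model's built-in chat template (the exact prompt format is not specified on this card). Note that top_k: -1 is a vLLM convention for disabling top-k filtering; in transformers the equivalent is top_k=0, so it is simply omitted below.

```python
MODEL_ID = "WeiboAI/VibeThinker-1.5B"

# Recommended generation parameters from the model card.
GENERATION_CONFIG = {
    "temperature": 0.6,        # the card also suggests 1.0
    "top_p": 0.95,
    "max_new_tokens": 40960,   # card's recommended max token length
    "do_sample": True,
}

def solve(prompt: str) -> str:
    """Generate a solution for a competition-style math or coding prompt."""
    # Lazy import so the config above can be inspected without loading
    # the (heavy) transformers dependency.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")

    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    outputs = model.generate(inputs, **GENERATION_CONFIG)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:],
                            skip_special_tokens=True)

if __name__ == "__main__":
    print(solve("How many positive integers n < 100 satisfy 5 | n^2 + 1?"))
```

Since the model emits long chains of thought, the large max_new_tokens budget matters: truncating generation early tends to cut off the reasoning before the final answer.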