WeiboAI/VibeThinker-3B

TEXT GENERATION | Concurrency Cost: 1 | Model Size: 3.1B | Quant: BF16 | Ctx Length: 32k | Tool Calling: Supported | Published: Jun 12, 2026 | License: MIT | Architecture: Transformer | 0.2K | Open Weights | Warm

WeiboAI/VibeThinker-3B is a 3-billion-parameter model from the VibeThinker series, designed for challenging reasoning tasks in mathematics, coding, and STEM. It uses the Spectrum-to-Signal Principle (SSP) post-training pipeline to achieve strong performance on verifiable reasoning benchmarks such as AIME, HMMT, IMO-AnswerBench, and LiveCodeBench. The model is presented as evidence that compact models can approach frontier-level reasoning in structured task spaces with reliable feedback signals. It is strongest on competitive programming problems and other tasks where answers can be verified.


VibeThinker-3B: Focused Reasoning in a Compact Model

VibeThinker-3B, developed by WeiboAI, is a 3-billion-parameter model optimized for challenging reasoning tasks in mathematics, coding, and STEM. It builds on the Spectrum-to-Signal Principle (SSP) post-training pipeline, first introduced with VibeThinker-1.5B, to improve performance on verifiable reasoning benchmarks.

Key Capabilities & Performance

  • Exceptional Reasoning: Achieves strong results on AIME, HMMT, IMO-AnswerBench, and LiveCodeBench, reaching performance comparable to much larger frontier models like Qwen3.6 Plus and Gemini 3 Pro.
  • Competitive Coding: Passed 123 out of 128 first-attempt submissions (96.1% acceptance rate) on recent unseen LeetCode weekly and biweekly contests (Python).
  • Parametric Compression-Coverage Hypothesis: The results support the hypothesis that compact models can reach high-level reasoning capabilities in domains with clear feedback and verification mechanisms, challenging the notion that small models are merely compromises.
  • Advanced Training Pipeline: Employs a multi-stage pipeline including curriculum-based two-stage Supervised Fine-Tuning (SFT), Multi-domain Reasoning Reinforcement Learning (RL) with MaxEnt-Guided Policy Optimization (MGPO), Offline Self-Distillation, and Instruct RL for improved controllability.
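
The acceptance rate quoted in the Competitive Coding item follows directly from the two reported counts. A minimal sketch of that arithmetic is shown here; the function name `acceptance_rate` is an illustrative choice and not part of any VibeThinker tooling.

```python
def acceptance_rate(accepted: int, submitted: int) -> float:
    """Return the share of accepted submissions as a percentage."""
    if submitted <= 0:
        raise ValueError("submitted must be positive")
    if not 0 <= accepted <= submitted:
        raise ValueError("accepted must be between 0 and submitted")
    return 100.0 * accepted / submitted


# Counts reported above: 123 accepted out of 128 first-attempt submissions.
rate = acceptance_rate(123, 128)
print(f"{rate:.1f}%")  # prints "96.1%"
```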

Recommended Use Cases

  • Competitive Programming: Excels in LeetCode-style problems and other coding challenges.
  • Mathematical Reasoning: Strong performance on complex math problems.
  • STEM Reasoning: Suited for scientific and technical problem-solving where answers are verifiable.
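
The use cases above share one property: a candidate answer can be checked mechanically. A minimal sketch of such a check for a coding task follows, assuming the model's output has already been extracted as Python source that defines a function. The names `check_solution` and `entry_point` and the sample problem are hypothetical and are not taken from the VibeThinker release.

```python
from typing import Any, Sequence


def check_solution(source: str, entry_point: str,
                   cases: Sequence[tuple[tuple, Any]]) -> bool:
    """Run candidate source against (args, expected) pairs.

    Returns True only if the code loads and every case matches.
    """
    namespace: dict[str, Any] = {}
    try:
        # Assumption: the source is trusted or this runs inside a sandbox.
        exec(source, namespace)
        func = namespace[entry_point]
        return all(func(*args) == expected for args, expected in cases)
    except Exception:
        # Syntax errors, missing functions, and runtime errors all count as failures.
        return False


# Hypothetical model output for a LeetCode-style problem.
candidate = """
def two_sum(nums, target):
    seen = {}
    for i, n in enumerate(nums):
        if target - n in seen:
            return [seen[target - n], i]
        seen[n] = i
    return []
"""

cases = [(([2, 7, 11, 15], 9), [0, 1]), (([3, 2, 4], 6), [1, 2])]
print(check_solution(candidate, "two_sum", cases))  # prints "True"
```

A pass/fail signal of this kind is what makes a task "verifiable" in the sense used here. Real contest judges also enforce time and memory limits, which this sketch leaves out.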

Note: This model is not recommended for tool-calling, API orchestration, or autonomous coding agents, nor for broad open-domain knowledge tasks where larger general-purpose models may be more suitable.