baidu/ERNIE-4.5-21B-A3B-Thinking

TEXT GENERATIONConcurrency Cost:1Model Size:21BQuant:FP8Ctx Length:32kPublished:Sep 8, 2025License:apache-2.0Architecture:Transformer0.8K Open Weights Cold

The baidu/ERNIE-4.5-21B-A3B-Thinking model is a 21 billion total parameter, 3 billion activated parameter text Mixture-of-Experts (MoE) model developed by Baidu. It features significantly improved reasoning capabilities across logical, mathematical, scientific, and coding tasks, along with efficient tool usage and enhanced 128K long-context understanding. This model is specifically optimized for highly complex reasoning tasks requiring deep analytical thought.

Loading preview...

ERNIE-4.5-21B-A3B-Thinking: Enhanced Reasoning MoE Model

ERNIE-4.5-21B-A3B-Thinking is a text-based Mixture-of-Experts (MoE) model developed by Baidu, featuring 21 billion total parameters with 3 billion activated parameters per token. This model represents a significant advancement in the thinking capability of ERNIE's lightweight models, focusing on improving both the quality and depth of reasoning.

Key Capabilities

  • Advanced Reasoning: Demonstrates significantly improved performance across a wide range of complex reasoning tasks, including logical reasoning, mathematics, science, coding, and text generation. It also excels in academic benchmarks that typically demand human expertise.
  • Efficient Tool Usage: Equipped with enhanced capabilities for efficient tool integration and utilization, supporting function calls.
  • Extended Context Understanding: Features enhanced long-context understanding, supporting a context length of 131,072 tokens.
  • MoE Architecture: Utilizes a MoE architecture with 64 total text experts and 6 activated experts per token, including 2 shared experts.

Good For

  • Highly Complex Reasoning Tasks: This model is strongly recommended for applications requiring deep analytical thought and intricate problem-solving.
  • Tool-Augmented Applications: Ideal for scenarios where efficient integration and use of external tools are crucial.
  • Long-Context Processing: Suitable for tasks that benefit from processing and understanding very long input sequences, up to 128K tokens.