TEXT GENERATION

  • Concurrency Cost: 1
  • Model Size: 8B
  • Quant: FP8
  • Context Length: 32k
  • Published: Jan 13, 2026
  • License: apache-2.0
  • Architecture: Transformer (Open Weights)

ERNIE-4.5-21B-A3B-Base is a text-based Mixture-of-Experts (MoE) model developed by Baidu, featuring 21 billion total parameters with 3 billion activated per token. This base model is pre-trained with a multimodal heterogeneous MoE architecture, initially focusing on text to build strong language understanding and long-text processing capabilities. It is specifically designed for text completion tasks and leverages advanced scaling-efficient infrastructure for high-performance inference.


ERNIE-4.5-21B-A3B-Base: A Text MoE Model

ERNIE-4.5-21B-A3B-Base, developed by Baidu, is a Mixture-of-Experts (MoE) model with 21 billion total parameters and 3 billion activated parameters per token. It is a text-only base model, primarily supporting text completion tasks.

Key Technical Innovations:

  • Multimodal Heterogeneous MoE Pre-Training: Although this specific model is text-only, its foundation was pre-trained using a multimodal approach, incorporating modality-isolated routing and specific loss functions to ensure effective representation of both textual and visual modalities during the initial stages. The model underwent a staged training strategy, with text-related parameters trained first to establish strong language understanding.
  • Scaling-Efficient Infrastructure: Features like heterogeneous hybrid parallelism, hierarchical load balancing, intra-node expert parallelism, and FP8 mixed-precision training contribute to high pre-training throughput. For inference, it utilizes multi-expert parallel collaboration and convolutional code quantization for 4-bit/2-bit lossless quantization.
  • Modality-Specific Parameter Extraction: After pre-training on trillions of tokens, the text-related parameters were extracted to form this base model, which focuses on general-purpose language understanding and generation.

Model Configuration:

  • Modality: Text
  • Parameters (Total / Activated): 21B / 3B
  • Context Length: 131072 tokens

Usage Notes:

  • This base model supports only text completion. For evaluation, use the completion API in vLLM/FastDeploy, not chat_completion, since a base model has no chat template.
  • The model ships with Hugging Face Transformers-style PyTorch weights (the -PT suffix) and requires the transformers library version 4.54.0 or newer.
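The first usage note can be sketched as a minimal client for a vLLM/FastDeploy server. This is an illustration, not an official recipe: the server URL, port, and the model identifier `baidu/ERNIE-4.5-21B-A3B-Base` are assumptions, and the key point is that the request goes to the `/v1/completions` endpoint with a raw prompt, never to `/v1/chat/completions`.

```python
# Sketch: querying a base model through an OpenAI-compatible *completions*
# endpoint, as exposed by vLLM or FastDeploy. Server address and model name
# below are illustrative assumptions.
import json
import urllib.request


def build_completion_request(prompt: str, max_tokens: int = 64) -> dict:
    # Base models have no chat template, so we send a raw prompt string
    # rather than a list of chat messages.
    return {
        "model": "baidu/ERNIE-4.5-21B-A3B-Base",
        "prompt": prompt,
        "max_tokens": max_tokens,
    }


def complete(prompt: str, base_url: str = "http://localhost:8000") -> str:
    # POST to /v1/completions (not /v1/chat/completions) on the local server.
    payload = json.dumps(build_completion_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]
```

A server started with something like `vllm serve baidu/ERNIE-4.5-21B-A3B-Base --port 8000` would then answer `complete("Large language models are")` with a plain continuation of the prompt.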
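The second usage note, loading the -PT weights locally, might look like the sketch below. The repository name with the `-PT` suffix is an assumption for illustration, and the load itself downloads tens of gigabytes of weights, so it is wrapped in a function rather than run at import time; only the version check is plain logic.

```python
# Sketch: loading the PyTorch (-PT) weights with Hugging Face transformers.
# The repo name below is an illustrative assumption based on the -PT naming
# convention mentioned in the model card.


def check_transformers_version(version: str, minimum: str = "4.54.0") -> bool:
    # The card states the -PT weights need transformers >= 4.54.0.
    parse = lambda v: tuple(int(p) for p in v.split(".")[:3])
    return parse(version) >= parse(minimum)


def load_and_complete(prompt: str, max_new_tokens: int = 32) -> str:
    # Heavy: downloads the full 21B-parameter checkpoint; illustration only.
    import transformers
    from transformers import AutoModelForCausalLM, AutoTokenizer

    assert check_transformers_version(transformers.__version__)
    name = "baidu/ERNIE-4.5-21B-A3B-Base-PT"  # assumed repo id
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(
        name, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

As with the server example, the model is driven by plain `generate` on a raw prompt; there is no chat-formatting step anywhere in the pipeline.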