Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Jan 13, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold
ERNIE-4.5-21B-A3B-Base is a text-based Mixture-of-Experts (MoE) model developed by Baidu, featuring 21 billion total parameters with 3 billion activated per token. This base model is pre-trained with a multimodal heterogeneous MoE architecture, initially focusing on text to build strong language understanding and long-text processing capabilities. It is specifically designed for text completion tasks and leverages advanced scaling-efficient infrastructure for high-performance inference.
ERNIE-4.5-21B-A3B-Base: A Text MoE Model
ERNIE-4.5-21B-A3B-Base, developed by Baidu, is a Mixture-of-Experts (MoE) model with 21 billion total parameters and 3 billion activated parameters per token. It is a text-only base model, primarily supporting text completion tasks.
Key Technical Innovations:
- Multimodal Heterogeneous MoE Pre-Training: Although this specific model is text-only, its foundation was pre-trained using a multimodal approach, incorporating modality-isolated routing and specific loss functions to ensure effective representation of both textual and visual modalities during the initial stages. The model underwent a staged training strategy, with text-related parameters trained first to establish strong language understanding.
- Scaling-Efficient Infrastructure: Features like heterogeneous hybrid parallelism, hierarchical load balancing, intra-node expert parallelism, and FP8 mixed-precision training contribute to high pre-training throughput. For inference, it utilizes multi-expert parallel collaboration and convolutional code quantization for 4-bit/2-bit lossless quantization.
- Modality-Specific Post-Training: The model's text-related parameters were extracted after pre-training on trillions of tokens, focusing on general-purpose language understanding and generation.
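The "3 billion activated per token" figure above comes from top-k expert routing: a router scores all experts for each token, and only the few highest-scoring ones actually run. The sketch below illustrates that mechanism with toy sizes; the expert count, k, and scoring are illustrative assumptions, not ERNIE's actual router configuration.

```python
# Toy sketch of top-k MoE routing: only k of num_experts experts run per
# token, which is why an MoE model activates a small fraction of its total
# parameters. All sizes here are illustrative, not ERNIE-4.5's real config.
import numpy as np

def route_top_k(router_logits: np.ndarray, k: int = 2):
    """Pick the k highest-scoring experts and softmax-normalize their gates."""
    top = np.argsort(router_logits)[-k:][::-1]  # indices of chosen experts
    g = np.exp(router_logits[top] - router_logits[top].max())
    return top, g / g.sum()

rng = np.random.default_rng(0)
num_experts = 8
logits = rng.normal(size=num_experts)      # router scores for one token
experts, gates = route_top_k(logits, k=2)  # 2 of 8 experts fire
print(experts, gates)
```

The token's output is then the gate-weighted sum of the chosen experts' outputs, so compute per token scales with k, not with the total expert count.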
Model Configuration:
- Modality: Text
- Parameters (Total / Activated): 21B / 3B
- Context Length: 131072 tokens
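A back-of-the-envelope reading of the configuration above: roughly one seventh of the parameters are active per token, and at FP8 (1 byte per parameter) the full weight set is on the order of 21 GB. These are rough estimates that ignore embeddings, KV cache, and runtime overhead.

```python
# Rough estimates from the model configuration (21B total / 3B activated),
# ignoring embeddings, KV cache, and runtime overhead.
total_params = 21e9
active_params = 3e9

active_fraction = active_params / total_params  # ~1/7 of weights per token
fp8_weight_bytes = total_params * 1             # 1 byte/param at FP8

print(f"{active_fraction:.1%} of parameters active per token")
print(f"~{fp8_weight_bytes / 1e9:.0f} GB of weights at FP8")
```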
Usage Notes:
- This Base model exclusively supports text completion. For evaluation, users should employ the `completion` API in vLLM/FastDeploy, not `chat_completion`.
- The model is available with Transformer-style PyTorch weights (`-PT` suffix) and requires the `transformers` library version 4.54.0 or newer.
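To make the completion-vs-chat distinction concrete, the sketch below builds a request for the OpenAI-compatible `/v1/completions` endpoint that vLLM and FastDeploy expose. The base URL, port, and served model id are assumptions about your local deployment; the key point is that a base model takes a raw `prompt`, not a `messages` list.

```python
# Minimal sketch of hitting the text-completion endpoint (not chat) on a
# locally running vLLM/FastDeploy server. Base URL and model id are
# assumptions; adjust them to match your deployment.
import json
import urllib.request

payload = {
    "model": "baidu/ERNIE-4.5-21B-A3B-Base-PT",  # assumed served model id
    "prompt": "Large language models are",       # raw prompt, no chat template
    "max_tokens": 64,
    "temperature": 0.8,
}

def complete(base_url: str = "http://localhost:8000/v1") -> str:
    """POST to /completions; base models have no chat template to apply."""
    req = urllib.request.Request(
        f"{base_url}/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]
```

Sending the same request to `/chat/completions` would fail or misbehave, since a base checkpoint has no chat template or instruction tuning to interpret role-tagged messages.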