baidu/ERNIE-4.5-21B-A3B-PT
ERNIE-4.5-21B-A3B-PT is a text-only Mixture-of-Experts (MoE) model developed by Baidu, with 21 billion total parameters and 3 billion activated parameters per token. This post-trained model supports a context length of 32,768 tokens. It inherits the heterogeneous MoE structure of the ERNIE 4.5 family, which uses modality-isolated routing for efficient multimodal pre-training, although this variant handles text only. It is optimized for general-purpose language understanding and generation. Training relied on Baidu's scaling-efficient infrastructure, and post-training used methods such as supervised fine-tuning (SFT), Direct Preference Optimization (DPO), or Unified Preference Optimization (UPO).
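One way to picture modality-isolated routing is that each modality owns a separate pool of experts, and a token's router chooses only among the experts of its own modality. The following is a minimal conceptual sketch under that assumption. The pool layout, the scores, and the helper names `top_k_gate` and `modality_isolated_route` are hypothetical. Plain top-k softmax gating stands in for whatever gating ERNIE 4.5 actually uses, and nothing here reflects the model's real configuration.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: each modality owns its own list of router scores
# (one score per expert in that modality's pool). Illustrative only.
RouterScores = Dict[str, Sequence[float]]


def top_k_gate(scores: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Return (expert_index, weight) for the k highest-scoring experts,
    with softmax weights renormalized over the selected experts only."""
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of experts")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    m = max(scores[i] for i in ranked)  # subtract the max for numerical stability
    exps = [math.exp(scores[i] - m) for i in ranked]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(ranked, exps)]


def modality_isolated_route(
    modality: str, scores: RouterScores, k: int
) -> List[Tuple[int, float]]:
    """Route one token using only the expert pool of its own modality;
    experts that belong to other modalities are never candidates."""
    return top_k_gate(scores[modality], k)


if __name__ == "__main__":
    scores = {
        "text": [0.1, 2.0, 0.5, 1.5],
        "vision": [3.0, 0.2, 0.9, 0.4],  # cannot capture a text token
    }
    print(modality_isolated_route("text", scores, k=2))
```

With `k=2`, the text token goes to text experts 1 and 3, even though the vision pool contains a higher raw score. Isolation is enforced by the lookup `scores[modality]`, not by the gating function itself.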
ERNIE-4.5-21B-A3B-PT Overview
ERNIE-4.5-21B-A3B-PT is a text-focused Mixture-of-Experts (MoE) model from Baidu with 21 billion total parameters, of which 3 billion are activated for each token. It is built on the ERNIE 4.5 architecture, whose key techniques include Multimodal Heterogeneous MoE Pre-Training (although this model is text-only) and Scaling-Efficient Infrastructure for high-performance training and inference. Training used a heterogeneous hybrid parallelism and hierarchical load-balancing strategy, together with FP8 mixed-precision training and fine-grained recomputation.
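The gap between total and activated parameters is easiest to see as arithmetic. The following back-of-the-envelope sketch assumes the rounded figures on this card (21B total, 3B activated). It also assumes the common rule of thumb of roughly 2 FLOPs per used parameter per token for a forward pass. It ignores attention cost that grows with context length, and its results are not official Baidu figures.

```python
# Back-of-the-envelope compute estimate for a sparse MoE model.
# Assumptions: rounded parameter counts from this card, and the
# "about 2 FLOPs (one multiply + one add) per used parameter per token"
# rule of thumb; attention-over-context cost is ignored.

TOTAL_PARAMS = 21e9
ACTIVE_PARAMS = 3e9


def active_fraction(total: float, active: float) -> float:
    """Share of all parameters that take part in one token's forward pass."""
    return active / total


def forward_flops_per_token(params_used: float) -> float:
    """Rough forward-pass FLOPs per token: ~2 FLOPs per parameter used."""
    return 2.0 * params_used


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    moe = forward_flops_per_token(ACTIVE_PARAMS)
    dense = forward_flops_per_token(TOTAL_PARAMS)  # hypothetical dense 21B model
    print(f"active fraction: {frac:.1%}")
    print(f"MoE:   ~{moe / 1e9:.0f} GFLOPs/token")
    print(f"dense: ~{dense / 1e9:.0f} GFLOPs/token")
```

Under these assumptions, about 14% of the weights participate in each token. That gives roughly 6 GFLOPs per token, compared with about 42 GFLOPs for a hypothetical dense model of the same total size. Memory is a different matter: all 21B parameters still have to be stored, because the router may select any expert.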
Key Capabilities
- Efficient MoE Architecture: Only about 3B of the 21B total parameters are activated per token, so per-token compute is far lower than that of a dense 21B model.
- Optimized for Text: Post-trained specifically for general-purpose language understanding and generation.
- Advanced Training: Trained on scaling-efficient infrastructure (heterogeneous hybrid parallelism, FP8 mixed precision, fine-grained recomputation) designed for high throughput and efficient resource utilization.
- Long Context: Supports a context length of 32,768 tokens, enabling the processing of longer inputs.
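When serving the model, the prompt and the generated tokens share the same 32,768-token window. A minimal sketch of that budget check follows. The helper names are hypothetical. The token counts are assumed to come from the model's own tokenizer, because counting words or characters instead would be inaccurate.

```python
# Hypothetical helpers for budgeting the 32,768-token context window stated
# on this card. Token counts are plain integers supplied by the caller and
# are assumed to come from the model's own tokenizer.

CONTEXT_LENGTH = 32_768


def fits_in_context(
    prompt_tokens: int, max_new_tokens: int, context_length: int = CONTEXT_LENGTH
) -> bool:
    """True if the prompt plus the generation budget stays within the window."""
    if prompt_tokens < 0 or max_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_new_tokens <= context_length


def max_generation_budget(prompt_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Largest generation budget that still fits after the prompt
    (0 if the prompt alone fills or overflows the window)."""
    if prompt_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return max(0, context_length - prompt_tokens)
```

For example, a 30,000-token prompt leaves room for at most 2,768 new tokens. So `fits_in_context(30_000, 2_048)` is true, while `fits_in_context(31_000, 2_048)` is not.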
Good For
- General Language Tasks: Suitable for a wide range of text-based applications that require robust language understanding and generation.
- Efficient Deployment: ERNIE 4.5 includes inference optimizations such as multi-expert parallel collaboration and convolutional code quantization, which Baidu reports achieves lossless 4-bit and 2-bit quantization. These optimizations make the model deployable on a variety of hardware platforms.
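Low-bit quantization matters here because an MoE model generally needs every expert's weights available, even though only a few experts run per token. The following sketch estimates the weight-only footprint at several bit widths. It assumes the rounded 21B total from this card. It ignores quantization scales and metadata, activations, and the KV cache, so real deployments need more memory. It is plain arithmetic and does not model Baidu's convolutional code quantization.

```python
# Rough weight-only memory footprint of a 21B-parameter model at different
# precisions. Assumptions: "21B" is the rounded total from this card; the
# estimate ignores quantization scales/metadata, activations, and the KV cache.
# All parameters are counted (not just the ~3B activated ones) because every
# expert is typically kept resident to serve arbitrary tokens.

TOTAL_PARAMS = 21e9


def weight_gigabytes(num_params: float, bits_per_param: float) -> float:
    """Decimal gigabytes (1e9 bytes) needed to store the weights."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4), ("2-bit", 2)]:
        print(f"{label:>6}: ~{weight_gigabytes(TOTAL_PARAMS, bits):.2f} GB")
```

Under these assumptions, the weights shrink from about 42 GB at 16 bits to about 10.5 GB at 4 bits and about 5.25 GB at 2 bits.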