baidu/ERNIE-4.5-21B-A3B-Base-PT
baidu/ERNIE-4.5-21B-A3B-Base-PT is a Mixture-of-Experts (MoE) text model developed by Baidu, with 21 billion total parameters, of which 3 billion are activated per token, and a context length of 32768 tokens. It is a pre-trained base model intended for text completion. It comes out of a multimodal heterogeneous MoE pre-training approach whose early stages focus on the text-related parameters. For inference, it relies on techniques such as multi-expert parallel collaboration and 4-bit/2-bit lossless quantization, which makes it suitable for high-throughput text generation.
ERNIE-4.5-21B-A3B-Base-PT Overview
ERNIE-4.5-21B-A3B-Base-PT is a Mixture-of-Experts (MoE) base model from Baidu with 21 billion total parameters and 3 billion activated parameters, designed for text completion. It is the PyTorch-compatible variant within the ERNIE 4.5 series. Its pre-training first concentrates on text-related parameters and extends to multimodal capabilities only in later stages. The context length is 32768 tokens.
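To make the total-versus-activated split concrete, here is a back-of-the-envelope sketch in Python. It rests on assumptions not stated in this card: the nominal counts (21B and 3B) are treated as exact, only weight storage is counted (no KV cache, activations, or quantization metadata), and bf16 is assumed as the unquantized baseline. The helper name `weight_memory_gib` is purely illustrative.

```python
# Rough sizing sketch; nominal parameter counts from the model name are
# rounded, so every number produced here is an approximation.
TOTAL_PARAMS = 21e9   # "21B": parameters stored across all experts
ACTIVE_PARAMS = 3e9   # "A3B": parameters activated for a single token


def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Weight-only storage in GiB for a given bit width (hypothetical helper)."""
    return num_params * bits_per_param / 8 / 2**30


# Fraction of the stored weights that take part in one token's forward pass.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

if __name__ == "__main__":
    print(f"activated fraction per token: {active_fraction:.1%}")
    # bf16 is an assumed baseline; 4-bit and 2-bit are the widths named in the card.
    for label, bits in [("bf16 (assumed)", 16), ("4-bit", 4), ("2-bit", 2)]:
        print(f"{label}: ~{weight_memory_gib(TOTAL_PARAMS, bits):.1f} GiB of weights")
```

Under these assumptions, roughly one seventh of the weights are used per token, which lowers compute per token. Unless experts are offloaded, the full set of weights typically still has to be stored, which is why the low-bit quantization mentioned above matters for deployment.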
Key Technical Innovations
- Multimodal Heterogeneous MoE Pre-Training: Although this specific model is text-only, it builds on a pre-training strategy that trains jointly on textual and visual modalities. That strategy uses a heterogeneous MoE structure with modality-isolated routing, together with dedicated loss terms (a router orthogonal loss and a multimodal token-balanced loss), so that both modalities are represented effectively without one hindering the other.
- Scaling-Efficient Infrastructure: Pre-training uses heterogeneous hybrid parallelism, hierarchical load balancing, and memory-efficient pipeline scheduling to reach high throughput. Inference uses multi-expert parallel collaboration and a convolutional code quantization algorithm to achieve 4-bit/2-bit lossless quantization.
- Staged Training Strategy: The model's development involved an initial focus on text-related parameters to establish strong language understanding and long-text processing capabilities, with multimodal extensions introduced in later stages.
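The routing idea behind the first bullet can be sketched in a few lines of plain Python. This is a generic top-k MoE router with a per-modality expert group, not ERNIE 4.5's actual implementation: the expert counts, the value of `k`, and the names `TEXT_EXPERTS`, `VISION_EXPERTS`, `route_token`, and `moe_forward` are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical expert layout: indices 0-3 serve text tokens, 4-7 serve vision tokens.
TEXT_EXPERTS = [0, 1, 2, 3]
VISION_EXPERTS = [4, 5, 6, 7]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, k, allowed_experts):
    """Pick the top-k experts within one modality's group.

    Returns {expert_index: weight}, with weights renormalized to sum to 1.
    Experts outside `allowed_experts` are never considered, which is the
    "modality-isolated" part of this sketch.
    """
    group_logits = [router_logits[i] for i in allowed_experts]
    probs = softmax(group_logits)
    ranked = sorted(range(len(probs)), key=lambda j: probs[j], reverse=True)[:k]
    norm = sum(probs[j] for j in ranked)
    return {allowed_experts[j]: probs[j] / norm for j in ranked}


def moe_forward(x, experts, router_logits, k, allowed_experts):
    """Weighted sum of the selected experts' outputs; unselected experts do no work."""
    weights = route_token(router_logits, k, allowed_experts)
    return sum(w * experts[i](x) for i, w in weights.items())
```

For example, `route_token([0.1, 2.0, 0.3, 1.5, 9.0, 9.0, 9.0, 9.0], k=2, allowed_experts=TEXT_EXPERTS)` selects experts 1 and 3 even though the vision experts have larger logits, because a text token never competes for them. Only the selected experts run, which is how an MoE model can store many parameters while activating only a fraction per token.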
Use Cases
- Text Completion: Primarily intended for generating text that continues a given prompt. As a base model, it completes text rather than following chat-style instructions.
- High-Performance Inference: Optimized for efficient deployment and inference across a range of hardware platforms, which benefits applications that need fast text generation.
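A minimal text-completion sketch follows. It assumes the checkpoint loads through Hugging Face `transformers` via `AutoModelForCausalLM`, which the PyTorch-compatible variant suggests but this page does not confirm. Whether `trust_remote_code=True` is needed depends on the installed `transformers` version, and `device_map="auto"` additionally requires the `accelerate` package. The helper names `completion_budget` and `complete` are illustrative, not part of any library.

```python
MODEL_ID = "baidu/ERNIE-4.5-21B-A3B-Base-PT"
CONTEXT_LENGTH = 32768  # context length stated in this card


def completion_budget(prompt_tokens: int, requested_new_tokens: int,
                      context_length: int = CONTEXT_LENGTH) -> int:
    """Clamp the completion length so prompt + completion fits the context window."""
    if prompt_tokens >= context_length:
        raise ValueError("prompt already fills the context window")
    return min(requested_new_tokens, context_length - prompt_tokens)


def complete(prompt: str, requested_new_tokens: int = 128) -> str:
    """Continue `prompt` with the base model (downloads the full weights on first use)."""
    # Imported lazily so completion_budget() is usable without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        trust_remote_code=True,  # assumption: may be unnecessary on recent versions
        torch_dtype="auto",
        device_map="auto",       # assumption: requires `accelerate`
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    prompt_len = inputs["input_ids"].shape[1]
    max_new = completion_budget(prompt_len, requested_new_tokens)

    output_ids = model.generate(**inputs, max_new_tokens=max_new)
    # Drop the prompt tokens so only the newly generated continuation is returned.
    return tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
```

Calling `complete("Large language models are")` would return a continuation of the prompt. Because this is a base model, the prompt should read like the beginning of the desired text rather than like an instruction.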