Overview
MiLMMT-46-4B-Pretrain is a 4.3-billion-parameter language model developed by Xiaomi Inc. It is built on the Gemma3-4B architecture and continually pretrained on 143 billion tokens of monolingual and parallel data curated for 46 languages. This broad linguistic coverage is intended to strengthen multilingual capability, making the model suitable for diverse global applications.
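Since the model shares the Gemma3-4B architecture, it should load through the standard transformers causal-LM interface. The following is a minimal loading sketch, assuming the checkpoint is published on the Hugging Face Hub with a standard causal-LM head; the repository ID below is hypothetical and should be replaced with the actual release path.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical Hub repository ID; substitute the actual release path.
model_id = "XiaomiMiMo/MiLMMT-46-4B-Pretrain"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # keep the dtype stored in the checkpoint
    device_map="auto",   # place weights on available GPU(s)/CPU
)
```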
Key Capabilities
- Extensive Multilingual Support: Supports 46 languages, including major global languages such as English, Chinese (Simplified/Traditional), Japanese, Korean, Spanish, French, and German (see the completion sketch after this list).
- Large Context Window: Features a 32,768-token context length, enabling processing of longer inputs and maintaining coherence over extended text.
- Continual Pretraining: Extends Gemma3-4B through continual pretraining on the 143-billion-token multilingual corpus, improving language understanding across multiple scripts and linguistic structures.
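As a pretrained base model rather than an instruction-tuned chat model, MiLMMT-46-4B-Pretrain is used in completion style: it continues the text it is given. The sketch below (continuing from the loading snippet above) illustrates this across a few of the supported languages; the prompts themselves are illustrative, not from the model's documentation.

```python
# Continuation-style prompting: the base model completes raw text
# rather than following chat-formatted instructions.
prompts = [
    "The chemical symbol for gold is",   # English
    "La capitale de la France est",      # French
    "日本で最も高い山は",                  # Japanese
]

for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=32, do_sample=False)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```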
Good for
- Multilingual Language Understanding: Ideal for tasks requiring comprehension and generation across a wide array of languages.
- Research in Multilingual LLMs: Provides a strong base for researchers exploring multilingual model architectures and data scaling strategies.
- Applications requiring broad language coverage: Suitable for scenarios where a single model needs to handle inputs and outputs in numerous languages, such as global content analysis or cross-lingual information retrieval.
Note: MiLMMT-46-4B-Pretrain is a language model; despite the parallel data in its training mix, it is not a direct machine translation model.