xiaomi-research/MiLMMT-46-4B-Pretrain
Vision · Concurrency Cost: 1 · Model Size: 4.3B · Quant: BF16 · Ctx Length: 32k · Published: Jan 29, 2026 · License: Gemma · Architecture: Transformer

MiLMMT-46-4B-Pretrain is a 4.3 billion parameter language model developed by Xiaomi Inc. through continual pretraining of Gemma3-4B. It was trained on 143 billion tokens of mixed monolingual and parallel data spanning 46 languages, with a 32,768-token context length. The model is designed for multilingual language understanding and generation across a broad range of languages, including Chinese, English, Japanese, and Korean. As a pretrained base model, it is not intended for direct use as a machine translation system.
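A minimal usage sketch, assuming the checkpoint is published on Hugging Face under the repo id shown on this page and loads through the standard `transformers` causal-LM API (this is illustrative, not an official example from Xiaomi):

```python
# Hypothetical usage sketch: assumes the checkpoint loads via the standard
# transformers AutoModelForCausalLM / AutoTokenizer interface.
MODEL_ID = "xiaomi-research/MiLMMT-46-4B-Pretrain"


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Continue `prompt` with the pretrained base model."""
    # Imported lazily so the module can be inspected without these installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # BF16 matches the quantization listed in the model metadata above.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    inputs = tokenizer(prompt, return_tensors="pt")
    # A pretrained base model continues text; it does not follow chat-style
    # instructions, and translation would require further fine-tuning.
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("La capitale de la France est"))
```

Because this is a base (pretrain) checkpoint, prompts should be framed as text to continue rather than as instructions or translation requests.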
