AIDC-AI/Marco-LLM-GLO

Text generation · Concurrency cost: 1 · Model size: 7.6B · Quantization: FP8 · Context length: 32k · Published: Feb 27, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights

AIDC-AI/Marco-LLM-GLO is a 7.6-billion-parameter multilingual language model from AIDC-AI, built on the Transformer architecture. It was continually pretrained on over 5 trillion tokens with the aim of bridging the performance gap between high-resource and low-resource languages. The model performs strongly on multilingual tasks such as machine translation, question answering, and reasoning across 29 languages, and uses an enhanced tokenizer for improved efficiency.


Marco-LLM-GLO: Bridging Multilingual Performance Gaps

Marco-LLM-GLO is a 7.6-billion-parameter multilingual language model developed by AIDC-AI and built on the Transformer architecture. Its core innovation is extensive continual pretraining on a dataset exceeding 5 trillion tokens, with a strategic focus on improving performance in low-resource languages while maintaining strong capabilities in high-resource languages such as English and Chinese.

Key Capabilities & Features

  • Multilingual Training: Trained on a diverse dataset covering 29 languages, including both high-resource (e.g., English, Chinese) and low-resource languages (e.g., Kazakh, Nepali).
  • Enhanced Tokenizer: Incorporates an improved tokenizer designed for better handling of and higher accuracy on multilingual data (a minimal loading sketch follows this list).
  • Performance: Demonstrates significant improvements in multilingual tasks such as machine translation, question answering, and cross-lingual reasoning compared to other open-source models.
  • Post-Training Support: Designed to support various post-training methods like Supervised Fine-tuning (SFT) and Direct Preference Optimization (DPO) for task-specific and language-specific enhancements.

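The snippet below is a minimal sketch of how the base model could be loaded and inspected with the Hugging Face transformers library, assuming the checkpoint is available on the Hub under the repository name AIDC-AI/Marco-LLM-GLO. The prompt, dtype, and generation settings are illustrative choices, not values taken from the model card.

```python
# Minimal loading/inspection sketch; assumes the checkpoint is published on the
# Hugging Face Hub as "AIDC-AI/Marco-LLM-GLO" and that transformers + accelerate
# are installed (device_map="auto" requires accelerate).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AIDC-AI/Marco-LLM-GLO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative; FP8 quantization is deployment-specific
    device_map="auto",
)

# Base (non-instruct) model: plain text continuation rather than chat-style prompting.
# Example prompt in Kazakh (a low-resource language covered by the model):
# "Қазақстанның астанасы" = "The capital of Kazakhstan".
prompt = "Қазақстанның астанасы"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```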
When to Use Marco-LLM-GLO

This base model is primarily intended for further adaptation through post-training methods such as SFT, RLHF, or continued pretraining. It is particularly well-suited for applications requiring robust multilingual understanding and generation, especially in scenarios involving a mix of high and low-resource languages. Developers should fine-tune Marco-LLM-GLO for specific downstream tasks rather than using the base model directly for text generation.
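As a sketch of the recommended adaptation path, the example below outlines supervised fine-tuning with the TRL library's SFTTrainer. The dataset name, hyperparameters, and output directory are placeholders, and the exact SFTTrainer/SFTConfig arguments vary across TRL versions, so read this as an illustration under those assumptions rather than an officially documented recipe.

```python
# Hypothetical SFT sketch using Hugging Face TRL; dataset name and hyperparameters
# are placeholders, not values from the model card.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder dataset; assumed to contain a plain "text" column with training examples.
dataset = load_dataset("your-org/your-multilingual-sft-data", split="train")

training_args = SFTConfig(
    output_dir="marco-llm-glo-sft",   # placeholder output path
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    learning_rate=2e-5,
    bf16=True,
)

trainer = SFTTrainer(
    model="AIDC-AI/Marco-LLM-GLO",    # loaded from the Hub by repository name
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

A preference-optimization stage (e.g., DPO on preference pairs with TRL's DPOTrainer) could follow the same pattern for language- or task-specific alignment.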