MERaLiON/LLaMA-3-MERaLiON-8B-Instruct Overview
MERaLiON/LLaMA-3-MERaLiON-8B-Instruct is an 8-billion-parameter large language model developed by I²R, A*STAR. It extends the Llama-3-8B architecture with continued pretraining on more than 120 billion tokens, primarily in English, Chinese, and Indonesian. A key differentiator is its SEA Multilingual Corpus Mixing strategy, which improves understanding and generation in Chinese and Indonesian by diversifying the training mix while using replay strategies to prevent catastrophic forgetting.
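The replay idea can be illustrated with a minimal sketch: each training batch mixes documents from the new-language corpus with "replay" documents drawn from the original pretraining distribution. The function name, the ratio, and the toy corpora below are illustrative assumptions, not the model's actual data pipeline.

```python
import random

def mixed_batch(new_corpus, replay_corpus, replay_ratio, batch_size, rng):
    """Sample a batch mixing new-language documents with 'replay'
    documents from the original pretraining distribution.
    replay_ratio is the fraction of the batch reserved for replay data."""
    n_replay = int(batch_size * replay_ratio)
    batch = [rng.choice(replay_corpus) for _ in range(n_replay)]
    batch += [rng.choice(new_corpus) for _ in range(batch_size - n_replay)]
    rng.shuffle(batch)
    return batch

rng = random.Random(0)
new = [f"id_doc_{i}" for i in range(100)]  # e.g. new Indonesian documents
old = [f"en_doc_{i}" for i in range(100)]  # e.g. original English-heavy mix
batch = mixed_batch(new, old, replay_ratio=0.25, batch_size=16, rng=rng)
```

Keeping a fixed share of the batch on the original distribution is one simple way to anchor the model's existing capabilities while it absorbs new languages.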
Key Capabilities and Features
- Extended Multilingual Pretraining: Continued pretraining on 120B+ tokens, with a strong focus on English, Chinese, and Indonesian.
- Enhanced Multilingual Performance: Demonstrates improved results on benchmarks such as Cross-MMLU, Cross-LogiQA, IndoMMLU, and CNEval, surpassing official Llama-3 models in these areas.
- Instruction Tuning via Model Merging: Achieves strong instruction-following by merging the weights of Llama-3.1-8B-base and Llama-3.1-8B-instruct, rather than through conventional instruction tuning.
- 8192-token Context Length: Supports longer conversational and document-based interactions.
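The card does not specify the exact merging recipe, so as an assumption the sketch below shows the simplest common form: per-parameter linear interpolation between a base and an instruct checkpoint. The function name and `alpha` parameter are illustrative; real merges operate on tensor state dicts rather than the scalar toy weights used here.

```python
def merge_weights(base_sd, instruct_sd, alpha=0.5):
    """Linearly interpolate two checkpoints' parameters:
    merged = (1 - alpha) * base + alpha * instruct.
    Both state dicts must share identical keys (and shapes)."""
    assert base_sd.keys() == instruct_sd.keys()
    return {k: (1 - alpha) * base_sd[k] + alpha * instruct_sd[k]
            for k in base_sd}

# Toy scalar "checkpoints" standing in for real parameter tensors.
base = {"w": 0.0, "b": 2.0}
inst = {"w": 1.0, "b": 4.0}
merged = merge_weights(base, inst, alpha=0.5)  # {"w": 0.5, "b": 3.0}
```

With `alpha=0.5` the merge averages the two checkpoints; shifting `alpha` trades off base-model knowledge against instruction-following behavior.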
When to Use This Model
This model is particularly well-suited for applications requiring robust multilingual understanding and generation, especially in contexts involving English, Chinese, and Indonesian. Its enhanced performance on reasoning and question-answering benchmarks in these languages makes it a strong candidate for:
- Multilingual chatbots and virtual assistants.
- Content generation and summarization in the supported languages.
- Cross-lingual information retrieval and analysis.
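For chatbot use, prompts should follow the Llama-3 chat format the model inherits. The helper below is a hand-rolled sketch of that format for illustration; in practice, prefer `tokenizer.apply_chat_template` from the model's own tokenizer, which is authoritative.

```python
def build_prompt(messages):
    """Format a chat as a Llama-3-style prompt string (illustrative;
    use tokenizer.apply_chat_template in real code)."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        parts.append(f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
                     f"{m['content']}<|eot_id|>")
    # Cue the model to generate the assistant turn next.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = build_prompt([
    {"role": "user",
     "content": "Ringkas artikel ini dalam bahasa Indonesia."},
])
```

The same structure works for English, Chinese, or Indonesian turns, since the chat format is language-agnostic.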
Caveats: The model has not undergone explicit safety alignment. Users should implement their own safeguards and critically evaluate outputs, especially in high-stakes applications.
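One minimal form such a safeguard can take is a post-generation output check; the blocklist below is a hypothetical placeholder, and a real deployment should rely on a dedicated moderation model or service rather than a keyword list alone.

```python
BLOCKLIST = {"example_banned_term"}  # placeholder; supply a real policy list

def passes_output_check(text):
    """Reject model outputs containing blocklisted terms.
    This is a minimal sketch, not a substitute for proper
    safety alignment or a moderation service."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKLIST)
```

Gating every generated response through a check like this (plus human review in high-stakes settings) is the kind of safeguard the caveat above calls for.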