Overview
mesolitica/Malaysian-Llama-3.1-8B-Instruct is an 8-billion-parameter instruction-tuned model built on Meta's Llama-3.1-8B-Instruct. Developed by Mesolitica, it was further fine-tuned on a curated 1.5-billion-token Malaysian instruction dataset to strengthen its local linguistic and contextual understanding.
Key Capabilities
- Multilingual and Dialectal Support: Responds in a wide range of languages and writing systems used in Malaysia, including Mandarin, Tamil, Jawi script, and Manglish, as well as regional Malay dialects such as Johor, Kedah, Kelantan, Pahang, Perak, Sabah, Sarawak, Selangor, Negeri Sembilan, and Terengganu.
- Malaysian Context Understanding: Demonstrates improved comprehension of multi-turn conversations related to Malaysian legislation, politics, religions, and local languages.
- Training Methodology: Fine-tuned using LoRA on selected attention and projection layers, with rank 128 and alpha 256 (a scaling factor of 2.0). Training used an 8192-token context length with proper SDPA causal masking and chunked CCE loss adapted for LoRA.
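The LoRA bullet above can be made concrete with a small sketch of the adapter math. The matrix dimensions below are illustrative placeholders, not the actual Llama-3.1-8B layer shapes; only the rank and alpha values come from the model card.

```python
import numpy as np

# Illustrative layer size (real Llama-3.1-8B projections are much larger).
d_out, d_in = 512, 512
rank, alpha = 128, 256  # values reported in the model card

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))        # frozen base weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, rank))                   # zero-initialized, so the adapter starts as a no-op

scaling = alpha / rank                        # 256 / 128 = 2.0, the "scaling factor" above
W_adapted = W + scaling * (B @ A)             # effective weight once the adapter is merged
```

Because `B` starts at zero, `W_adapted` equals `W` before training; only the low-rank factors `A` and `B` are updated, which is what keeps LoRA fine-tuning cheap relative to full fine-tuning.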
Benchmarks
On the MalayMMLU benchmark (0-shot, first-token accuracy), the model achieved an average accuracy of 61.28%, with category scores ranging from 60.25% (Others) to 62.43% (Humanities). This is slightly below the base Llama-3.1-8B-Instruct's 64.26% average on MalayMMLU; the fine-tuned version trades a small amount of general benchmark performance for deeper Malaysian linguistic and contextual coverage.
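"First-token accuracy" here means the model's first generated token is compared against the letter of the correct multiple-choice option. A minimal sketch of that scoring rule, assuming a hypothetical lookup of the model's log-probabilities over the option letters (the scores below are made up, not real model outputs):

```python
def first_token_accuracy(examples):
    """Each example: (dict mapping option letter -> first-token logprob, gold letter)."""
    correct = 0
    for logprobs, gold in examples:
        predicted = max(logprobs, key=logprobs.get)  # most likely first token
        correct += predicted == gold
    return correct / len(examples)

# Toy illustration with fabricated scores.
examples = [
    ({"A": -0.2, "B": -1.5, "C": -2.0, "D": -3.1}, "A"),  # predicted A, gold A
    ({"A": -1.0, "B": -0.4, "C": -2.2, "D": -2.9}, "C"),  # predicted B, gold C
]
print(first_token_accuracy(examples))  # → 0.5
```

In a real harness the log-probabilities would come from a single forward pass over the prompt, which is what makes this metric cheap to run at 0-shot across the whole benchmark.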
Good For
- Applications requiring nuanced understanding and generation in various Malaysian languages and dialects.
- Use cases involving Malaysian-specific cultural, political, or legal contexts.
- Developers building AI solutions tailored for the Malaysian market.