mesolitica/Malaysian-Qwen2.5-7B-Instruct

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Ctx Length: 32k · Published: Mar 3, 2025 · Architecture: Transformer

mesolitica/Malaysian-Qwen2.5-7B-Instruct is a 7.6-billion-parameter instruction-tuned causal language model, fine-tuned by mesolitica from Qwen2.5-7B-Instruct. It specializes in understanding and generating content in a range of Malaysian languages and dialects, including Mandarin, Tamil, Jawi, and multiple regional variants. The model handles multi-turn conversations in Malaysian contexts covering legislation, politics, religion, and local languages, and can also generate code while responding in these languages.


Malaysian-Qwen2.5-7B-Instruct Overview

This model is a 7.6 billion parameter instruction-tuned language model developed by mesolitica, building upon the Qwen2.5-7B-Instruct architecture. It has been extensively fine-tuned on a 1.5 billion token Malaysian instruction dataset to enhance its understanding and generation capabilities for Malaysian-specific contexts.
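As a sketch of how the model might be loaded for inference, assuming the standard Hugging Face `transformers` chat workflow (the system prompt, example question, and generation parameters below are illustrative, not taken from the model card):

```python
MODEL_ID = "mesolitica/Malaysian-Qwen2.5-7B-Instruct"


def build_messages(user_prompt: str) -> list:
    """Assemble a single-turn chat in the usual messages format."""
    return [
        {"role": "system", "content": "You are a helpful Malaysian assistant."},
        {"role": "user", "content": user_prompt},
    ]


def generate(user_prompt: str, max_new_tokens: int = 256) -> str:
    """Download the weights and generate a reply (requires `transformers` and GPU/CPU memory for a 7.6B model)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer.apply_chat_template(
        build_messages(user_prompt),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```

`build_messages` alone shows the expected chat format; calling `generate(...)` pulls the full checkpoint from the Hub.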

Key Capabilities

  • Multilingual and Dialectal Support: The model supports responses and code generation in a wide array of Malaysian languages and dialects, including Mandarin, Tamil, Jawi, Manglish, Johor, Kedah, Kelantan, Pahang, Perak, Sabah, Sarawak, Selangor, Negeri Sembilan, and Terengganu.
  • Malaysian Context Understanding: It is specifically trained to handle multi-turn conversations and queries related to Malaysian legislation, politics, religions, and local languages.
  • Code Generation: Capable of generating code while responding in the aforementioned Malaysian languages and dialects.

Training Details

The model was fine-tuned using LoRA on the mesolitica/Malaysian-SFT dataset. Training used multipacking at an 8192-token context length, with SDPA causal masking to prevent cross-document contamination and keep position IDs correct, together with chunked CCE loss for LoRA.
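The multipacking scheme above can be illustrated with a small sketch (a hypothetical helper, not mesolitica's training code): several documents are concatenated into one sequence, position IDs restart at 0 for each document, and a per-token document ID lets the causal attention mask stay block-diagonal so tokens never attend across document boundaries.

```python
def multipack(docs, max_len=8192):
    """Pack token lists into one sequence with per-document position IDs.

    Returns (input_ids, position_ids, doc_ids); `doc_ids` records which
    document each token came from, so an SDPA-style causal mask can be
    restricted to tokens sharing the same doc id (block-diagonal).
    """
    input_ids, position_ids, doc_ids = [], [], []
    for doc_idx, doc in enumerate(docs):
        if len(input_ids) + len(doc) > max_len:
            break  # a real packer would start a new pack here
        input_ids.extend(doc)
        position_ids.extend(range(len(doc)))  # positions restart per document
        doc_ids.extend([doc_idx] * len(doc))
    return input_ids, position_ids, doc_ids


def allowed_to_attend(i, j, doc_ids):
    """Causal mask that also blocks cross-document attention."""
    return j <= i and doc_ids[i] == doc_ids[j]
```

Restarting positions per document and masking across boundaries is what prevents one packed document from "leaking" into the next during training.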

Performance

On the MalayMMLU benchmark (0-shot, first token accuracy), the Malaysian-Qwen2.5-7B-Instruct (revision 83a0e145c726385502898ab7e016982eae1b684d) achieved an average accuracy of 69.26%, outperforming the original Qwen2.5-7B-Instruct's 66.52%.
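First-token accuracy, the metric quoted above, scores a multiple-choice question as correct when the most likely first token of the model's answer matches the gold option letter. A minimal sketch of the scoring step (the option logits and labels here are hypothetical; the real MalayMMLU harness reads them from the model):

```python
def pick_option(first_token_logits):
    """Choose the option letter with the highest first-token logit."""
    return max(first_token_logits, key=first_token_logits.get)


def first_token_accuracy(predictions, golds):
    """Fraction of questions whose predicted first answer token matches gold."""
    assert len(predictions) == len(golds)
    correct = sum(p == g for p, g in zip(predictions, golds))
    return correct / len(golds)
```

Comparing logits over only the option-letter tokens (rather than generating free text) makes the metric cheap and deterministic for 0-shot evaluation.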