mesolitica/Malaysian-Llama-3.2-1B-Instruct is a 1-billion-parameter instruction-tuned causal language model that mesolitica fine-tuned from Llama-3.2-1B-Instruct. It has a 32,768-token context length and is optimized for understanding and generating text in Malaysian languages, scripts, and dialects, including Mandarin, Tamil, Jawi, and Manglish. It is designed for multi-turn conversations about Malaysian legislation, politics, religions, and local languages, and can also handle coding tasks posed in these languages.
Malaysian-Llama-3.2-1B-Instruct Overview
This 1-billion-parameter instruction-tuned model from mesolitica builds on Llama-3.2-1B-Instruct. It was fine-tuned on mesolitica/Malaysian-SFT, a curated Malaysian instruction dataset of 1.5 billion tokens, to strengthen its understanding and generation of Malaysian-specific content. Training used LoRA with rank 128 and alpha 256, and multipacked examples into 8192-token sequences with SDPA causal masking.
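The card states the training setup without detail. As a rough, hypothetical sketch (plain Python, not mesolitica's training code), the snippet below shows (1) the standard LoRA scaling factor alpha / r and the extra trainable parameters per adapted weight matrix, and (2) one way to multipack examples into 8192-token sequences with a block-diagonal causal mask. The helper names, the first-fit-decreasing packer, and the 2048 x 2048 matrix size are illustrative assumptions; the card does not say which packing algorithm was used, and reading "SDPA causal masking" as a per-example causal mask is also an assumption.

```python
from typing import List


def lora_scale(alpha: float, r: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update B @ A."""
    return alpha / r


def lora_param_count(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


def pack_sequences(lengths: List[int], max_len: int = 8192) -> List[List[int]]:
    """Hypothetical first-fit-decreasing packing of example lengths into bins of at most max_len tokens."""
    bins: List[List[int]] = []
    for n in sorted(lengths, reverse=True):
        if n > max_len:
            raise ValueError(f"example of {n} tokens exceeds max_len={max_len}")
        for b in bins:
            if sum(b) + n <= max_len:
                b.append(n)
                break
        else:
            # No existing bin has room, so open a new packed sequence.
            bins.append([n])
    return bins


def block_causal_mask(lengths: List[int]) -> List[List[bool]]:
    """mask[i][j] is True when token i may attend to token j:
    both tokens belong to the same packed example and j <= i (assumed masking scheme)."""
    doc_ids = [doc for doc, n in enumerate(lengths) for _ in range(n)]
    total = len(doc_ids)
    return [[doc_ids[i] == doc_ids[j] and j <= i for j in range(total)]
            for i in range(total)]


if __name__ == "__main__":
    print(lora_scale(256, 128))               # the card's alpha / rank
    print(lora_param_count(2048, 2048, 128))  # hypothetical 2048 x 2048 projection
    print(pack_sequences([5000, 3000, 4000, 2000]))
    for row in block_causal_mask([2, 3]):
        print("".join("1" if allowed else "." for allowed in row))
```

With alpha 256 and rank 128 the update is scaled by 2.0. With example lengths 2 and 3, the printed mask is lower-triangular within each example and empty between them, so tokens of the second example never attend to the first.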
Key Capabilities
- Multilingual Support: Responds, including on coding tasks, in a wide range of Malaysian languages, scripts, and dialects: Mandarin, Tamil, Jawi, Manglish, and the Johor, Kedah, Kelantan, Pahang, Perak, Sabah, Sarawak, Selangor, Negeri Sembilan, and Terengganu dialects.
- Malaysian Context Understanding: Performs well in multi-turn conversations and tasks involving Malaysian legislation, politics, religions, and local languages.
- Improved Performance: Outperforms the base Llama-3.2-1B-Instruct model on the MalayMMLU benchmark, averaging 41.28% accuracy versus 37.86% when answers are scored by next-token probability, and 41.76% versus 36.85% when answers are scored by first-token match using vLLM.
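The two MalayMMLU scores read the model's answer in different ways. The sketch below is a simplified, hypothetical illustration of the idea, not the MalayMMLU or vLLM evaluation code: next-token probability picks the option letter the model assigns the highest probability, while first-token match checks whether the first thing the model generates is the gold letter. The function names, and the whitespace split used in place of a real tokenizer, are assumptions. It also computes the absolute and relative gains from the reported averages.

```python
from typing import Dict, List, Tuple


def score_by_next_token_probability(option_logprobs: Dict[str, float], gold: str) -> bool:
    """Correct if the gold option letter has the highest next-token log-probability."""
    predicted = max(option_logprobs, key=option_logprobs.get)
    return predicted == gold


def score_by_first_token_match(generated: str, gold: str) -> bool:
    """Correct if the first word of the generation, with brackets and punctuation
    stripped, equals the gold letter (a whitespace split stands in for a real tokenizer)."""
    words = generated.split()
    if not words:
        return False
    return words[0].strip("().:").upper() == gold.upper()


def accuracy(results: List[bool]) -> float:
    """Percentage of questions answered correctly."""
    return 100.0 * sum(results) / len(results)


def improvement(base: float, tuned: float) -> Tuple[float, float]:
    """Absolute gain in percentage points and relative gain in percent."""
    points = tuned - base
    return points, 100.0 * points / base


if __name__ == "__main__":
    # Reported MalayMMLU averages from the card: (Llama-3.2-1B-Instruct, Malaysian fine-tune).
    reported = {
        "next-token probability": (37.86, 41.28),
        "first-token match": (36.85, 41.76),
    }
    for method, (base, tuned) in reported.items():
        points, relative = improvement(base, tuned)
        print(f"{method}: +{points:.2f} points ({relative:.1f}% relative)")
```

For the reported averages this works out to gains of 3.42 and 4.91 percentage points, or about 9.0% and 13.3% relative to the base model.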
Good For
- Applications requiring deep understanding and generation in diverse Malaysian linguistic and cultural contexts.
- Developing chatbots or virtual assistants tailored for Malaysian users.
- Tasks involving code generation or translation with prompts and responses in Malaysian languages.
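For chatbot-style use, a multi-turn conversation is typically kept as a list of role/content messages and rendered with the tokenizer's chat template. The sketch below assumes the Hugging Face `transformers` library and that the model repository ships a chat template; `add_turn` and `generate_reply` are hypothetical helper names, and loading the model inside `generate_reply` on every call is for brevity only.

```python
from typing import Dict, List

MODEL_ID = "mesolitica/Malaysian-Llama-3.2-1B-Instruct"
Message = Dict[str, str]


def add_turn(history: List[Message], role: str, content: str) -> List[Message]:
    """Return a new history with one message appended (common system/user/assistant roles)."""
    if role not in {"system", "user", "assistant"}:
        raise ValueError(f"unexpected role: {role}")
    return history + [{"role": role, "content": content}]


def generate_reply(history: List[Message], max_new_tokens: int = 256) -> str:
    """Render the conversation with the chat template and generate the next assistant turn."""
    # Imported lazily so add_turn can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    # The template already inserts special tokens, so skip adding them again below.
    prompt = tokenizer.apply_chat_template(history, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    new_tokens = output[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example (downloads the model on first run):
# history = add_turn([], "user", "Apakah fungsi Dewan Rakyat?")
# print(generate_reply(history))
```

A real application would load the tokenizer and model once and reuse them across turns, appending each reply with `add_turn(history, "assistant", reply)` so the next turn sees the whole conversation.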