mesolitica/Malaysian-Qwen2.5-72B-Instruct

TEXT GENERATION · Concurrency Cost: 4 · Model Size: 72.7B · Quant: FP8 · Ctx Length: 32k · Architecture: Transformer

The Malaysian-Qwen2.5-72B-Instruct model by mesolitica is a 72.7 billion parameter instruction-tuned language model, fine-tuned from Qwen2.5-72B-Instruct on a 1.5 billion token Malaysian instruction dataset. It is optimized to understand and respond in the languages, scripts, and dialects used in Malaysia, including Mandarin, Tamil, and Jawi, across diverse local contexts. The model excels at multi-turn conversations about Malaysian legislation, politics, religions, and local languages, and it improves on the MalayMMLU benchmark relative to its base model.

Malaysian-Qwen2.5-72B-Instruct Overview

This model is a 72.7 billion parameter instruction-tuned language model developed by mesolitica, building upon the Qwen2.5-72B-Instruct architecture. It has been extensively fine-tuned on a highly curated 1.5 billion token Malaysian instruction dataset to specialize in Malaysian linguistic and cultural contexts.
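As a reference point, the snippet below is a minimal sketch of running the model through the Hugging Face transformers chat interface; it assumes sufficient GPU memory for a 72.7B model (or an equivalently quantized deployment), and the Malay prompt is purely illustrative.

```python
# Minimal sketch: load the model with Hugging Face transformers and run a
# single-turn Malay prompt. The dtype and sampling settings are assumptions,
# not values from the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mesolitica/Malaysian-Qwen2.5-72B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights; an FP8 deployment would differ
    device_map="auto",           # shard across available GPUs
)

messages = [
    {"role": "user", "content": "Terangkan secara ringkas apa itu Akta Kerja 1955."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```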

Key Capabilities and Improvements

  • Multilingual Malaysian Support: Enhanced ability to respond in the languages, scripts, and dialects used in Malaysia, including Mandarin, Tamil, Jawi script, Manglish, and the state dialects of Johor, Kedah, Kelantan, Pahang, Perak, Sabah, Sarawak, Selangor, Negeri Sembilan, and Terengganu.
  • Contextual Understanding: Improved comprehension and generation in multi-turn conversations on Malaysian topics such as legislation, politics, religions, and local languages (see the multi-turn sketch after this list).
  • Performance Benchmarks: Improved accuracy on the MalayMMLU benchmark, averaging 79.63% under next-token probability scoring and 77.29% under first-token match, surpassing the original Qwen2.5-72B-Instruct in both settings.
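
To make the multi-turn behaviour above concrete, here is a hedged sketch of how a conversation history would be rendered with the tokenizer's chat template; the Malay questions and the earlier assistant reply are invented placeholders, and generation would then proceed as in the earlier loading sketch.

```python
# Sketch: rendering a two-turn Malay conversation with the chat template.
# The assistant turn below is a made-up placeholder; in practice it would be
# the model's own previous output.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mesolitica/Malaysian-Qwen2.5-72B-Instruct")

messages = [
    {"role": "user", "content": "Apakah fungsi Dewan Rakyat?"},
    {"role": "assistant", "content": "Dewan Rakyat ialah dewan rendah Parlimen Malaysia ..."},
    {"role": "user", "content": "Bagaimana ahli-ahlinya dipilih?"},
]

# Render the full history into a single prompt string, ready for generation.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```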

Training Details

The model was fine-tuned with LoRA on the q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, embed_tokens, and lm_head layers, using a rank of 128 and an alpha of 256 (2.0× the rank). Training used multipacking with an 8192-token context length and proper SDPA causal masking so that packed documents do not attend to one another, together with a chunked Cut Cross-Entropy (CCE) loss adapted for LoRA. The training dataset was mesolitica/Malaysian-SFT.
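
For orientation, the sketch below shows what an equivalent adapter configuration could look like in the peft library, using the target layers, rank, and alpha listed above; the dropout value and task type are assumptions not stated in the card, and treating embed_tokens and lm_head as LoRA targets (rather than fully trained modules) is likewise an assumption.

```python
# Sketch of a peft LoraConfig matching the layers and rank/alpha described above.
# lora_dropout and task_type are assumptions; they are not given in the card.
from peft import LoraConfig

lora_config = LoraConfig(
    r=128,            # LoRA rank from the training details
    lora_alpha=256,   # alpha = 2.0 x rank
    lora_dropout=0.0, # assumption: not stated in the card
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "embed_tokens", "lm_head",
    ],
    task_type="CAUSAL_LM",
)
```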