mesolitica/Malaysian-Qwen2.5-7B-Dialect-Reasoning-GRPO

TEXT GENERATIONConcurrency Cost:1Model Size:7.6BQuant:FP8Ctx Length:32kPublished:May 27, 2025Architecture:Transformer0.0K Cold

mesolitica/Malaysian-Qwen2.5-7B-Dialect-Reasoning-GRPO is a 7.6 billion parameter Qwen 2.5 model developed by mesolitica, fine-tuned for reasoning in Malay dialects. It utilizes online Reinforcement Learning with GRPO on a curated Malay Dialect Reasoning dataset. This model is specifically optimized to improve reasoning capabilities across various Malaysian dialects and translate between them and standard Malay.

Loading preview...

Overview

mesolitica/Malaysian-Qwen2.5-7B-Dialect-Reasoning-GRPO is a 7.6 billion parameter Qwen 2.5 model developed by mesolitica. It is fine-tuned using online Reinforcement Learning with GRPO (Generalized Reinforcement Policy Optimization) on a highly curated Malay Dialect Reasoning dataset. The model's training involved replicating each datapoint to 6 generations to enhance reasoning across dialects.

Key Capabilities

  • Dialect Reasoning: Significantly improves reasoning capabilities within and across various Malay dialects.
  • Dialect Translation: Demonstrates proficiency in translating between specific Malay dialects (e.g., Johor, Kedah, Kelantan) and standard Malay.
  • Reinforcement Learning: Leverages online GRPO with full parameter updates for enhanced performance.

Performance

The model was evaluated using vLLM with sacrebleu CHRF max@5 scores. It achieved an average score of 56.82% for dialect-to-standard Malay translation and 58.11% for standard Malay-to-dialect translation in Float32 precision. Similar performance was observed in Float16 precision, with average scores of 57.27% and 57.44% respectively.

Recommended Usage

For optimal reasoning performance, users are advised to employ a specific system prompt: You are going to enter reasoning mode. First, you try to think step-by-step in Malay. After that, put your final answer within $\boxed{}$. This prompt guides the model to perform step-by-step reasoning in Malay before providing a final, boxed answer.