Overview
mesolitica/Malaysian-Qwen2.5-7B-Dialect-Reasoning-GRPO is a 7.6-billion-parameter Qwen 2.5 model developed by mesolitica. It is fine-tuned with online reinforcement learning using GRPO (Group Relative Policy Optimization) on a highly curated Malay dialect reasoning dataset. During training, each datapoint was replicated so that 6 generations were sampled per prompt, strengthening reasoning within and across dialects.
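The group of 6 generations per datapoint is what makes GRPO's reward signal work: each generation's reward is normalized against the mean and standard deviation of its group. A minimal sketch of that group-relative advantage, with purely illustrative reward values (the actual reward function and training loop are not shown here):

```python
# Sketch of GRPO's group-relative advantage: rewards for a group of
# generations sampled from the same prompt are normalized against the
# group's own mean and standard deviation. Reward values are illustrative.
import statistics

def group_advantages(rewards: list[float]) -> list[float]:
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

# One reward per generation in a group of 6 (as in this model's training setup).
rewards = [0.2, 0.9, 0.5, 0.5, 0.1, 0.8]
advantages = group_advantages(rewards)
```

Generations that beat their group's average get a positive advantage and are reinforced; below-average ones are penalized, without needing a separate value network.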
Key Capabilities
- Dialect Reasoning: Significantly improves reasoning capabilities within and across various Malay dialects.
- Dialect Translation: Demonstrates proficiency in translating between specific Malay dialects (e.g., Johor, Kedah, Kelantan) and standard Malay.
- Reinforcement Learning: Leverages online GRPO with full parameter updates for enhanced performance.
Performance
The model was evaluated with vLLM using sacrebleu chrF max@5 scores. In Float32 precision it averaged 56.82% for dialect-to-standard-Malay translation and 58.11% for standard-Malay-to-dialect translation. Float16 results were similar, averaging 57.27% and 57.44% respectively.
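The max@5 protocol means each source sentence gets 5 candidate translations, the best-scoring candidate against the reference is kept, and the best scores are averaged over the corpus. A hedged sketch of that aggregation, with a toy character-bigram F-score standing in for sacrebleu's chrF (a real evaluation would call sacrebleu itself):

```python
# Toy character-bigram F-score standing in for chrF (sacrebleu's chrF
# uses character n-grams up to order 6; this is a simplification).
def char_bigrams(text: str) -> set[str]:
    return {text[i:i + 2] for i in range(len(text) - 1)}

def toy_chrf(candidate: str, reference: str) -> float:
    cand, ref = char_bigrams(candidate), char_bigrams(reference)
    if not cand or not ref:
        return 0.0
    overlap = len(cand & ref)
    precision, recall = overlap / len(cand), overlap / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def max_at_k(candidates: list[str], reference: str) -> float:
    # max@k: keep only the best-scoring of the k candidates.
    return max(toy_chrf(c, reference) for c in candidates)

def corpus_score(samples: list[tuple[list[str], str]]) -> float:
    # Average the per-sentence max@k scores over the corpus.
    return sum(max_at_k(cands, ref) for cands, ref in samples) / len(samples)
```

Note that max@k is an optimistic metric: it credits the model if any one of its 5 samples is a good translation.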
Recommended Usage
For optimal reasoning performance, use the following system prompt: "You are going to enter reasoning mode. First, you try to think step-by-step in Malay. After that, put your final answer within $\boxed{}$." This instructs the model to reason step by step in Malay before giving its final answer inside \boxed{}.
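Assembling that system prompt into a chat request can be sketched as below. The message format follows the common chat convention used by Hugging Face chat templates and OpenAI-style APIs; the actual inference call (e.g. via transformers or vLLM) is assumed, not shown:

```python
# Build a chat-style message list with the recommended reasoning prompt.
# The prompt text is taken verbatim from the model card.
SYSTEM_PROMPT = (
    "You are going to enter reasoning mode. First, you try to think "
    "step-by-step in Malay. After that, put your final answer within "
    "$\\boxed{}$."
)

def build_messages(user_text: str) -> list[dict]:
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_text},
    ]

# Example: ask for a Kelantan-dialect to standard-Malay translation.
messages = build_messages(
    "Terjemahkan ke bahasa Melayu standard: Demo gi mano?"
)
```

This list can then be passed to `tokenizer.apply_chat_template(...)` or a chat-completions endpoint as usual.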