mesolitica/Malaysian-Qwen2.5-7B-Dialect-Reasoning-GRPO
TEXT GENERATION
Concurrency Cost: 1 | Model Size: 7.6B | Quant: FP8 | Ctx Length: 32k | Published: May 27, 2025 | Architecture: Transformer

mesolitica/Malaysian-Qwen2.5-7B-Dialect-Reasoning-GRPO is a 7.6-billion-parameter Qwen 2.5 model from mesolitica, fine-tuned for reasoning in Malay dialects. It was trained with online reinforcement learning using GRPO (Group Relative Policy Optimization) on a curated Malay Dialect Reasoning dataset. The model is optimized to reason across various Malaysian dialects and to translate between those dialects and standard Malay.
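
To give a rough idea of what "GRPO" refers to, the sketch below shows the group-relative advantage step that gives the method its name: several completions are sampled for one prompt, each is scored, and each reward is normalized against the mean and standard deviation of its own group. This is only an illustration and not mesolitica's training code; the function name `group_relative_advantages`, the `eps` value, the use of the population standard deviation, and the example rewards are all assumptions made here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group of sampled completions.

    Hypothetical helper for illustration only. `eps` guards against
    division by zero when every completion gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four completions sampled from a single prompt,
# e.g. 1.0 when the final answer was judged correct and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above their group's average receive a positive advantage and are reinforced, while those below average are discouraged; when every completion in a group receives the same reward, all advantages are zero and that prompt contributes no learning signal.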
