GanitLLM-1.7B_CGRPO by dipta007 is a 1.7 billion parameter causal language model based on Qwen3-1.7B, fine-tuned using Curriculum-GRPO for Bengali mathematical reasoning. It achieves significant accuracy improvements on Bengali mathematical benchmarks (Bn-MGSM and Bn-MSVAMP) by reasoning primarily in English, while generating more concise solutions. This model is optimized for high-accuracy mathematical problem-solving in a Bengali context, despite its English reasoning output.
GanitLLM-1.7B_CGRPO: Bengali Mathematical Reasoning
GanitLLM-1.7B_CGRPO is a 1.7 billion parameter causal language model developed by dipta007, built upon the Qwen3-1.7B base model. It is trained using Curriculum-GRPO (curriculum-guided Group Relative Policy Optimization), applied directly to the base model without an initial Supervised Fine-Tuning (SFT) stage.
Key Capabilities & Performance
This model excels in Bengali mathematical reasoning tasks, demonstrating substantial performance gains:
- +44.4 points of accuracy on the Bn-MGSM benchmark, increasing from 15.2 to 59.6.
- +52.1 points of accuracy on the Bn-MSVAMP benchmark, improving from 14.1 to 66.2.
- Generates solutions that are 10.9% shorter on average (1002 tokens vs. 1124 for the base model), indicating more concise reasoning.
Important Note on Reasoning Language
While designed for Bengali mathematical problems, this specific variant of GanitLLM-1.7B_CGRPO primarily performs its reasoning steps in English. Its share of Bengali text in reasoning (18.74%) is similar to that of the base model. For models that reason in Bengali, users are directed to the GanitLLM-1.7B_SFT_CGRPO variant.
Training Methodology
The model was trained using a single-stage Curriculum-GRPO pipeline on the GANIT-RLVR dataset (~7.3k examples). Reward functions during training included:
- Format Reward: Validating the structure of the <think> and <answer> tags.
- Correctness Reward: Awarding +2.0 for Bengali answer matches and +1.0 for English matches.
- Bengali Reasoning Reward: Encouraging >80% Bengali text in reasoning (though this model's output is predominantly English).
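The reward terms above can be sketched as simple scoring functions. The exact implementation and reward weights used in training are not published here, so the tag regex, the 0.5 format bonus, and the Bengali Unicode-range heuristic below are illustrative assumptions; only the +2.0/+1.0 correctness values and the 80% threshold come from this card.

```python
import re

# Illustrative tag grammar: a single <think> block followed by a single <answer> block.
THINK_ANSWER = re.compile(r"^<think>.*</think>\s*<answer>.*</answer>\s*$", re.DOTALL)

def format_reward(completion: str) -> float:
    # Hypothetical +0.5 bonus when the completion follows the tag layout.
    return 0.5 if THINK_ANSWER.match(completion.strip()) else 0.0

def correctness_reward(predicted: str, gold_bn: str, gold_en: str) -> float:
    # +2.0 for a Bengali answer match, +1.0 for an English match (values from the card).
    p = predicted.strip()
    if p == gold_bn.strip():
        return 2.0
    if p == gold_en.strip():
        return 1.0
    return 0.0

def bengali_ratio(text: str) -> float:
    # Fraction of alphabetic characters that fall in the Bengali block (U+0980-U+09FF).
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return sum(1 for c in letters if "\u0980" <= c <= "\u09ff") / len(letters)

def bengali_reasoning_reward(reasoning: str, threshold: float = 0.8) -> float:
    # Illustrative +1.0 when more than 80% of the reasoning text is Bengali.
    return 1.0 if bengali_ratio(reasoning) > threshold else 0.0
```

In a GRPO-style setup these terms would be summed into a single scalar reward per sampled completion; the weighting between them is an open detail of the actual training pipeline.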
Use Cases
This model is ideal for applications requiring high-accuracy mathematical problem-solving in a Bengali context, where the intermediate reasoning steps can be in English. It offers a compact and efficient solution for improving mathematical benchmark scores.