GanitLLM-1.7B_SFT_CGRPO: Bengali Mathematical Reasoning
GanitLLM-1.7B_SFT_CGRPO is a 1.7-billion-parameter causal language model built on the Qwen3-1.7B base and developed by dipta007. Its core innovation is its training methodology: a multi-stage pipeline combining Supervised Fine-Tuning (SFT) with a novel Curriculum-GRPO (curriculum-based Group Relative Policy Optimization) approach. The model is designed specifically to excel at Bengali mathematical reasoning tasks.
Key Capabilities and Performance
- Enhanced Bengali Mathematical Reasoning: Improves accuracy by 37.6 points on the Bn-MGSM benchmark (15.2 → 52.8) and by 52.7 points on the Bn-MSVAMP benchmark (14.1 → 66.8) relative to the base model.
- High Bengali Reasoning Percentage: 87.80% of its reasoning text is in Bengali, up from 19.64% for the base model.
- Concise Solutions: Produces solutions that are 81.3% shorter on average (about 210 words versus 1,124 for the base model), making its outputs more efficient.
- Context Length: Supports a context length of 4,096 tokens.
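The conciseness figure follows directly from the reported average lengths, assuming 1,124 and 210 are per-solution word averages:

```python
# Average solution lengths reported in the card (in words).
base_len, tuned_len = 1124, 210

# Relative reduction in output length.
reduction = 1 - tuned_len / base_len
print(f"{reduction:.1%}")  # -> 81.3%
```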
Training Methodology
The model's gains come from a two-stage training pipeline:
- Supervised Fine-Tuning (SFT): Initial training on the GANIT-SFT dataset (~11k examples) to establish foundational Bengali reasoning.
- Curriculum-GRPO: Reinforcement learning with difficulty-aware sampling on the GANIT-RLVR dataset (~7.3k examples). Rewards cover output format, answer correctness (accepting either Bengali or English answer forms), and the share of Bengali text in the reasoning.
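The three reward components named above could be sketched as follows. The helper names, the `উত্তর:` ("answer:") output format, and the weights are illustrative assumptions; the card names the components but not their exact implementation.

```python
import re

BENGALI_CHAR = re.compile(r'[\u0980-\u09FF]')  # Bengali Unicode block
ANSWER_PATTERN = re.compile(r'উত্তর\s*:\s*(\S+)')  # hypothetical answer format


def format_reward(completion: str) -> float:
    # 1.0 if the completion contains a clearly marked final answer.
    return 1.0 if ANSWER_PATTERN.search(completion) else 0.0


def correctness_reward(completion: str, gold: str) -> float:
    # Compare the extracted answer with the gold answer; Bengali
    # numerals (০-৯) are normalized to ASCII so both forms match.
    trans = str.maketrans('০১২৩৪৫৬৭৮৯', '0123456789')
    m = ANSWER_PATTERN.search(completion)
    if not m:
        return 0.0
    return 1.0 if m.group(1).translate(trans) == gold.translate(trans) else 0.0


def bengali_reward(completion: str) -> float:
    # Fraction of alphabetic characters that are Bengali script.
    letters = [c for c in completion if c.isalpha()]
    if not letters:
        return 0.0
    return sum(1 for c in letters if BENGALI_CHAR.match(c)) / len(letters)


def total_reward(completion: str, gold: str,
                 weights=(0.2, 0.6, 0.2)) -> float:
    # Weighted sum of the three components (weights are assumed).
    return (weights[0] * format_reward(completion)
            + weights[1] * correctness_reward(completion, gold)
            + weights[2] * bengali_reward(completion))
```

A fully Bengali solution ending in `উত্তর: ৪` would score the maximum on all three components for gold answer `4`, while an English-only chain of thought would lose both the format and Bengali-share rewards.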
Use Cases
This model is ideal for applications requiring accurate and efficient mathematical problem-solving in Bengali, particularly where concise and culturally relevant reasoning is crucial. Its specialized training makes it a strong candidate for educational tools, automated problem solvers, and research in multilingual NLP focusing on mathematical domains.