# GanitLLM-4B_SFT_CGRPO: Bengali Mathematical Reasoning Model

GanitLLM-4B_SFT_CGRPO is a 4-billion-parameter causal language model developed by dipta007, built on the Qwen3-4B architecture. It is designed and optimized for mathematical reasoning in Bengali, and its multi-stage training methodology yields substantial gains over the base model.
## Key Capabilities & Differentiators
- Enhanced Bengali Mathematical Reasoning: Produces reasoning traces that are 88.71% Bengali text, up from the base model's 14.79%.
- Superior Benchmark Performance: Gains +7.6 accuracy points on the Bn-MGSM benchmark (69.2 → 76.8) and +5.9 points on Bn-MSVAMP (70.5 → 76.4).
- Concise Solution Generation: Produces solutions roughly 79.5% shorter than the base model's (193 words vs. 943 on average), yielding more efficient and direct answers.
- Advanced Training Methodology: Uses a multi-stage pipeline of Supervised Fine-Tuning (SFT) on ~11k examples followed by a novel Curriculum-GRPO stage (Group Relative Policy Optimization reinforcement learning with difficulty-aware sampling) on ~7.3k examples. The RL stage combines specialized reward functions for output format, answer correctness (matching in both Bengali and English), and maintaining a high proportion of Bengali text in the reasoning.
- Multilingual Support: Capable of processing both Bengali and English, with a primary focus on Bengali mathematical problem-solving.
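The language-consistency reward mentioned above can be illustrated with a small sketch. This is not the released training code; the function names, the 0.8 threshold, and the linear scaling below the threshold are illustrative assumptions — only the idea of scoring the fraction of Bengali characters in a reasoning trace comes from the description above.

```python
# Hypothetical sketch of a Bengali-text-percentage reward.
# Names and the 0.8 threshold are assumptions, not the model's actual training code.

def bengali_ratio(text: str) -> float:
    """Fraction of alphabetic characters from the Bengali Unicode block (U+0980-U+09FF)."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    bengali = sum(1 for ch in letters if "\u0980" <= ch <= "\u09ff")
    return bengali / len(letters)

def language_reward(reasoning: str, threshold: float = 0.8) -> float:
    """Full reward when the trace is predominantly Bengali; scale linearly otherwise."""
    ratio = bengali_ratio(reasoning)
    return 1.0 if ratio >= threshold else ratio / threshold

# Example: a mostly-Bengali trace scores high, a pure-English one scores zero.
print(language_reward("সমাধান: 2 + 2 = 4"))
print(language_reward("The answer is 4."))
```

In a GRPO setup, a score like this would typically be summed with the format and correctness rewards to form the total per-completion reward.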
## Ideal Use Cases
- Bengali Mathematical Problem Solving: Excels in applications requiring accurate and concise mathematical reasoning in Bengali.
- Educational Tools: Suitable for developing AI tutors or learning platforms focused on mathematics for Bengali speakers.
- Research in Low-Resource Languages: Offers a strong baseline for further research into mathematical reasoning in languages like Bengali, where dedicated models are less common.