dipta007/GanitLLM-4B_SFT_GRPO is a 4 billion parameter causal language model, based on Qwen3-4B, developed by dipta007. It is specifically fine-tuned for Bengali mathematical reasoning using Supervised Fine-Tuning (SFT) and standard GRPO. This model significantly improves accuracy on Bengali mathematical benchmarks (Bn-MGSM and Bn-MSVAMP) and generates solutions with 88.61% Bengali reasoning, using 80% fewer tokens than its base model.
Loading preview...
GanitLLM-4B_SFT_GRPO: Bengali Mathematical Reasoning
GanitLLM-4B_SFT_GRPO is a 4 billion parameter causal language model built upon the Qwen3-4B architecture, developed by dipta007. It is specialized for mathematical reasoning in Bengali, having undergone a two-stage training process: Supervised Fine-Tuning (SFT) on the GANIT-SFT dataset and subsequent GRPO (Generative Reinforcement Learning with Policy Optimization) on GANIT-RLVR.
Key Capabilities & Performance
This model demonstrates substantial improvements in Bengali mathematical problem-solving:
- Enhanced Accuracy: Achieves an 8.4 point increase on the Bn-MGSM benchmark (from 69.2 to 77.6) and a 5.8 point increase on Bn-MSVAMP (from 70.5 to 76.3) compared to its base model.
- Bengali Reasoning Focus: Exhibits 88.61% Bengali text in its reasoning outputs, a significant leap from the base model's 14.79%.
- Concise Solutions: Generates solutions using 80% fewer tokens (189 words vs. 943 words) while maintaining high accuracy.
- Training Methodology: Utilizes specific reward functions during GRPO, including format validation, correctness (Bengali and English answer matching), and a dedicated Bengali reasoning reward to ensure language consistency.
Ideal Use Cases
- Bengali Mathematical Education: Developing tools for students or educators requiring step-by-step mathematical problem-solving in Bengali.
- Localized AI Applications: Integrating mathematical reasoning capabilities into applications targeting Bengali-speaking users.
- Research in Low-Resource Languages: Exploring advanced reasoning techniques for languages with limited existing LLM support.