dipta007/GanitLLM-4B_CGRPO is a 4-billion-parameter causal language model based on Qwen3-4B, fine-tuned for Bengali mathematical reasoning. It is trained with Curriculum-GRPO directly on the base model, without supervised fine-tuning, and achieves significant accuracy gains on Bengali mathematical benchmarks such as Bn-MGSM and Bn-MSVAMP. The model is optimized for raw accuracy on mathematical problems, reasons primarily in English, generates more concise solutions than the base model, and supports a 32,768-token context length.
Overview
GanitLLM-4B_CGRPO is a 4-billion-parameter causal language model developed by dipta007, built upon the Qwen3-4B base model. It is trained using Curriculum-GRPO (curriculum-guided Group Relative Policy Optimization) applied directly to the base model, bypassing traditional supervised fine-tuning (SFT). This approach focuses on enhancing mathematical reasoning capabilities, particularly for problems posed in Bengali.
Key Capabilities and Performance
- Enhanced Bengali Mathematical Reasoning: Achieves substantial accuracy gains on Bengali mathematical benchmarks, with a +13.2 increase on Bn-MGSM (from 69.2 to 82.4) and an +8.0 increase on Bn-MSVAMP (from 70.5 to 78.5) compared to the base Qwen3-4B model.
- Efficient Solution Generation: Produces solutions approximately 10.5% shorter than the base model's (844 words on average vs. 943), indicating more concise reasoning.
- English-Centric Reasoning: Although designed for Bengali mathematical problems, this variant reasons primarily in English; its Bengali reasoning percentage (14.94%) is similar to the base model's.
- Curriculum-GRPO Training: Uses a single-stage training pipeline with difficulty-aware sampling over the GANIT-RLVR dataset, with reward functions covering format validation, correctness (matching either the Bengali or the English rendering of the answer), and the percentage of Bengali text in the reasoning.
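The reward design described above can be sketched as a weighted combination of component rewards scored on each sampled completion. The function names, weights, boxed-answer format, and answer-extraction logic below are illustrative assumptions, not the actual GANIT-RLVR implementation:

```python
import re

def format_reward(completion: str) -> float:
    """Format validation: 1.0 if the completion contains a boxed final
    answer, else 0.0. (Assumed format; the real check may differ.)"""
    return 1.0 if re.search(r"\\boxed\{.+?\}", completion) else 0.0

def correctness_reward(completion: str, gold_answers: set[str]) -> float:
    """Correctness: 1.0 if the extracted answer matches any gold answer,
    e.g. either the Bengali or the English rendering of the number."""
    m = re.search(r"\\boxed\{(.+?)\}", completion)
    return 1.0 if m and m.group(1).strip() in gold_answers else 0.0

BENGALI = re.compile(r"[\u0980-\u09FF]")

def bengali_ratio(completion: str) -> float:
    """Fraction of non-whitespace characters in the Bengali Unicode block,
    a proxy for the 'Bengali reasoning percentage' reward signal."""
    chars = [c for c in completion if not c.isspace()]
    if not chars:
        return 0.0
    return sum(1 for c in chars if BENGALI.match(c)) / len(chars)

def total_reward(completion: str, gold_answers: set[str],
                 weights: tuple[float, float, float] = (0.2, 0.7, 0.1)) -> float:
    """Weighted sum of the three components (weights are hypothetical)."""
    return (weights[0] * format_reward(completion)
            + weights[1] * correctness_reward(completion, gold_answers)
            + weights[2] * bengali_ratio(completion))
```

In GRPO-style training, a scalar reward like this would be computed per sampled completion in a group, with advantages taken relative to the group mean.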
When to Use This Model
This model is ideal for applications requiring high accuracy in solving Bengali mathematical problems, especially when the reasoning process can be predominantly in English. Developers should consider this model for tasks where raw performance on mathematical benchmarks is critical. For use cases specifically requiring Bengali reasoning, the related GanitLLM-4B_SFT_CGRPO model is recommended.
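For developers evaluating the model, a minimal inference sketch using the standard Hugging Face `transformers` API is shown below. It assumes the model is hosted on the Hub under its repo id and exposes a chat template; the example prompt and generation settings are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "dipta007/GanitLLM-4B_CGRPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# A Bengali math word problem ("Rahim has 5 apples; he buys 3 more.
# How many apples does he have?") -- example prompt, not from the benchmark.
messages = [{"role": "user",
             "content": "রহিমের কাছে ৫টি আপেল আছে। সে আরও ৩টি কিনল। মোট কয়টি আপেল হলো?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Expect the reasoning trace to be mostly in English, per the behavior described above; for predominantly Bengali reasoning, use the GanitLLM-4B_SFT_CGRPO variant instead.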