dipta007/GanitLLM-1.7B_SFT_CGRPO
Text generation · Model size: 2B (1.7B parameters) · Quantization: BF16 · Context length: 32k · Published: Jan 1, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

dipta007/GanitLLM-1.7B_SFT_CGRPO is a 1.7 billion parameter causal language model based on Qwen3-1.7B, developed by dipta007. It is specifically fine-tuned for Bengali mathematical reasoning using a novel Curriculum-GRPO approach. This model significantly improves accuracy on Bengali mathematical benchmarks (Bn-MGSM and Bn-MSVAMP) and generates more concise, Bengali-centric solutions compared to its base model. It is optimized for solving mathematical problems in Bengali with high accuracy and efficient reasoning.


GanitLLM-1.7B_SFT_CGRPO: Bengali Mathematical Reasoning

GanitLLM-1.7B_SFT_CGRPO is a 1.7 billion parameter causal language model built on the Qwen3-1.7B base and developed by dipta007. Its core innovation is its training methodology: a multi-stage pipeline combining Supervised Fine-Tuning (SFT) with a novel Curriculum-GRPO (curriculum-based Group Relative Policy Optimization) approach. The model is specifically designed to excel at Bengali mathematical reasoning tasks.
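A minimal inference sketch, assuming the standard Hugging Face `transformers` chat API; the generation settings below are illustrative assumptions, not the authors' recommended values:

```python
def build_messages(question: str) -> list:
    """Wrap a Bengali math question in the chat-message format
    that the tokenizer's chat template expects."""
    return [{"role": "user", "content": question}]


def solve_bengali_math(question: str,
                       model_id: str = "dipta007/GanitLLM-1.7B_SFT_CGRPO",
                       max_new_tokens: int = 1024) -> str:
    """Generate a Bengali solution for a math word problem.
    transformers is imported lazily so build_messages stays
    usable without it installed."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

    inputs = tokenizer.apply_chat_template(
        build_messages(question), add_generation_prompt=True, return_tensors="pt"
    )
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
```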

Key Capabilities and Performance

  • Enhanced Bengali Mathematical Reasoning: Gains +37.6 points in accuracy on the Bn-MGSM benchmark (from 15.2 to 52.8) and +52.7 points on the Bn-MSVAMP benchmark (from 14.1 to 66.8).
  • High Bengali Reasoning Percentage: Demonstrates 87.80% Bengali reasoning in its solutions, a significant improvement over the base model's 19.64%.
  • Concise Solutions: Generates solutions that are 81.3% shorter on average (210 words versus 1,124 words for the base model), making its outputs more efficient.
  • Context Length: Supports a context length of 4,096 tokens.
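The Bengali-reasoning percentage reported above can be approximated by measuring the share of Bengali-script letters among all alphabetic characters in a generated solution. The following is my own illustrative reimplementation, not the authors' evaluation script; it uses the Bengali Unicode block (U+0980–U+09FF):

```python
def bengali_share(text: str) -> float:
    """Return the fraction of alphabetic characters that belong to the
    Bengali Unicode block (U+0980-U+09FF). Whitespace, digits, and
    punctuation are ignored; combining marks are not counted as letters."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    bengali = [c for c in letters if 0x0980 <= ord(c) <= 0x09FF]
    return len(bengali) / len(letters)
```

A solution written entirely in Bengali scores 1.0, a purely English one 0.0, and code-switched text falls in between.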

Training Methodology

The model's performance stems from its two-stage training pipeline:

  1. Supervised Fine-Tuning (SFT): Initial training on the GANIT-SFT dataset (~11k examples) to establish foundational Bengali reasoning.
  2. Curriculum-GRPO: Reinforcement learning with difficulty-aware sampling on the GANIT-RLVR dataset (~7.3k examples), using reward functions for output format, answer correctness (matching either the Bengali or the English form of the answer), and a high share of Bengali text in the reasoning.
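The two ingredients above can be sketched in a few lines. The exact reward shaping and curriculum schedule used for GanitLLM are not published in this card, so both functions below are assumptions: a correctness reward that normalizes Bengali digits before comparing answers, and a linear easy-to-hard sampling schedule as one plausible form of difficulty-aware sampling:

```python
import random
import re

# Map Bengali digits to ASCII so "৪২" and "42" compare equal (assumed
# normalization; the authors' matching rule may differ).
BN_TO_EN = str.maketrans("০১২৩৪৫৬৭৮৯", "0123456789")


def correctness_reward(completion: str, gold: str) -> float:
    """1.0 if the last number in the completion matches the gold answer
    after digit normalization, else 0.0."""
    normalized = completion.translate(BN_TO_EN)
    nums = re.findall(r"\d+(?:\.\d+)?", normalized)
    if not nums:
        return 0.0
    gold_norm = gold.translate(BN_TO_EN).replace(",", "").strip()
    return 1.0 if nums[-1] == gold_norm else 0.0


def curriculum_weights(difficulties, step, total_steps):
    """Linear schedule: early steps favor easy problems (low difficulty),
    late steps favor hard ones. difficulties are in [0, 1]."""
    t = step / total_steps
    return [(1 - t) * (1 - d) + t * d for d in difficulties]


def sample_problem(problems, difficulties, step, total_steps):
    """Draw one training problem with difficulty-aware weights."""
    weights = curriculum_weights(difficulties, step, total_steps)
    return random.choices(problems, weights=weights, k=1)[0]
```

In GRPO proper, such scalar rewards are computed for a group of sampled completions per prompt and normalized within the group to form advantages; the sketch covers only the reward and sampling pieces named in the list above.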

Use Cases

This model is ideal for applications requiring accurate and efficient mathematical problem-solving in Bengali, particularly where concise and culturally relevant reasoning is crucial. Its specialized training makes it a strong candidate for educational tools, automated problem solvers, and research in multilingual NLP focusing on mathematical domains.