dipta007/GanitLLM-1.7B_SFT_GRPO
Text generation · Model size: 2B · Quantization: BF16 · Context length: 32k · Published: Jan 1, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

GanitLLM-1.7B_SFT_GRPO by dipta007 is a 1.7-billion-parameter causal language model based on Qwen3-1.7B, fine-tuned specifically for Bengali mathematical reasoning. It combines Supervised Fine-Tuning (SFT) with standard Group Relative Policy Optimization (GRPO) to substantially improve accuracy on Bengali mathematical benchmarks such as Bn-MGSM and Bn-MSVAMP. The model generates concise, Bengali-centric reasoning steps for mathematical problems, offering a specialized solution for Bengali NLP tasks.


GanitLLM-1.7B_SFT_GRPO Overview

GanitLLM-1.7B_SFT_GRPO is a compact 1.7-billion-parameter causal language model developed by dipta007, built upon the Qwen3-1.7B base architecture. It is optimized for Bengali mathematical reasoning through a two-stage training process: Supervised Fine-Tuning (SFT) followed by Group Relative Policy Optimization (GRPO).

Key Capabilities & Performance

This model demonstrates substantial improvements over its base model, particularly in Bengali mathematical tasks:

  • Enhanced Accuracy: Gains +38.4 accuracy points on the Bn-MGSM benchmark (from 15.2 to 53.6) and +52.8 points on the Bn-MSVAMP benchmark (from 14.1 to 66.9).
  • Bengali Reasoning Focus: Exhibits 88.32% Bengali reasoning in its solutions, a significant increase from the base model's 19.64%.
  • Concise Solutions: Generates solutions with 81.6% fewer tokens, averaging 207 words versus 1,124 words for the base model.
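The card does not define exactly how the Bengali-reasoning share and solution length above are measured; a plausible word-level sketch (the 50% letter threshold and tokenization by whitespace are assumptions) looks like this:

```python
# Hypothetical metric sketch: Bengali-reasoning share and average solution
# length. The exact metrics behind the 88.32% and 207-word figures are not
# specified on the card; this is an illustrative word-level variant.

def is_bengali_word(word: str) -> bool:
    """A word counts as Bengali if most of its letters fall in the
    Bengali Unicode block (U+0980-U+09FF). Threshold is an assumption."""
    letters = [c for c in word if c.isalpha()]
    if not letters:
        return False
    return sum("\u0980" <= c <= "\u09ff" for c in letters) / len(letters) > 0.5

def bengali_share(solution: str) -> float:
    """Fraction of alphabetic words in a solution that are Bengali."""
    words = [w for w in solution.split() if any(c.isalpha() for c in w)]
    if not words:
        return 0.0
    return sum(is_bengali_word(w) for w in words) / len(words)

def avg_word_count(solutions: list[str]) -> float:
    """Average solution length in whitespace-separated words."""
    return sum(len(s.split()) for s in solutions) / len(solutions)
```

Averaging `bengali_share` over a benchmark's generated solutions would yield a percentage comparable to the 88.32% vs. 19.64% figures quoted above.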

Training Methodology

The model was trained using:

  1. Supervised Fine-Tuning (SFT): Trained on the GANIT-SFT dataset, comprising approximately 11,000 examples.
  2. Group Relative Policy Optimization (GRPO): Applied standard GRPO with random sampling on the GANIT-RLVR dataset of around 7,300 examples. Reward functions were designed to validate output format, check answer correctness (matching both Bengali- and English-digit answers), and promote Bengali text in the reasoning steps.
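The three reward signals described above could be sketched as follows. This is a minimal illustration, not the author's implementation: the function names, the `\boxed{}` final-answer convention, and the digit-normalization scheme are all assumptions.

```python
# Hypothetical sketch of the three GRPO reward signals: format validity,
# answer correctness (Bengali or Western digits), and Bengali-text promotion.
import re

# Map Bengali digits to ASCII so answers match in either script.
BENGALI_DIGITS = str.maketrans("০১২৩৪৫৬৭৮৯", "0123456789")

def format_reward(completion: str) -> float:
    """1.0 if the completion ends with a boxed final answer, else 0.0."""
    return 1.0 if re.search(r"\\boxed\{[^}]+\}\s*$", completion.strip()) else 0.0

def correctness_reward(completion: str, gold: str) -> float:
    """1.0 if the boxed answer matches the gold answer after normalizing
    Bengali digits to ASCII, else 0.0."""
    m = re.search(r"\\boxed\{([^}]+)\}", completion)
    if not m:
        return 0.0
    answer = m.group(1).strip().translate(BENGALI_DIGITS)
    return 1.0 if answer == gold.strip().translate(BENGALI_DIGITS) else 0.0

def bengali_reward(completion: str) -> float:
    """Fraction of alphabetic characters in the Bengali Unicode block
    (U+0980-U+09FF), rewarding reasoning written in Bengali."""
    letters = [c for c in completion if c.isalpha()]
    if not letters:
        return 0.0
    return sum("\u0980" <= c <= "\u09ff" for c in letters) / len(letters)
```

In a GRPO loop, rewards like these would typically be combined (e.g., summed or weighted) and used to score each sampled completion relative to its group.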

Use Cases

GanitLLM-1.7B_SFT_GRPO is ideal for applications requiring accurate and efficient mathematical problem-solving in Bengali, especially where concise and culturally relevant reasoning is crucial. Its compact size and specialized training make it suitable for deployment in resource-constrained environments or for focused Bengali NLP tasks.