dipta007/GanitLLM-4B_SFT_GRPO
Text generation | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Published: Jan 1, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights

dipta007/GanitLLM-4B_SFT_GRPO is a 4-billion-parameter causal language model fine-tuned from Qwen3-4B by dipta007. It is specialized for Bengali mathematical reasoning and was trained with Supervised Fine-Tuning (SFT) followed by standard GRPO. Compared with its base model, it scores higher on the Bengali math benchmarks Bn-MGSM and Bn-MSVAMP, writes 88.61% of its reasoning in Bengali, and produces solutions about 80% shorter.


GanitLLM-4B_SFT_GRPO: Bengali Mathematical Reasoning

GanitLLM-4B_SFT_GRPO is a 4-billion-parameter causal language model fine-tuned from Qwen3-4B by dipta007. It is specialized for mathematical reasoning in Bengali and was trained in two stages: Supervised Fine-Tuning (SFT) on the GANIT-SFT dataset, followed by GRPO (Group Relative Policy Optimization) on GANIT-RLVR.
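GRPO's distinguishing step is that it needs no learned value model. Instead, it samples several completions per prompt and normalizes each completion's reward against the mean and standard deviation of its group. The sketch below shows that advantage computation together with the PPO-style clipped term it feeds into. It assumes one scalar reward per completion and omits the KL penalty and per-token averaging. The function names, the clip value 0.2, and the use of the population standard deviation (some implementations use the sample estimate) are illustrative choices, not details taken from this model's training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """A_i = (r_i - mean(r)) / (std(r) + eps) over the completions sampled for one prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (illustrative choice)
    # eps keeps the division finite when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """PPO-style clipped objective term; ratio = pi_theta(o|q) / pi_old(o|q)."""
    clipped_ratio = min(max(ratio, 1 - clip_eps), 1 + clip_eps)
    # Take the pessimistic (smaller) of the unclipped and clipped terms
    return min(ratio * advantage, clipped_ratio * advantage)


# Four sampled answers to one prompt: two judged correct (reward 1), two wrong (reward 0).
rewards = [1.0, 0.0, 0.0, 1.0]
print([round(a, 3) for a in group_relative_advantages(rewards)])  # [1.0, -1.0, -1.0, 1.0]
```

Correct answers receive a positive advantage and wrong ones a negative advantage, relative only to their siblings. If every completion in a group earns the same reward, all advantages are zero and that prompt contributes no policy-gradient signal.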

Key Capabilities & Performance

This model demonstrates substantial improvements in Bengali mathematical problem-solving:

  • Enhanced Accuracy: Improves Bn-MGSM accuracy by 8.4 points (from 69.2 to 77.6) and Bn-MSVAMP accuracy by 5.8 points (from 70.5 to 76.3) over the base model.
  • Bengali Reasoning Focus: 88.61% of the text in its reasoning outputs is Bengali, up from 14.79% for the base model.
  • Concise Solutions: Produces solutions about 80% shorter than the base model's (189 vs. 943 words) while also improving accuracy.
  • Training Methodology: Utilizes specific reward functions during GRPO, including format validation, correctness (Bengali and English answer matching), and a dedicated Bengali reasoning reward to ensure language consistency.
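The headline figures above are internally consistent, and the snippet below rechecks them from the reported numbers. Note that the length comparison is given in words (189 vs. 943), so the roughly 80% reduction is a word-count figure.

```python
# Reported accuracy scores from this card: (base model, GanitLLM-4B_SFT_GRPO).
scores = {
    "Bn-MGSM": (69.2, 77.6),
    "Bn-MSVAMP": (70.5, 76.3),
}
words_base, words_tuned = 943, 189  # reported solution lengths in words

# Round to one decimal place to absorb floating-point noise in the subtraction
gains = {name: round(after - before, 1) for name, (before, after) in scores.items()}
length_reduction_pct = round(100 * (1 - words_tuned / words_base), 1)

print(gains)                 # {'Bn-MGSM': 8.4, 'Bn-MSVAMP': 5.8}
print(length_reduction_pct)  # 80.0
```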
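The three reward signals named in the Training Methodology bullet (format validation, answer correctness in Bengali or English, and Bengali-reasoning consistency) can be sketched as simple rule-based checks. This is a minimal illustration under loudly stated assumptions: the card does not specify the output format, how answers are matched, how the Bengali share is measured, or how the rewards are weighted. The `<think>`/`<answer>` tags, the Unicode-block test for Bengali characters, the 0.8 threshold, and every function name below are hypothetical, and the ratio here is not necessarily the metric behind the reported 88.61%.

```python
import re
from typing import Optional

# HYPOTHETICAL output format: reasoning in <think>...</think>, final answer in <answer>...</answer>.
FORMAT_RE = re.compile(r"^\s*<think>(.*?)</think>\s*<answer>(.*?)</answer>\s*$", re.DOTALL)

# Map Bengali digits (U+09E6..U+09EF) to ASCII so "৭" and "7" compare equal.
BN_DIGITS = str.maketrans("০১২৩৪৫৬৭৮৯", "0123456789")


def is_bengali(ch: str) -> bool:
    """True for characters in the Bengali Unicode block (U+0980..U+09FF)."""
    return "\u0980" <= ch <= "\u09ff"


def format_reward(completion: str) -> float:
    """1.0 if the completion follows the assumed tag layout, else 0.0."""
    return 1.0 if FORMAT_RE.match(completion) else 0.0


def normalize_number(text: str) -> Optional[float]:
    """Convert a numeric answer written with Bengali or ASCII digits to a float."""
    cleaned = text.translate(BN_DIGITS).replace(",", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None


def correctness_reward(completion: str, gold: str) -> float:
    """1.0 if the extracted answer numerically matches the gold answer."""
    m = FORMAT_RE.match(completion)
    if m is None:
        return 0.0
    pred, ref = normalize_number(m.group(2)), normalize_number(gold)
    return 1.0 if pred is not None and ref is not None and abs(pred - ref) < 1e-9 else 0.0


def bengali_ratio(text: str) -> float:
    """Share of Bengali-block characters among Bengali characters plus other letters."""
    bn = sum(1 for ch in text if is_bengali(ch))
    other = sum(1 for ch in text if ch.isalpha() and not is_bengali(ch))
    total = bn + other
    return bn / total if total else 0.0


def bengali_reasoning_reward(completion: str, threshold: float = 0.8) -> float:
    """1.0 if the reasoning section is mostly Bengali (threshold is hypothetical)."""
    m = FORMAT_RE.match(completion)
    if m is None:
        return 0.0
    return 1.0 if bengali_ratio(m.group(1)) >= threshold else 0.0
```

Under these assumptions, an English-language solution with the right answer still earns the correctness reward but not the Bengali-reasoning reward. That separation is what lets training push the model toward Bengali reasoning without trading away accuracy.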

Ideal Use Cases

  • Bengali Mathematical Education: Developing tools for students or educators requiring step-by-step mathematical problem-solving in Bengali.
  • Localized AI Applications: Integrating mathematical reasoning capabilities into applications targeting Bengali-speaking users.
  • Research in Low-Resource Languages: Exploring advanced reasoning techniques for languages with limited existing LLM support.