Name: dipta007/GanitLLM-4B_SFT_GRPO API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: dipta007

GanitLLM-4B_SFT_GRPO: Bengali Mathematical Reasoning

GanitLLM-4B_SFT_GRPO is a 4-billion-parameter causal language model built on the Qwen3-4B architecture and developed by dipta007. It specializes in mathematical reasoning in Bengali and was trained in two stages: Supervised Fine-Tuning (SFT) on the GANIT-SFT dataset, followed by GRPO (Group Relative Policy Optimization) on GANIT-RLVR.
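The core idea of GRPO is to score each sampled completion against the other completions drawn for the same prompt, which removes the need for a separate value (critic) model. The minimal Python sketch below shows only that group-relative advantage step. The reward values are hypothetical, and this is a generic illustration of GRPO, not GanitLLM's actual training code.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each completion's reward against its own sampling group.

    GRPO samples a group of completions for one prompt, scores each one,
    and uses (reward - group mean) / (group std + eps) as the advantage.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    # Population std; some implementations use the sample std instead.
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical total rewards for four completions sampled for one Bengali problem.
    rewards = [2.0, 0.0, 2.0, 1.0]
    for r, a in zip(rewards, group_relative_advantages(rewards)):
        print(f"reward={r:.1f}  advantage={a:+.3f}")
```

Completions that score above their group's mean receive a positive advantage and are reinforced; when every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.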

Key Capabilities & Performance

This model demonstrates substantial improvements in Bengali mathematical problem-solving:

Enhanced Accuracy: Achieves an 8.4 point increase on the Bn-MGSM benchmark (from 69.2 to 77.6) and a 5.8 point increase on Bn-MSVAMP (from 70.5 to 76.3) compared to its base model.
Bengali Reasoning Focus: Exhibits 88.61% Bengali text in its reasoning outputs, a significant leap from the base model's 14.79%.
Concise Solutions: Generates solutions that are roughly 80% shorter (189 words vs. 943 words) while maintaining high accuracy.
Training Methodology: Utilizes specific reward functions during GRPO, including format validation, correctness (Bengali and English answer matching), and a dedicated Bengali reasoning reward to ensure language consistency.
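The three kinds of reward described above can be sketched as plain Python functions. Every interface detail here is an assumption for illustration and is not taken from GanitLLM's training code: the hypothetical `<think>...</think>` output layout, last-number answer extraction, a letter-based Bengali ratio, a hypothetical 0.8 threshold, and an unweighted sum. The sketch shows one way to accept final answers written with either Bengali (০-৯) or ASCII digits and to reward reasoning written in Bengali script.

```python
import re
from typing import Optional

# Hypothetical output format: reasoning inside <think>...</think>, then the final answer.
THINK_RE = re.compile(r"<think>(.*?)</think>\s*(.+)", re.DOTALL)
# Bengali digits ০-৯ occupy U+09E6..U+09EF; map them to ASCII 0-9.
BN_TO_ASCII = str.maketrans("০১২৩৪৫৬৭৮৯", "0123456789")
NUMBER_RE = re.compile(r"-?[0-9]+(?:\.[0-9]+)?")


def split_completion(text: str) -> Optional[tuple[str, str]]:
    """Return (reasoning, answer) if the completion follows the assumed format."""
    m = THINK_RE.search(text)
    if m is None:
        return None
    return m.group(1).strip(), m.group(2).strip()


def format_reward(text: str) -> float:
    """1.0 when the completion matches the assumed <think>...</think> + answer layout."""
    return 1.0 if split_completion(text) is not None else 0.0


def extract_number(text: str) -> Optional[str]:
    """Return the last number in text, accepting Bengali or ASCII digits."""
    normalized = text.translate(BN_TO_ASCII).replace(",", "")
    numbers = NUMBER_RE.findall(normalized)
    return numbers[-1] if numbers else None


def correctness_reward(text: str, gold: str) -> float:
    """1.0 when the final numeric answer equals the gold answer in either script."""
    parts = split_completion(text)
    if parts is None:
        return 0.0
    predicted = extract_number(parts[1])
    expected = extract_number(gold)
    if predicted is None or expected is None:
        return 0.0
    return 1.0 if float(predicted) == float(expected) else 0.0


def bengali_ratio(text: str) -> float:
    """Fraction of alphabetic characters that lie in the Bengali block U+0980..U+09FF."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum("\u0980" <= ch <= "\u09ff" for ch in letters) / len(letters)


def bengali_reasoning_reward(text: str, threshold: float = 0.8) -> float:
    """1.0 when the reasoning is mostly Bengali; the 0.8 threshold is hypothetical."""
    parts = split_completion(text)
    if parts is None:
        return 0.0
    return 1.0 if bengali_ratio(parts[0]) >= threshold else 0.0


def total_reward(text: str, gold: str) -> float:
    """Unweighted sum of the three hypothetical rewards."""
    return format_reward(text) + correctness_reward(text, gold) + bengali_reasoning_reward(text)


if __name__ == "__main__":
    bn = "<think>রাহিমের ৩টি আম আছে, সে আরও ৪টি কিনল। মোট ৩ + ৪ = ৭টি।</think>\nউত্তর: ৭"
    en = "<think>Rahim has 3 mangoes and buys 4 more, so 3 + 4 = 7.</think>\nAnswer: 7"
    print(total_reward(bn, "7"), total_reward(en, "7"))
```

Both example completions reach the correct answer, but only the one that reasons in Bengali collects the language reward. Because `bengali_ratio` counts only alphabetic characters, Bengali vowel signs and digits are ignored; the published 88.61% figure may have been measured differently.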

Ideal Use Cases

Bengali Mathematical Education: Developing tools for students or educators requiring step-by-step mathematical problem-solving in Bengali.
Localized AI Applications: Integrating mathematical reasoning capabilities into applications targeting Bengali-speaking users.
Research in Low-Resource Languages: Exploring advanced reasoning techniques for languages with limited existing LLM support.
