dipta007/GanitLLM-4B_SFT_CGRPO

Text generation · Concurrency cost: 1 · Model size: 4B · Quantization: BF16 · Context length: 32k · Published: Jan 1, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

dipta007/GanitLLM-4B_SFT_CGRPO is a 4-billion-parameter causal language model developed by dipta007, based on Qwen/Qwen3-4B, with a 4,096-token context length. It is fine-tuned specifically for Bengali mathematical reasoning using Supervised Fine-Tuning (SFT) followed by a novel Curriculum-GRPO approach. The model significantly improves accuracy on Bengali mathematical benchmarks while generating more concise, Bengali-centric solutions than its base model. Its primary strength is solving mathematical problems in Bengali with high accuracy and efficient output.


GanitLLM-4B_SFT_CGRPO: Bengali Mathematical Reasoning Model

GanitLLM-4B_SFT_CGRPO is a 4-billion-parameter causal language model built on the Qwen/Qwen3-4B architecture and optimized for mathematical reasoning in Bengali. Developed by dipta007, the model is trained with a multi-stage pipeline combining Supervised Fine-Tuning (SFT) and a novel Curriculum-GRPO (curriculum-guided Group Relative Policy Optimization) approach.

Key Capabilities & Performance

  • Enhanced Bengali Mathematical Reasoning: Achieves significant improvements on Bengali mathematical benchmarks: +7.6 percentage points on Bn-MGSM (76.8% accuracy) and +5.9 percentage points on Bn-MSVAMP (76.4% accuracy) over the base Qwen3-4B model.
  • Bengali-Centric Reasoning: Produces solutions whose reasoning is 88.71% Bengali text, up sharply from the base model's 14.79%.
  • Concise Solutions: Generates solutions that are 79.5% shorter (193 words vs. 943 words) while maintaining high accuracy.
  • Context Length: Supports a context length of 4,096 tokens.
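The "Bengali-centric reasoning" figures above measure how much of a solution is written in Bengali script. The model card does not specify the exact measurement, but a simple character-level version of such a metric, assuming the share of letters falling in the Bengali Unicode block (U+0980–U+09FF) is counted, can be sketched as:

```python
def bengali_ratio(text: str) -> float:
    """Fraction of alphabetic characters that fall in the Bengali
    Unicode block (U+0980-U+09FF). Illustrative metric only; the
    model card does not state how its percentage was computed."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    bengali = sum(1 for ch in letters if "\u0980" <= ch <= "\u09ff")
    return bengali / len(letters)

# Mixed Bengali/English reasoning step scores between 0 and 1:
sample = "ধরি x = 5, then x + 3 = 8"
score = bengali_ratio(sample)
```

A word-level or sentence-level variant would behave similarly; the character-level form is the easiest to make deterministic.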

Training Methodology

The model was trained using:

  1. Supervised Fine-Tuning (SFT): Initial training on the GANIT-SFT dataset (~11k examples) to establish foundational reasoning in Bengali.
  2. Curriculum-GRPO: Subsequent reinforcement learning on the GANIT-RLVR dataset (~7.3k examples), using difficulty-aware sampling and reward functions for output format, answer correctness (accepting both Bengali and English answers), and a high proportion of Bengali text in the reasoning steps.
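The three reward signals described for Curriculum-GRPO can be sketched as below. This is an illustrative reconstruction, not the authors' implementation: the `\boxed{...}` answer format, the equal weighting, and the acceptance set for answers are all assumptions.

```python
import re

def format_reward(completion: str) -> float:
    # Reward completions that contain a \boxed{...} final answer,
    # a common convention in math-reasoning RL. (Assumed format.)
    return 1.0 if re.search(r"\\boxed\{[^{}]+\}", completion) else 0.0

def correctness_reward(completion: str, answers: set) -> float:
    # Accept the final answer written in either Bengali or Western
    # digits, per the card's note that both are rewarded.
    m = re.search(r"\\boxed\{([^{}]+)\}", completion)
    return 1.0 if m and m.group(1).strip() in answers else 0.0

def bengali_reward(completion: str) -> float:
    # Fraction of letters in the Bengali Unicode block (U+0980-U+09FF),
    # encouraging reasoning written mostly in Bengali.
    letters = [c for c in completion if c.isalpha()]
    if not letters:
        return 0.0
    return sum("\u0980" <= c <= "\u09ff" for c in letters) / len(letters)

def total_reward(completion: str, answers: set) -> float:
    # Equal weighting is an assumption; the actual mix is unspecified.
    return (format_reward(completion)
            + correctness_reward(completion, answers)
            + bengali_reward(completion)) / 3.0
```

In GRPO, rewards like these are computed per sampled completion and normalized within each group; the curriculum component would additionally bias which problems are sampled at each stage by difficulty.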

Use Cases

This model is ideal for applications requiring accurate and efficient mathematical problem-solving in Bengali, particularly where concise and culturally relevant explanations are valued. It can be integrated into educational tools, intelligent tutoring systems, or any platform needing robust Bengali mathematical reasoning capabilities.