dipta007/GanitLLM-1.7B_SFT_CGRPO
Text generation · Model size: 2B (1.7B parameters) · Quantization: BF16 · Context length: 32k · Published: Jan 1, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

dipta007/GanitLLM-1.7B_SFT_CGRPO is a 1.7 billion parameter causal language model based on Qwen3-1.7B, developed by dipta007. It is specifically fine-tuned for Bengali mathematical reasoning using a novel Curriculum-GRPO approach. This model significantly improves accuracy on Bengali mathematical benchmarks (Bn-MGSM and Bn-MSVAMP) and generates more concise, Bengali-centric solutions compared to its base model. It is optimized for solving mathematical problems in Bengali with high accuracy and efficient reasoning.


GanitLLM-1.7B_SFT_CGRPO: Bengali Mathematical Reasoning

GanitLLM-1.7B_SFT_CGRPO is a 1.7 billion parameter causal language model built on the Qwen3-1.7B base and developed by dipta007. Its core innovation is its training methodology: a multi-stage pipeline combining Supervised Fine-Tuning (SFT) with a novel Curriculum-GRPO (curriculum-based Group Relative Policy Optimization) approach. The model is specifically designed to excel at Bengali mathematical reasoning tasks.
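A minimal inference sketch, assuming the standard Hugging Face `transformers` chat API; the generation settings below are illustrative assumptions, not the authors' recommended values:

```python
def build_messages(question: str) -> list:
    """Wrap a Bengali math question in the chat-message format
    that the tokenizer's chat template expects."""
    return [{"role": "user", "content": question}]


def solve_bengali_math(question: str,
                       model_id: str = "dipta007/GanitLLM-1.7B_SFT_CGRPO",
                       max_new_tokens: int = 1024) -> str:
    """Generate a Bengali solution for a math word problem.
    transformers is imported lazily so build_messages stays
    usable without it installed."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

    inputs = tokenizer.apply_chat_template(
        build_messages(question), add_generation_prompt=True, return_tensors="pt"
    )
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
```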

Key Capabilities and Performance

  • Enhanced Bengali Mathematical Reasoning: Gains +37.6 points in accuracy on the Bn-MGSM benchmark (from 15.2 to 52.8) and +52.7 points on the Bn-MSVAMP benchmark (from 14.1 to 66.8).
  • High Bengali Reasoning Percentage: Demonstrates 87.80% Bengali reasoning in its solutions, a significant improvement over the base model's 19.64%.
  • Concise Solutions: Generates solutions that are 81.3% shorter on average (210 words versus 1,124 words for the base model), making its outputs more efficient.
  • Context Length: Supports a context length of 4,096 tokens.
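The Bengali-reasoning percentage reported above can be approximated by measuring the share of Bengali-script letters among all alphabetic characters in a generated solution. The following is my own illustrative reimplementation, not the authors' evaluation script; it uses the Bengali Unicode block (U+0980–U+09FF):

```python
def bengali_share(text: str) -> float:
    """Return the fraction of alphabetic characters that belong to the
    Bengali Unicode block (U+0980-U+09FF). Whitespace, digits, and
    punctuation are ignored; combining marks are not counted as letters."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    bengali = [c for c in letters if 0x0980 <= ord(c) <= 0x09FF]
    return len(bengali) / len(letters)
```

A solution written entirely in Bengali scores 1.0, a purely English one 0.0, and code-switched text falls in between.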

Training Methodology

The model's performance stems from its two-stage training pipeline:

  1. Supervised Fine-Tuning (SFT): Initial training on the GANIT-SFT dataset (~11k examples) to establish foundational Bengali reasoning.
  2. Curriculum-GRPO: Reinforcement learning with difficulty-aware sampling on the GANIT-RLVR dataset (~7.3k examples), using reward functions for output format, answer correctness (matching either the Bengali or the English form of the answer), and a high share of Bengali text in the reasoning.
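The two ingredients above can be sketched in a few lines. The exact reward shaping and curriculum schedule used for GanitLLM are not published in this card, so both functions below are assumptions: a correctness reward that normalizes Bengali digits before comparing answers, and a linear easy-to-hard sampling schedule as one plausible form of difficulty-aware sampling:

```python
import random
import re

# Map Bengali digits to ASCII so "৪২" and "42" compare equal (assumed
# normalization; the authors' matching rule may differ).
BN_TO_EN = str.maketrans("০১২৩৪৫৬৭৮৯", "0123456789")


def correctness_reward(completion: str, gold: str) -> float:
    """1.0 if the last number in the completion matches the gold answer
    after digit normalization, else 0.0."""
    normalized = completion.translate(BN_TO_EN)
    nums = re.findall(r"\d+(?:\.\d+)?", normalized)
    if not nums:
        return 0.0
    gold_norm = gold.translate(BN_TO_EN).replace(",", "").strip()
    return 1.0 if nums[-1] == gold_norm else 0.0


def curriculum_weights(difficulties, step, total_steps):
    """Linear schedule: early steps favor easy problems (low difficulty),
    late steps favor hard ones. difficulties are in [0, 1]."""
    t = step / total_steps
    return [(1 - t) * (1 - d) + t * d for d in difficulties]


def sample_problem(problems, difficulties, step, total_steps):
    """Draw one training problem with difficulty-aware weights."""
    weights = curriculum_weights(difficulties, step, total_steps)
    return random.choices(problems, weights=weights, k=1)[0]
```

In GRPO proper, such scalar rewards are computed for a group of sampled completions per prompt and normalized within the group to form advantages; the sketch covers only the reward and sampling pieces named in the list above.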

Use Cases

This model is ideal for applications requiring accurate and efficient mathematical problem-solving in Bengali, particularly where concise and culturally relevant reasoning is crucial. Its specialized training makes it a strong candidate for educational tools, automated problem solvers, and research in multilingual NLP focusing on mathematical domains.