dipta007/GanitLLM-1.7B_SFT_GRPO
Text generation · Model size: 2B · Quantization: BF16 · Context length: 32k · Published: Jan 1, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

GanitLLM-1.7B_SFT_GRPO by dipta007 is a 1.7-billion-parameter causal language model based on Qwen3-1.7B, fine-tuned specifically for Bengali mathematical reasoning. It combines Supervised Fine-Tuning (SFT) with standard Group Relative Policy Optimization (GRPO) to substantially improve accuracy on Bengali mathematical benchmarks such as Bn-MGSM and Bn-MSVAMP. The model generates concise, Bengali-centric reasoning steps for mathematical problems, offering a specialized solution for Bengali NLP tasks.


GanitLLM-1.7B_SFT_GRPO Overview

GanitLLM-1.7B_SFT_GRPO is a compact 1.7-billion-parameter causal language model developed by dipta007, built upon the Qwen3-1.7B base architecture. It is optimized for Bengali mathematical reasoning through a two-stage training process: Supervised Fine-Tuning (SFT) followed by Group Relative Policy Optimization (GRPO).

Key Capabilities & Performance

This model demonstrates substantial improvements over its base model, particularly in Bengali mathematical tasks:

  • Enhanced Accuracy: Gains +38.4 accuracy points on the Bn-MGSM benchmark (from 15.2 to 53.6) and +52.8 points on the Bn-MSVAMP benchmark (from 14.1 to 66.9).
  • Bengali Reasoning Focus: Exhibits 88.32% Bengali reasoning in its solutions, a significant increase from the base model's 19.64%.
  • Concise Solutions: Generates solutions with 81.6% fewer tokens, averaging 207 words versus 1,124 words for the base model.
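The card does not define exactly how the Bengali-reasoning share and solution length above are measured; a plausible word-level sketch (the 50% letter threshold and tokenization by whitespace are assumptions) looks like this:

```python
# Hypothetical metric sketch: Bengali-reasoning share and average solution
# length. The exact metrics behind the 88.32% and 207-word figures are not
# specified on the card; this is an illustrative word-level variant.

def is_bengali_word(word: str) -> bool:
    """A word counts as Bengali if most of its letters fall in the
    Bengali Unicode block (U+0980-U+09FF). Threshold is an assumption."""
    letters = [c for c in word if c.isalpha()]
    if not letters:
        return False
    return sum("\u0980" <= c <= "\u09ff" for c in letters) / len(letters) > 0.5

def bengali_share(solution: str) -> float:
    """Fraction of alphabetic words in a solution that are Bengali."""
    words = [w for w in solution.split() if any(c.isalpha() for c in w)]
    if not words:
        return 0.0
    return sum(is_bengali_word(w) for w in words) / len(words)

def avg_word_count(solutions: list[str]) -> float:
    """Average solution length in whitespace-separated words."""
    return sum(len(s.split()) for s in solutions) / len(solutions)
```

Averaging `bengali_share` over a benchmark's generated solutions would yield a percentage comparable to the 88.32% vs. 19.64% figures quoted above.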

Training Methodology

The model was trained using:

  1. Supervised Fine-Tuning (SFT): Trained on the GANIT-SFT dataset, comprising approximately 11,000 examples.
  2. Group Relative Policy Optimization (GRPO): Applied standard GRPO with random sampling on the GANIT-RLVR dataset of around 7,300 examples. Reward functions were designed to validate output format, check answer correctness (matching both Bengali- and English-digit answers), and promote Bengali text in the reasoning steps.
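The three reward signals described above could be sketched as follows. This is a minimal illustration, not the author's implementation: the function names, the `\boxed{}` final-answer convention, and the digit-normalization scheme are all assumptions.

```python
# Hypothetical sketch of the three GRPO reward signals: format validity,
# answer correctness (Bengali or Western digits), and Bengali-text promotion.
import re

# Map Bengali digits to ASCII so answers match in either script.
BENGALI_DIGITS = str.maketrans("০১২৩৪৫৬৭৮৯", "0123456789")

def format_reward(completion: str) -> float:
    """1.0 if the completion ends with a boxed final answer, else 0.0."""
    return 1.0 if re.search(r"\\boxed\{[^}]+\}\s*$", completion.strip()) else 0.0

def correctness_reward(completion: str, gold: str) -> float:
    """1.0 if the boxed answer matches the gold answer after normalizing
    Bengali digits to ASCII, else 0.0."""
    m = re.search(r"\\boxed\{([^}]+)\}", completion)
    if not m:
        return 0.0
    answer = m.group(1).strip().translate(BENGALI_DIGITS)
    return 1.0 if answer == gold.strip().translate(BENGALI_DIGITS) else 0.0

def bengali_reward(completion: str) -> float:
    """Fraction of alphabetic characters in the Bengali Unicode block
    (U+0980-U+09FF), rewarding reasoning written in Bengali."""
    letters = [c for c in completion if c.isalpha()]
    if not letters:
        return 0.0
    return sum("\u0980" <= c <= "\u09ff" for c in letters) / len(letters)
```

In a GRPO loop, rewards like these would typically be combined (e.g., summed or weighted) and used to score each sampled completion relative to its group.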

Use Cases

GanitLLM-1.7B_SFT_GRPO is ideal for applications requiring accurate and efficient mathematical problem-solving in Bengali, especially where concise and culturally relevant reasoning is crucial. Its compact size and specialized training make it suitable for deployment in resource-constrained environments or for focused Bengali NLP tasks.