# GanitLLM-0.6B_SFT_GRPO: Bengali Mathematical Reasoning Model
GanitLLM-0.6B_SFT_GRPO is a 0.6-billion-parameter causal language model developed by dipta007 and optimized for Bengali mathematical reasoning. Built on the Qwen3-0.6B base, it was trained with a two-stage pipeline: Supervised Fine-Tuning (SFT) on the GANIT-SFT dataset, followed by standard Group Relative Policy Optimization (GRPO) with random sampling on GANIT-RLVR. Training uses dedicated reward functions for format validation, answer correctness (accepting both Bengali and English answers), and maintaining a high percentage of Bengali text in the reasoning. Designed for resource-constrained environments, the model offers a 4,096-token context length and supports both Bengali and English.
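The exact reward implementations are not published here, but the Bengali-text reward can be sketched from the description above. The function names, the 85% target ratio, and the linear scaling below the target are illustrative assumptions, not the model's actual training code:

```python
# Hypothetical sketch of a Bengali-percentage reward for GRPO.
# Characters in the Bengali Unicode block (U+0980-U+09FF) are counted
# against all alphabetic characters; the target and scaling are assumed.

def bengali_ratio(text: str) -> float:
    """Fraction of alphabetic characters that are Bengali letters."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    bengali = sum(1 for ch in letters if "\u0980" <= ch <= "\u09ff")
    return bengali / len(letters)

def language_reward(reasoning: str, target: float = 0.85) -> float:
    """Full reward when reasoning is predominantly Bengali;
    scaled linearly below the target ratio (an assumed scheme)."""
    ratio = bengali_ratio(reasoning)
    return 1.0 if ratio >= target else ratio / target
```

A reward shaped this way penalizes the base model's tendency to reason in English while still tolerating occasional English tokens (e.g. numerals or equation symbols interleaved with Bengali text).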
## Key Capabilities & Performance
- Enhanced Bengali Mathematical Reasoning: Achieves a +24.0-point accuracy gain on Bn-MGSM (from 8.4 to 32.4) and a +40.3-point gain on Bn-MSVAMP (from 12.2 to 52.5) over the base Qwen3-0.6B model.
- High Bengali Reasoning Percentage: Produces 88.45% of its reasoning in Bengali, a substantial increase from the base model's 12.43%.
- Concise Solutions: Generates solutions roughly 80.6% shorter than the base model's (about 246 words on average vs. 1,265).
- Resource-Efficient: Designed for deployment in resource-constrained environments.
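The conciseness figure above follows directly from the two reported averages; a quick check:

```python
# Verify the reported length reduction from the average solution lengths.
base_len, tuned_len = 1265, 246  # average words per solution (base vs. fine-tuned)
reduction = 1 - tuned_len / base_len
print(f"{reduction:.1%}")  # → 80.6%
```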
## Ideal Use Cases
This model is well-suited for applications that need accurate, efficient mathematical problem-solving in Bengali, especially where computational resources are limited. Its concise, Bengali-focused reasoning makes it valuable for educational tools, localized AI assistants, and research in multilingual NLP for mathematical tasks.