dipta007/GanitLLM-4B_CGRPO is a 4-billion-parameter causal language model based on Qwen3-4B, fine-tuned for Bengali mathematical reasoning. It is trained with Curriculum-GRPO directly on the base model, without supervised fine-tuning, and achieves significant accuracy gains on Bengali mathematical benchmarks such as Bn-MGSM and Bn-MSVAMP. The model is optimized for raw accuracy on mathematical problems, reasons primarily in English, generates more concise solutions than the base model, and supports a 32,768-token context length.
Overview
GanitLLM-4B_CGRPO is a 4-billion-parameter causal language model developed by dipta007, built upon the Qwen3-4B base model. It is trained using Curriculum-GRPO (curriculum-guided Group Relative Policy Optimization) applied directly to the base model, bypassing traditional supervised fine-tuning (SFT). This approach focuses on enhancing mathematical reasoning capabilities, particularly for problems posed in Bengali.
Key Capabilities and Performance
- Enhanced Bengali Mathematical Reasoning: Achieves substantial accuracy gains on Bengali mathematical benchmarks, with a +13.2 increase on Bn-MGSM (from 69.2 to 82.4) and an +8.0 increase on Bn-MSVAMP (from 70.5 to 78.5) compared to the base Qwen3-4B model.
- Efficient Solution Generation: Produces solutions approximately 10.5% shorter than the base model's (844 words on average vs. 943), indicating more concise reasoning.
- English-Centric Reasoning: Although designed for Bengali mathematical problems, this variant reasons primarily in English; its Bengali reasoning percentage (14.94%) is similar to the base model's.
- Curriculum-GRPO Training: Uses a single-stage training pipeline with difficulty-aware sampling over the GANIT-RLVR dataset, with reward functions covering format validation, correctness (matching either the Bengali or the English rendering of the answer), and the percentage of Bengali text in the reasoning.
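The reward design described above can be sketched as a weighted combination of component rewards scored on each sampled completion. The function names, weights, boxed-answer format, and answer-extraction logic below are illustrative assumptions, not the actual GANIT-RLVR implementation:

```python
import re

def format_reward(completion: str) -> float:
    """Format validation: 1.0 if the completion contains a boxed final
    answer, else 0.0. (Assumed format; the real check may differ.)"""
    return 1.0 if re.search(r"\\boxed\{.+?\}", completion) else 0.0

def correctness_reward(completion: str, gold_answers: set[str]) -> float:
    """Correctness: 1.0 if the extracted answer matches any gold answer,
    e.g. either the Bengali or the English rendering of the number."""
    m = re.search(r"\\boxed\{(.+?)\}", completion)
    return 1.0 if m and m.group(1).strip() in gold_answers else 0.0

BENGALI = re.compile(r"[\u0980-\u09FF]")

def bengali_ratio(completion: str) -> float:
    """Fraction of non-whitespace characters in the Bengali Unicode block,
    a proxy for the 'Bengali reasoning percentage' reward signal."""
    chars = [c for c in completion if not c.isspace()]
    if not chars:
        return 0.0
    return sum(1 for c in chars if BENGALI.match(c)) / len(chars)

def total_reward(completion: str, gold_answers: set[str],
                 weights: tuple[float, float, float] = (0.2, 0.7, 0.1)) -> float:
    """Weighted sum of the three components (weights are hypothetical)."""
    return (weights[0] * format_reward(completion)
            + weights[1] * correctness_reward(completion, gold_answers)
            + weights[2] * bengali_ratio(completion))
```

In GRPO-style training, a scalar reward like this would be computed per sampled completion in a group, with advantages taken relative to the group mean.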
When to Use This Model
This model is ideal for applications requiring high accuracy in solving Bengali mathematical problems, especially when the reasoning process can be predominantly in English. Developers should consider this model for tasks where raw performance on mathematical benchmarks is critical. For use cases specifically requiring Bengali reasoning, the related GanitLLM-4B_SFT_CGRPO model is recommended.
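For developers evaluating the model, a minimal inference sketch using the standard Hugging Face `transformers` API is shown below. It assumes the model is hosted on the Hub under its repo id and exposes a chat template; the example prompt and generation settings are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "dipta007/GanitLLM-4B_CGRPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# A Bengali math word problem ("Rahim has 5 apples; he buys 3 more.
# How many apples does he have?") -- example prompt, not from the benchmark.
messages = [{"role": "user",
             "content": "রহিমের কাছে ৫টি আপেল আছে। সে আরও ৩টি কিনল। মোট কয়টি আপেল হলো?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Expect the reasoning trace to be mostly in English, per the behavior described above; for predominantly Bengali reasoning, use the GanitLLM-4B_SFT_CGRPO variant instead.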