dipta007/GanitLLM-0.6B_CGRPO
Text generation · Model size: 0.8B · Quantization: BF16 · Context length: 32K · Published: Jan 1, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

dipta007/GanitLLM-0.6B_CGRPO is a 0.6-billion-parameter causal language model developed by dipta007, based on Qwen3-0.6B, with a context length of 4,096 tokens. It is fine-tuned specifically for Bengali mathematical reasoning using Curriculum-GRPO (CGRPO) without supervised fine-tuning (SFT). The model demonstrates improved accuracy on Bengali mathematical benchmarks such as Bn-MGSM and Bn-MSVAMP while generating significantly fewer tokens per solution.


GanitLLM-0.6B_CGRPO Overview

GanitLLM-0.6B_CGRPO is a 0.6-billion-parameter causal language model built on the Qwen3-0.6B base model and designed for Bengali mathematical reasoning. Developed by dipta007, this variant applies Curriculum-GRPO (CGRPO) training directly to the base model, bypassing traditional supervised fine-tuning (SFT).
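
Below is a minimal inference sketch. It assumes the model exposes the standard Hugging Face transformers causal-LM and chat-template interface inherited from Qwen3-0.6B; the Bengali prompt and generation settings are illustrative, not an official usage snippet.

```python
# Minimal inference sketch, assuming the standard transformers causal-LM
# interface inherited from Qwen3-0.6B (not an official usage snippet).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "dipta007/GanitLLM-0.6B_CGRPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Illustrative Bengali math word problem (English: "Rahim has 5 mangoes;
# Karim gives him 3 more. How many mangoes does Rahim have now?").
prompt = "রহিমের কাছে ৫টি আম আছে; করিম তাকে আরও ৩টি দিল। রহিমের কাছে এখন কতটি আম আছে?"

messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```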

Key Capabilities & Performance

  • Enhanced Bengali Mathematical Reasoning: Shows notable accuracy improvements on Bengali mathematical benchmarks: an 8.8-point gain on Bn-MGSM (8.4 → 17.2) and a 23.0-point gain on Bn-MSVAMP (12.2 → 35.2).
  • Efficient Solution Generation: Generates solutions with 34.9% fewer tokens on average (824 versus 1,265 for the base model), indicating more concise reasoning.
  • Bengali Reasoning Focus: Keeps roughly the same share of Bengali text in its reasoning as the base model (around 11.67%).
  • Training Methodology: Employs a single-stage Curriculum-GRPO pipeline with difficulty-aware sampling and reward functions for format, correctness, and Bengali reasoning (see the sketch after this list).
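
To make the reward design concrete, here is a hypothetical sketch of the three reward signals named above. The answer tags, weights, and Bengali-detection heuristic are assumptions for illustration only, not the author's published code.

```python
# Hypothetical sketch of the three CGRPO reward signals (format,
# correctness, Bengali reasoning). Tags, weights, and the Bengali
# detection heuristic are illustrative assumptions.
import re

def format_reward(completion: str) -> float:
    # Reward well-formed output: reasoning followed by a tagged final answer.
    return 1.0 if re.search(r"<answer>.*?</answer>", completion, re.DOTALL) else 0.0

def correctness_reward(completion: str, gold_answer: str) -> float:
    # Reward exact match between the extracted final answer and the gold label.
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return 1.0 if match and match.group(1).strip() == gold_answer.strip() else 0.0

def bengali_reward(completion: str) -> float:
    # Reward reasoning written in Bengali: fraction of alphabetic characters
    # falling in the Bengali Unicode block (U+0980 to U+09FF).
    letters = [c for c in completion if c.isalpha()]
    if not letters:
        return 0.0
    bengali = sum(1 for c in letters if "\u0980" <= c <= "\u09ff")
    return bengali / len(letters)

def total_reward(completion: str, gold_answer: str) -> float:
    # Weighted combination of the three signals; the weights are placeholders.
    return (0.2 * format_reward(completion)
            + 0.6 * correctness_reward(completion, gold_answer)
            + 0.2 * bengali_reward(completion))
```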

Considerations

While the model demonstrates gains, the README notes that at this 0.6B scale it shows "limited improvement." For better performance, users are advised to consider the GanitLLM-0.6B_SFT_CGRPO variant or larger models in the GanitLLM collection.