dipta007/GanitLLM-0.6B_CGRPO
Text generation · Model size: 0.8B · Quantization: BF16 · Context length: 32K · Published: Jan 1, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

dipta007/GanitLLM-0.6B_CGRPO is a 0.6-billion-parameter causal language model developed by dipta007, based on Qwen3-0.6B, with a context length of 4,096 tokens. It is fine-tuned specifically for Bengali mathematical reasoning using Curriculum-GRPO (CGRPO) without supervised fine-tuning (SFT). The model demonstrates improved accuracy on Bengali mathematical benchmarks such as Bn-MGSM and Bn-MSVAMP while generating significantly fewer tokens per solution.


GanitLLM-0.6B_CGRPO Overview

GanitLLM-0.6B_CGRPO is a 0.6-billion-parameter causal language model built on the Qwen3-0.6B base model and designed for Bengali mathematical reasoning. Developed by dipta007, this variant applies Curriculum-GRPO (CGRPO) training directly to the base model, bypassing traditional supervised fine-tuning (SFT).
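
Below is a minimal inference sketch. It assumes the model exposes the standard Hugging Face transformers causal-LM and chat-template interface inherited from Qwen3-0.6B; the Bengali prompt and generation settings are illustrative, not an official usage snippet.

```python
# Minimal inference sketch, assuming the standard transformers causal-LM
# interface inherited from Qwen3-0.6B (not an official usage snippet).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "dipta007/GanitLLM-0.6B_CGRPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Illustrative Bengali math word problem (English: "Rahim has 5 mangoes;
# Karim gives him 3 more. How many mangoes does Rahim have now?").
prompt = "রহিমের কাছে ৫টি আম আছে; করিম তাকে আরও ৩টি দিল। রহিমের কাছে এখন কতটি আম আছে?"

messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```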

Key Capabilities & Performance

  • Enhanced Bengali Mathematical Reasoning: Shows notable accuracy improvements on Bengali mathematical benchmarks: an 8.8-point gain on Bn-MGSM (8.4 → 17.2) and a 23.0-point gain on Bn-MSVAMP (12.2 → 35.2).
  • Efficient Solution Generation: Generates solutions with 34.9% fewer tokens on average (824 versus 1,265 for the base model), indicating more concise reasoning.
  • Bengali Reasoning Focus: Keeps roughly the same share of Bengali text in its reasoning as the base model (around 11.67%).
  • Training Methodology: Employs a single-stage Curriculum-GRPO pipeline with difficulty-aware sampling and reward functions for format, correctness, and Bengali reasoning (see the sketch after this list).
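
To make the reward design concrete, here is a hypothetical sketch of the three reward signals named above. The answer tags, weights, and Bengali-detection heuristic are assumptions for illustration only, not the author's published code.

```python
# Hypothetical sketch of the three CGRPO reward signals (format,
# correctness, Bengali reasoning). Tags, weights, and the Bengali
# detection heuristic are illustrative assumptions.
import re

def format_reward(completion: str) -> float:
    # Reward well-formed output: reasoning followed by a tagged final answer.
    return 1.0 if re.search(r"<answer>.*?</answer>", completion, re.DOTALL) else 0.0

def correctness_reward(completion: str, gold_answer: str) -> float:
    # Reward exact match between the extracted final answer and the gold label.
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return 1.0 if match and match.group(1).strip() == gold_answer.strip() else 0.0

def bengali_reward(completion: str) -> float:
    # Reward reasoning written in Bengali: fraction of alphabetic characters
    # falling in the Bengali Unicode block (U+0980 to U+09FF).
    letters = [c for c in completion if c.isalpha()]
    if not letters:
        return 0.0
    bengali = sum(1 for c in letters if "\u0980" <= c <= "\u09ff")
    return bengali / len(letters)

def total_reward(completion: str, gold_answer: str) -> float:
    # Weighted combination of the three signals; the weights are placeholders.
    return (0.2 * format_reward(completion)
            + 0.6 * correctness_reward(completion, gold_answer)
            + 0.2 * bengali_reward(completion))
```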

Considerations

While the model demonstrates gains, the README notes that at this 0.6B scale it shows "limited improvement." For better performance, users are advised to consider the GanitLLM-0.6B_SFT_CGRPO variant or larger models in the GanitLLM collection.