khazarai/Math-RL
Text Generation | Concurrency Cost: 1 | Model Size: 0.5B | Quant: BF16 | Ctx Length: 32k | Published: Mar 24, 2026 | License: apache-2.0 | Architecture: Transformer | 0.0K | Open Weights | Cold

khazarai/Math-RL is a 0.5-billion-parameter language model fine-tuned from Qwen2.5-0.5B-Instruct with Group Relative Policy Optimization (GRPO) on a curated dataset of 700 math problems. The fine-tuning targets step-by-step reasoning in mathematical problem-solving. Intended uses are educational assistance, research into small-scale RLHF-style fine-tuning, and lightweight math reasoning in resource-constrained environments.
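
For readers unfamiliar with GRPO, the following is a minimal sketch of its core idea: several completions are sampled for the same prompt, and each completion's reward is normalized against the mean and standard deviation of its own group, so no separate value model is needed. The reward values, the function name `group_relative_advantages`, the `eps` constant, and the use of the population standard deviation are all assumptions made for illustration; the reward function and training configuration actually used for this model are not stated here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the group it was sampled in.

    Illustrative only: `eps` and the population standard deviation are
    assumptions, and real GRPO implementations may differ in both.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one math problem:
# 1.0 for a correct final answer, 0.0 for an incorrect one.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above their group's average receive a positive advantage and are reinforced, while those below average are discouraged. When every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.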
