asparius/Qwen2.5-1.5B-GRPO-1ep-iter2
Text generation · Model size: 1.5B · Quantization: BF16 · Context length: 32k · Published: Dec 24, 2025 · Architecture: Transformer
asparius/Qwen2.5-1.5B-GRPO-1ep-iter2 is a 1.5-billion-parameter language model fine-tuned from Qwen/Qwen2.5-1.5B. It was trained with GRPO (Group Relative Policy Optimization) on the DigitalLearningGmbH/MATH-lighteval dataset, specializing it for mathematical reasoning and step-by-step problem solving; the model name suggests one epoch of training in a second iteration.
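A minimal inference sketch using the Hugging Face `transformers` library is shown below. The prompt wording, the `build_prompt`/`solve` helper names, and the generation settings are illustrative assumptions, not prescribed by this model card; the model is loaded in its published dtype via `torch_dtype="auto"`.

```python
# Illustrative sketch: querying asparius/Qwen2.5-1.5B-GRPO-1ep-iter2 with
# transformers. Prompt format and helper names are assumptions, not from the card.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "asparius/Qwen2.5-1.5B-GRPO-1ep-iter2"


def build_prompt(problem: str) -> str:
    """Wrap a math problem in a simple instruction prompt (illustrative only)."""
    return (
        "Solve the following problem step by step.\n\n"
        f"Problem: {problem}\n"
        "Solution:"
    )


def solve(problem: str, max_new_tokens: int = 256) -> str:
    """Download the model (first call) and generate a solution string."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
    inputs = tokenizer(build_prompt(problem), return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

For example, `solve("What is 12 * 13?")` returns the prompt followed by the model's generated reasoning and answer.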