od2961/Qwen2.5-1.5B-Open-R1-GRPO-math-v1
Text Generation | Concurrency Cost: 1 | Model Size: 1.5B | Quant: BF16 | Ctx Length: 32k | Published: Aug 13, 2025 | Architecture: Transformer | Warm

od2961/Qwen2.5-1.5B-Open-R1-GRPO-math-v1 is a 1.5-billion-parameter language model fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It was trained with GRPO (Group Relative Policy Optimization) on the OpenR1-Math-220k dataset to improve mathematical reasoning. The intended use is solving math problems that require multi-step reasoning.
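For readers unfamiliar with GRPO, the core idea is that several completions are sampled for the same prompt and each one is scored relative to the others in its group. The following is a minimal sketch of that group-relative advantage step only, not this model's actual training code. The function name, the 0/1 correctness reward, the epsilon value, and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Score each sampled completion relative to its own group.

    rewards: one scalar reward per completion, all for the same prompt.
    eps: small constant (assumed value) that avoids division by zero
         when every completion in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one math problem:
# 1.0 if the final answer was correct, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that beat their group's average get a positive advantage and are reinforced, while below-average ones get a negative advantage. If every completion in a group receives the same reward, all advantages are zero and that group contributes no learning signal.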
