shawntzx/Qwen2.5-3B-GRPO-3_13_math
Text Generation · Concurrency Cost: 1 · Model Size: 3.1B · Quant: BF16 · Ctx Length: 32k · Published: Mar 13, 2025 · Architecture: Transformer
shawntzx/Qwen2.5-3B-GRPO-3_13_math is a 3.1 billion parameter language model fine-tuned from Qwen/Qwen2.5-3B-Instruct. It was trained with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper, to strengthen its mathematical reasoning capabilities. The model is optimized for complex mathematical problem-solving and logical deduction, making it suitable for applications that require numerical and symbolic reasoning.
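Below is a minimal sketch of running the model for a math prompt with Hugging Face Transformers. It assumes the checkpoint is published on the Hub under the repo id above and inherits the Qwen2.5-Instruct chat template; the system and user messages are illustrative placeholders.

```python
# Minimal usage sketch (assumes the checkpoint is on the Hugging Face Hub
# under this repo id and uses the Qwen2.5-Instruct chat template).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shawntzx/Qwen2.5-3B-GRPO-3_13_math"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
    device_map="auto",
)

# Example math prompt; the wording here is illustrative, not from the model card.
messages = [
    {"role": "system", "content": "You are a helpful assistant skilled at step-by-step mathematical reasoning."},
    {"role": "user", "content": "Solve for x: 3x + 7 = 22. Show your reasoning."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512, do_sample=False)
# Decode only the newly generated tokens after the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

With greedy decoding (`do_sample=False`) the model's reasoning chain is deterministic, which is often preferable for checking math answers; sampling parameters can be adjusted for more varied solutions.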