lhkhiem28/Qwen2.5-3B-grpo
Text Generation · Concurrency Cost: 1 · Model Size: 3.1B · Quant: BF16 · Ctx Length: 32k · Published: Apr 3, 2026 · Architecture: Transformer

lhkhiem28/Qwen2.5-3B-grpo is a 3.1 billion parameter causal language model fine-tuned from Qwen/Qwen2.5-3B. It is trained with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in DeepSeekMath, to strengthen its reasoning capabilities. The model is optimized for tasks that require mathematical and logical reasoning, making it suitable for complex problem-solving applications.
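A minimal usage sketch with the Hugging Face transformers library, assuming the checkpoint loads as a standard causal LM; the prompt format is not specified on this card, so a plain text prompt is used here for illustration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lhkhiem28/Qwen2.5-3B-grpo"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
    device_map="auto",
)

# Plain text prompt; a chat template may apply if the fine-tune defines one.
prompt = "What is the sum of the first 100 positive integers? Show your reasoning."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```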
