lhkhiem28/Qwen2.5-3B-grpo

Text Generation | Concurrency Cost: 1 | Model Size: 3.1B | Quant: BF16 | Ctx Length: 32k | Tool Calling: Supported | Published: Apr 3, 2026 | Architecture: Transformer

lhkhiem28/Qwen2.5-3B-grpo is a fine-tuned version of Qwen2.5-3B-Instruct, the causal language model developed by the Qwen team. It was trained with the GRPO method on the HA-GRPO-datasets, with the aim of strengthening mathematical reasoning. The training targets robust mathematical problem-solving and logical deduction, which makes the model a candidate for applications where mathematical accuracy matters, while the underlying Qwen2.5-3B architecture keeps it relatively small to deploy.


Overview

lhkhiem28/Qwen2.5-3B-grpo is a specialized language model derived from Qwen's Qwen2.5-3B-Instruct. Its main distinction is its training method: it was fine-tuned with GRPO (Group Relative Policy Optimization), a reinforcement-learning technique introduced in the DeepSeekMath paper to improve mathematical reasoning. GRPO is a variant of PPO that drops the separate value (critic) model; instead, for each prompt it samples a group of completions and uses the group's average reward as the baseline when computing advantages.
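
The group-relative advantage at the core of GRPO can be sketched in a few lines of Python. For outcome supervision, DeepSeekMath normalizes each completion's reward within its group, A_i = (r_i - mean(r)) / std(r), and assigns that value to every token of completion i. The function name, the 0/1 correctness rewards, the use of the population standard deviation, and the epsilon are illustrative assumptions, not details taken from this model's training setup.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Hypothetical helper: normalize one group's rewards as (r_i - mean(r)) / std(r).

    Under outcome supervision, every token of completion i shares advantage A_i.
    `eps` (an assumption) avoids division by zero when all rewards are equal.
    """
    if not rewards:
        raise ValueError("a group needs at least one reward")
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical group of G = 4 sampled answers to one math question,
# scored 1.0 if the final answer is correct and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])  # [1.0, -1.0, -1.0, 1.0]
```

Correct answers get a positive advantage and incorrect ones a negative advantage relative to their own group, which is what lets GRPO dispense with a learned critic.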

Key Capabilities

  • Enhanced Mathematical Reasoning: Targets tasks that require multi-step mathematical problem-solving and logical deduction, building on the Qwen2.5-3B-Instruct base model.
  • GRPO Training: Trained with the GRPO method described in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models", which is the source of its math specialization.
  • Instruction-following: Retains the instruction-following capabilities of its base model, Qwen2.5-3B-Instruct, so it can be adapted to a range of prompt styles.
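
To show how the pieces of the GRPO objective fit together, here is a minimal sketch of the per-token term as written in the DeepSeekMath paper: a PPO-style clipped probability ratio multiplied by the group-relative advantage, minus a KL penalty toward a reference model, estimated as pi_ref/pi_theta - log(pi_ref/pi_theta) - 1. It works on scalar log-probabilities for readability rather than batched tensors. The function names are hypothetical, and the defaults (clip 0.2, beta 0.04) are illustrative; the card does not state the hyperparameters used to train this model.

```python
import math


def grpo_token_objective(
    logp_new: float,
    logp_old: float,
    logp_ref: float,
    advantage: float,
    clip_eps: float = 0.2,  # illustrative default
    beta: float = 0.04,  # illustrative KL coefficient
) -> float:
    """Hypothetical per-token GRPO term (to be maximized) for one sampled token."""
    # Probability ratio pi_theta / pi_theta_old for the sampled token.
    ratio = math.exp(logp_new - logp_old)
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    # Pessimistic (minimum) of the unclipped and clipped surrogates, as in PPO.
    surrogate = min(ratio * advantage, clipped * advantage)
    # KL estimator pi_ref/pi_theta - log(pi_ref/pi_theta) - 1, which is always >= 0.
    log_ref_ratio = logp_ref - logp_new
    kl = math.exp(log_ref_ratio) - log_ref_ratio - 1.0
    return surrogate - beta * kl


def grpo_objective(
    group: list[tuple[list[tuple[float, float, float]], float]],
    clip_eps: float = 0.2,
    beta: float = 0.04,
) -> float:
    """Average token terms within each completion, then average over the G completions.

    Each group entry is ([(logp_new, logp_old, logp_ref), ...], advantage).
    """
    per_completion = []
    for tokens, advantage in group:
        terms = [
            grpo_token_objective(new, old, ref, advantage, clip_eps, beta)
            for new, old, ref in tokens
        ]
        per_completion.append(sum(terms) / len(terms))
    return sum(per_completion) / len(per_completion)
```

A training loop would maximize this quantity, in practice by minimizing its negative with gradients from an autodiff framework; the scalar version here only illustrates the arithmetic.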

Good For

  • Mathematical Problem Solving: Suited to applications in which the model must understand and solve equations, proofs, or word problems.
  • Research and Development: Useful for researchers exploring advanced training techniques like GRPO and their impact on specific reasoning tasks.
  • Educational Tools: Can be integrated into tools designed to assist with mathematical learning or problem verification.
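
For problem verification in tools like these, a common pattern is to have the model put its final answer in \boxed{...} and compare it with a reference. The sketch below assumes that convention; the card does not say what output format this model was trained to produce, and the helper names are hypothetical.

```python
from typing import Optional


def last_boxed_answer(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in `text`, or None if absent."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len("\\boxed{"):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:  # matching brace of \boxed{ found
                return "".join(chars).strip()
        chars.append(ch)  # keeps nested braces such as \frac{1}{2}
    return None  # unbalanced braces: no complete answer


def _normalize(answer: str) -> str:
    """Drop all whitespace so '1 / 2' and '1/2' compare equal."""
    return "".join(answer.split())


def is_correct(completion: str, reference: str) -> bool:
    """Deliberately simple exact-match check on the last boxed answer."""
    answer = last_boxed_answer(completion)
    return answer is not None and _normalize(answer) == _normalize(reference)


completion = "Half of 9 is 4.5, so the answer is \\boxed{\\frac{9}{2}}."
print(last_boxed_answer(completion))  # \frac{9}{2}
print(is_correct(completion, "\\frac{9}{2}"))  # True
```

Exact string matching treats equivalent forms such as 4.5 and \frac{9}{2} as different, so a real tool would likely need a symbolic or numeric equivalence check on top of this.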