cheongmyeong17/Qwen2.5-3B-MATH-GRPO

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:3.1BQuant:BF16Ctx Length:32kPublished:Jul 17, 2025Architecture:Transformer0.0K Warm

cheongmyeong17/Qwen2.5-3B-MATH-GRPO is a 3.1 billion parameter language model fine-tuned from Qwen/Qwen2.5-3B-Instruct. It specializes in mathematical reasoning tasks, having been trained on the jhn9803/hendrycks-math-with-answers dataset. This model utilizes the GRPO training method, as introduced in the DeepSeekMath paper, to enhance its mathematical problem-solving capabilities. It is designed for applications requiring strong mathematical understanding and accurate numerical reasoning.

Loading preview...

Model Overview

cheongmyeong17/Qwen2.5-3B-MATH-GRPO is a 3.1 billion parameter language model derived from the Qwen/Qwen2.5-3B-Instruct architecture. Its primary distinction lies in its specialized fine-tuning for mathematical reasoning, leveraging the jhn9803/hendrycks-math-with-answers dataset.

Key Capabilities

  • Enhanced Mathematical Reasoning: Specifically trained to improve performance on mathematical problems and tasks.
  • GRPO Training Method: Incorporates the GRPO (Gradient-based Reward Policy Optimization) method, detailed in the DeepSeekMath paper, which is designed to push the limits of mathematical reasoning in open language models.
  • Instruction-Following Base: Built upon an instruction-tuned base model, allowing for general conversational abilities alongside its mathematical specialization.

Good For

  • Mathematical Problem Solving: Ideal for applications requiring the model to understand and solve complex mathematical equations, word problems, and logical reasoning tasks.
  • Educational Tools: Can be integrated into platforms for tutoring, homework assistance, or generating mathematical explanations.
  • Research in Mathematical AI: Provides a specialized base for further experimentation and development in AI models focused on quantitative analysis.