cheongmyeong17/Qwen2.5-MATH-1.5B-GRPO-Best

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:Jul 30, 2025Architecture:Transformer Warm

cheongmyeong17/Qwen2.5-MATH-1.5B-GRPO-Best is a 1.5 billion parameter language model, fine-tuned from Qwen/Qwen2.5-Math-1.5B-Instruct. Developed by cheongmyeong17, this model specializes in mathematical reasoning tasks. It was trained using the GRPO method on the hendrycks-math-with-answers dataset, making it optimized for solving complex mathematical problems.

Loading preview...

Model Overview

cheongmyeong17/Qwen2.5-MATH-1.5B-GRPO-Best is a 1.5 billion parameter language model derived from Qwen/Qwen2.5-Math-1.5B-Instruct. This model has been specifically fine-tuned for enhanced mathematical reasoning capabilities.

Key Capabilities

  • Mathematical Reasoning: Optimized for solving mathematical problems, leveraging training on the hendrycks-math-with-answers dataset.
  • GRPO Training: Utilizes the GRPO (Gradient-based Reward Policy Optimization) method, as introduced in the DeepSeekMath paper, to improve performance in mathematical contexts.
  • Instruction Following: Inherits instruction-following capabilities from its base model, Qwen2.5-Math-1.5B-Instruct.

Training Details

The model was trained using the TRL (Transformer Reinforcement Learning) framework. The training procedure involved fine-tuning with GRPO, a technique designed to push the limits of mathematical reasoning in language models. This specialized training makes it particularly adept at handling mathematical queries and problem-solving.