jhn9803/Qwen2.5-MATH-1.5B-Instruct-DAPO-G8

Hosted on Hugging Face

Text generation · Model size: 1.5B · Quantization: BF16 · Context length: 32k · Architecture: Transformer · Published: Dec 28, 2025

The jhn9803/Qwen2.5-MATH-1.5B-Instruct-DAPO-G8 model is a 1.5 billion parameter instruction-tuned language model based on the Qwen2.5 architecture. Developed by jhn9803, it is specifically fine-tuned for mathematical reasoning tasks using the hendrycks-math-with-answers dataset. This model leverages the GRPO training method, making it particularly effective for solving complex mathematical problems.


Model Overview

The jhn9803/Qwen2.5-MATH-1.5B-Instruct-DAPO-G8 is a 1.5 billion parameter instruction-tuned model built upon the Qwen2.5-Math-1.5B-Instruct base. It has been specialized for mathematical reasoning through fine-tuning on the jhn9803/hendrycks-math-with-answers dataset.
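Since the model follows the standard Qwen2.5 chat format, it can be loaded with the `transformers` library. The sketch below is illustrative, not part of the model card: the system prompt wording and generation settings are assumptions, and `device_map="auto"` requires the `accelerate` package.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jhn9803/Qwen2.5-MATH-1.5B-Instruct-DAPO-G8"

# Load tokenizer and model (device_map="auto" assumes accelerate is installed).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Illustrative math-style prompt; the exact system message is an assumption.
messages = [
    {"role": "system",
     "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": "What is 15% of 80?"},
]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```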

Key Capabilities

  • Mathematical Reasoning: Optimized for solving mathematical problems, leveraging a dataset specifically curated for this purpose.
  • GRPO Training: Incorporates the GRPO (Group Relative Policy Optimization) method, as introduced in the DeepSeekMath paper, to enhance its mathematical problem-solving abilities.
  • Instruction Following: Designed to follow instructions effectively, making it suitable for interactive mathematical tasks.

Training Details

The model was trained using the TRL (Transformer Reinforcement Learning) framework. The GRPO method, which is central to its mathematical performance, is detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300).

Good For

  • Mathematical Problem Solving: Ideal for applications requiring the model to understand and solve various mathematical challenges.
  • Research in Mathematical LLMs: Provides a base for further experimentation and development in mathematical reasoning with language models.