jhn9803/Qwen2.5-Math-1.5B-CVAPO-ADAPTIVE-G8

Hugging Face · Text Generation
Model size: 1.5B · Quantization: BF16 · Context length: 32k · Architecture: Transformer · Published: Jan 1, 2026

The jhn9803/Qwen2.5-Math-1.5B-CVAPO-ADAPTIVE-G8 model is a 1.5 billion parameter Qwen2.5-Instruct variant fine-tuned for mathematical reasoning. It was trained using the GRPO method on the Hendrycks-Math dataset, specializing in complex mathematical problem-solving. This model is designed to enhance the mathematical capabilities of the base Qwen2.5-1.5B-Instruct architecture, making it suitable for tasks requiring advanced numerical and logical deduction.


Model Overview

The jhn9803/Qwen2.5-Math-1.5B-CVAPO-ADAPTIVE-G8 is a specialized language model based on the Qwen2.5-1.5B-Instruct architecture. It has been fine-tuned specifically for mathematical reasoning tasks using the jhn9803/hendrycks-math-with-answers dataset.
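Like other Qwen2.5-Instruct derivatives, the model expects prompts in the ChatML chat format. In practice you would load the tokenizer with the `transformers` library and call `tokenizer.apply_chat_template`; the plain-string sketch below only illustrates the layout of that format (the example question is made up):

```python
# Sketch of the ChatML-style chat format used by Qwen2.5-Instruct models.
# In real use, prefer tokenizer.apply_chat_template from transformers,
# which inserts the correct special tokens for the exact checkpoint.

def build_chatml_prompt(messages):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>")
    # A trailing assistant header cues the model to begin its reply.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful math assistant. Reason step by step."},
    {"role": "user", "content": "What is the remainder when 7^100 is divided by 5?"},
])
print(prompt)
```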

Key Capabilities

  • Enhanced Mathematical Reasoning: Optimized to solve complex mathematical problems, leveraging a dataset rich in mathematical questions and answers.
  • GRPO Training Method: Trained with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper to push the limits of mathematical reasoning in open language models.
  • Instruction-Following: Retains the instruction-following capabilities of its base Qwen2.5-Instruct model while specializing in mathematical contexts.

Training Details

The model was trained using the TRL library (version 0.18.0) with PyTorch 2.6.0. The GRPO method, detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models," was central to its training procedure.
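The core idea of GRPO, as described in the DeepSeekMath paper, is to sample a group of G completions per prompt and standardize each completion's reward against the group's own mean and standard deviation, removing the need for a separate value (critic) model. The sketch below illustrates only that advantage computation; the reward values are made up, and the group size G = 4 is arbitrary (the "G8" suffix in the model name plausibly refers to a group size of 8, but the card does not say):

```python
# Minimal sketch of GRPO's group-relative advantage computation.
# Each completion's advantage is its reward standardized within its
# own sampling group: (r - mean) / (std + eps).
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize a group of scalar rewards against the group statistics."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# One prompt, G = 4 sampled solutions scored 1.0 if the final answer
# matched the reference, else 0.0 (a common verifiable-reward setup).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that beat the group average receive positive advantages and are reinforced; the rest are penalized, all without training a critic.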

Ideal Use Cases

This model is particularly well-suited for applications requiring:

  • Solving advanced mathematical problems.
  • Generating step-by-step mathematical solutions.
  • Educational tools focused on mathematics.
  • Research in mathematical reasoning with LLMs.
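When grading outputs against the Hendrycks MATH dataset mentioned above, the conventional practice is to compare the contents of the final \boxed{...} expression in the generated solution with the reference answer. A hedged extraction sketch (the example solution string is made up; a brace-matching scan handles nested braces that a plain regex would miss):

```python
# Sketch of extracting a final \boxed{...} answer from a generated solution,
# following the answer convention of the MATH (Hendrycks et al.) dataset.

def extract_boxed(solution: str):
    """Return the contents of the last \\boxed{...} in `solution`, or None."""
    start = solution.rfind(r"\boxed{")
    if start == -1:
        return None
    i = start + len(r"\boxed{")
    depth = 1  # we are inside the opening brace of \boxed{
    out = []
    while i < len(solution) and depth > 0:
        ch = solution[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                break  # matching close brace found; stop before appending it
        out.append(ch)
        i += 1
    return "".join(out)

answer = extract_boxed(r"... so the answer is $\boxed{\frac{1}{3}}$.")
```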