jhn9803/Qwen2.5-Math-1.5B-CVAPO-ADAPTIVE-G8
jhn9803/Qwen2.5-Math-1.5B-CVAPO-ADAPTIVE-G8 is a 1.5-billion-parameter variant of Qwen2.5-1.5B-Instruct fine-tuned for mathematical reasoning. It was trained with the GRPO method on the Hendrycks MATH dataset and specializes in complex mathematical problem-solving, making it suitable for tasks requiring advanced numerical and logical deduction.
Model Overview
The jhn9803/Qwen2.5-Math-1.5B-CVAPO-ADAPTIVE-G8 is a specialized language model based on the Qwen2.5-1.5B-Instruct architecture. It has been fine-tuned specifically for mathematical reasoning tasks using the jhn9803/hendrycks-math-with-answers dataset.
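Qwen2.5-Instruct-based models expect prompts in the ChatML format. In practice you would load the model's tokenizer with the `transformers` library and call `tokenizer.apply_chat_template(messages, add_generation_prompt=True)`; the sketch below builds the same format manually, purely to show what the model sees.

```python
# Sketch: building a ChatML-formatted prompt for a Qwen2.5-Instruct-based model.
# Normally tokenizer.apply_chat_template handles this; the manual version below
# only illustrates the format.

def build_chatml_prompt(messages: list) -> str:
    """Render a list of {role, content} messages into ChatML text."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    # The trailing assistant header cues the model to start its reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful math assistant."},
    {"role": "user", "content": "What is 7 * 8?"},
]
prompt = build_chatml_prompt(messages)
print(prompt)
```

The resulting string can be tokenized and passed to `model.generate` as with any causal language model.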
Key Capabilities
- Enhanced Mathematical Reasoning: Optimized to solve complex mathematical problems, leveraging a dataset rich in mathematical questions and answers.
- GRPO Training Method: Utilizes GRPO (Group Relative Policy Optimization), as introduced in the DeepSeekMath paper, to push the limits of mathematical reasoning in open language models.
- Instruction-Following: Retains the instruction-following capabilities of its base Qwen2.5-Instruct model while specializing in mathematical contexts.
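Solutions in the Hendrycks MATH dataset conventionally wrap the final answer in `\boxed{...}`, so models trained on it typically do the same. The following hypothetical helper (not part of any library) pulls the final answer out of a step-by-step solution, handling nested braces such as `\boxed{\frac{1}{2}}`.

```python
# Sketch: extracting the final answer from a MATH-style step-by-step solution.
# Assumes the answer is marked with \boxed{...}, per Hendrycks MATH convention.
from typing import Optional

def extract_boxed(solution: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in `solution`, or None."""
    start = solution.rfind(r"\boxed{")
    if start == -1:
        return None
    i = start + len(r"\boxed{")
    depth = 1  # track nested braces so \frac{1}{2} etc. survive intact
    chars = []
    while i < len(solution):
        ch = solution[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
        i += 1
    return None  # unbalanced braces

print(extract_boxed(r"Thus the answer is \boxed{\frac{1}{2}}."))  # \frac{1}{2}
```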
Training Details
The model was trained using the TRL library (version 0.18.0) with PyTorch 2.6.0. The GRPO method, detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models," was central to its training procedure.
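TRL's `GRPOTrainer` scores each group of sampled completions with user-supplied reward functions passed via its `reward_funcs` argument. The exact reward used for this model is not documented; below is an illustrative rule-based correctness reward of the kind commonly used for math tasks, where the hypothetical `answer` keyword stands in for a reference-answer column from the dataset.

```python
# Sketch of a rule-based correctness reward of the kind commonly used with
# TRL's GRPOTrainer on math data. The actual reward used to train this model
# is an assumption here. TRL calls reward functions with the sampled
# completions plus extra dataset columns as keyword arguments.
import re

def correctness_reward(completions, answer, **kwargs):
    """Return 1.0 per completion whose last \\boxed{...} matches the reference."""
    rewards = []
    for completion, ref in zip(completions, answer):
        matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
        pred = matches[-1].strip() if matches else None
        rewards.append(1.0 if pred == str(ref).strip() else 0.0)
    return rewards

scores = correctness_reward(
    completions=[r"So the answer is \boxed{42}.", "I am not sure."],
    answer=["42", "7"],
)
print(scores)  # [1.0, 0.0]
```

In a GRPO setup, rewards like these are normalized within each group of completions sampled for the same prompt, so only relative correctness within the group drives the policy update.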
Ideal Use Cases
This model is particularly well-suited for applications requiring:
- Solving advanced mathematical problems.
- Generating step-by-step mathematical solutions.
- Educational tools focused on mathematics.
- Research in mathematical reasoning with LLMs.