wzx111/Qwen3-1.7B-MATH-GDPO
wzx111/Qwen3-1.7B-MATH-GDPO is a 1.7 billion parameter language model, fine-tuned from Qwen/Qwen3-1.7B. It is specifically optimized for mathematical reasoning tasks, leveraging the GRPO training method on the MATH-lighteval-level_2 dataset. This model is designed to enhance performance in complex mathematical problem-solving.
Loading preview...
Overview
wzx111/Qwen3-1.7B-MATH-GDPO is a 1.7 billion parameter language model, fine-tuned from the base Qwen/Qwen3-1.7B model. Its primary focus is on mathematical reasoning, achieved through specialized training.
Key Capabilities
- Mathematical Reasoning: The model has been fine-tuned on the watermelonhjg/MATH-lighteval-level_2 dataset, making it proficient in solving mathematical problems.
- GRPO Training Method: It utilizes the GRPO (Gradient Regularized Policy Optimization) method, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), to enhance its mathematical capabilities.
- TRL Framework: The training was conducted using the TRL (Transformer Reinforcement Learning) library.
Use Cases
This model is particularly well-suited for applications requiring strong mathematical problem-solving abilities. Developers can integrate it into systems that need to process and generate responses for complex mathematical queries or educational tools focused on math.