AmberYifan/Qwen3-4B-MATH-GRPO-len-control

Text Generation · Model size: 4B · Quantization: BF16 · Context length: 32k · Published: Sep 16, 2025 · Architecture: Transformer

AmberYifan/Qwen3-4B-MATH-GRPO-len-control is a 4-billion-parameter language model fine-tuned from Qwen/Qwen3-4B with the GRPO method introduced in the DeepSeekMath work. It is optimized for mathematical reasoning and suited to applications that demand robust numerical and logical problem-solving.


Model Overview

AmberYifan/Qwen3-4B-MATH-GRPO-len-control is a 4-billion-parameter language model fine-tuned from the base Qwen/Qwen3-4B architecture. The model was developed by AmberYifan and trained with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300).
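Since this is a standard causal LM checkpoint, it should load with the usual transformers API. A minimal sketch follows; the prompt and generation settings are illustrative, not taken from the model card:

```python
# A minimal sketch of loading and querying the model via the standard
# transformers API; prompt and generation settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AmberYifan/Qwen3-4B-MATH-GRPO-len-control"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user", "content": "What is the sum of the first 100 positive integers?"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```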

Key Capabilities

  • Enhanced Mathematical Reasoning: GRPO training specifically targets improved performance on mathematical tasks.
  • Fine-tuned with TRL: The model was trained with TRL, a library for Transformer Reinforcement Learning.
  • Qwen3-4B Base: Builds on the general capabilities of Qwen3-4B, giving the specialized mathematical fine-tuning a strong foundation.

Training Details

The model was trained with GRPO, a reinforcement-learning technique designed to strengthen mathematical reasoning in large language models by optimizing the policy against reward signals computed over groups of sampled completions. The run used TRL 0.18.0, Transformers 4.52.3, and PyTorch 2.6.0.
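The training script itself is not published. As a rough illustration of the workflow, a GRPO run with TRL's GRPOTrainer might look like the sketch below; the dataset and reward function are placeholders, and the length-based reward only echoes the "len-control" suffix in the model name, an assumption rather than documented behavior:

```python
# A minimal sketch of GRPO fine-tuning with TRL's GRPOTrainer. The dataset
# and reward function are illustrative placeholders, not the author's recipe.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Toy math prompts; the actual training data is not documented in this card.
train_dataset = Dataset.from_dict({
    "prompt": [
        "What is 12 * 7?",
        "Solve for x: 2x + 3 = 11.",
    ]
})

def length_reward(completions, **kwargs):
    # Hypothetical reward that penalizes overly long completions, echoing the
    # "len-control" suffix in the model name; the real reward is unknown.
    return [max(0.0, 1.0 - len(c) / 1024) for c in completions]

training_args = GRPOConfig(
    output_dir="Qwen3-4B-MATH-GRPO-len-control",
    num_generations=8,          # completions sampled per prompt (the "group")
    max_completion_length=512,  # cap on generated tokens per completion
)

trainer = GRPOTrainer(
    model="Qwen/Qwen3-4B",      # base model named in this card
    reward_funcs=length_reward,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
```

GRPO scores each group of completions for a prompt relative to one another, so no separate value model is needed; the reward function above returns one scalar per completion.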

Good For

  • Applications requiring strong mathematical problem-solving abilities.
  • Research and development in mathematical reasoning with LLMs.
  • Tasks where a smaller, specialized model for math is preferred over larger, general-purpose models.