AmberYifan/Qwen3-4B-MATH-GRPO-len-control

Text Generation · Model size: 4B · Quantization: BF16 · Context length: 32k · Published: Sep 16, 2025 · Architecture: Transformer

AmberYifan/Qwen3-4B-MATH-GRPO-len-control is a 4-billion-parameter language model fine-tuned from Qwen/Qwen3-4B with the GRPO method introduced in the DeepSeekMath work. It is optimized for mathematical reasoning and suited to applications that demand robust numerical and logical problem-solving.


Model Overview

AmberYifan/Qwen3-4B-MATH-GRPO-len-control is a 4-billion-parameter language model fine-tuned from the base Qwen/Qwen3-4B architecture. The model was developed by AmberYifan and trained with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300).
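Since this is a standard causal LM checkpoint, it should load with the usual transformers API. A minimal sketch follows; the prompt and generation settings are illustrative, not taken from the model card:

```python
# A minimal sketch of loading and querying the model via the standard
# transformers API; prompt and generation settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AmberYifan/Qwen3-4B-MATH-GRPO-len-control"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user", "content": "What is the sum of the first 100 positive integers?"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```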

Key Capabilities

  • Enhanced Mathematical Reasoning: GRPO training specifically targets improved performance on mathematical tasks.
  • Fine-tuned with TRL: The model was trained with TRL, a library for Transformer Reinforcement Learning.
  • Qwen3-4B Base: Builds on the general capabilities of Qwen3-4B, giving the specialized mathematical fine-tuning a strong foundation.

Training Details

The model was trained with GRPO, a reinforcement-learning technique designed to strengthen mathematical reasoning in large language models by optimizing the policy against reward signals computed over groups of sampled completions. The run used TRL 0.18.0, Transformers 4.52.3, and PyTorch 2.6.0.
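The training script itself is not published. As a rough illustration of the workflow, a GRPO run with TRL's GRPOTrainer might look like the sketch below; the dataset and reward function are placeholders, and the length-based reward only echoes the "len-control" suffix in the model name, an assumption rather than documented behavior:

```python
# A minimal sketch of GRPO fine-tuning with TRL's GRPOTrainer. The dataset
# and reward function are illustrative placeholders, not the author's recipe.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Toy math prompts; the actual training data is not documented in this card.
train_dataset = Dataset.from_dict({
    "prompt": [
        "What is 12 * 7?",
        "Solve for x: 2x + 3 = 11.",
    ]
})

def length_reward(completions, **kwargs):
    # Hypothetical reward that penalizes overly long completions, echoing the
    # "len-control" suffix in the model name; the real reward is unknown.
    return [max(0.0, 1.0 - len(c) / 1024) for c in completions]

training_args = GRPOConfig(
    output_dir="Qwen3-4B-MATH-GRPO-len-control",
    num_generations=8,          # completions sampled per prompt (the "group")
    max_completion_length=512,  # cap on generated tokens per completion
)

trainer = GRPOTrainer(
    model="Qwen/Qwen3-4B",      # base model named in this card
    reward_funcs=length_reward,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
```

GRPO scores each group of completions for a prompt relative to one another, so no separate value model is needed; the reward function above returns one scalar per completion.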

Good For

  • Applications requiring strong mathematical problem-solving abilities.
  • Research and development in mathematical reasoning with LLMs.
  • Tasks where a smaller, specialized model for math is preferred over larger, general-purpose models.