thangvip/qwen2.5-1.5b-gspo-sgd-linear

Hugging Face
Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Mar 1, 2026 · Architecture: Transformer

thangvip/qwen2.5-1.5b-gspo-sgd-linear is a 1.5-billion-parameter causal language model fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It was trained with the GRPO method, introduced in the DeepSeekMath paper, to strengthen mathematical reasoning. Building on the Qwen2.5 architecture, it targets tasks that demand logical and mathematical problem-solving, and suits applications that need capable mathematical reasoning in a small parameter footprint.


Model Overview

thangvip/qwen2.5-1.5b-gspo-sgd-linear is a 1.5-billion-parameter language model derived from the Qwen/Qwen2.5-1.5B-Instruct base model. It has been fine-tuned with the TRL framework using the GRPO (Group Relative Policy Optimization) method.

Key Capabilities & Training

  • Enhanced Mathematical Reasoning: The core differentiator of this model is its training with GRPO, a technique detailed in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This suggests an optimization for tasks that involve mathematical problem-solving and logical deduction.
  • Base Model: Built upon the Qwen2.5-1.5B-Instruct architecture, it inherits the general instruction-following capabilities of its parent model.
  • Training Framework: The fine-tuning process used the TRL (Transformer Reinforcement Learning) library, indicating a reinforcement-learning approach to aligning the model's outputs.
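The distinguishing idea in GRPO is that each completion's advantage is computed relative to a group of completions sampled for the same prompt, normalizing by the group's reward mean and standard deviation instead of using a learned value network. A minimal sketch of that normalization step (not the model's actual training code; whether population or sample standard deviation is used is an implementation detail):

```python
import statistics

def group_relative_advantages(rewards):
    """Compute GRPO-style advantages for one group of sampled completions.

    Each reward is normalized against the group mean and standard
    deviation, so no separate value (critic) network is required.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)  # population std; a sample std variant also exists
    if std == 0:
        # Identical rewards across the group carry no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

# Example: four completions to the same math prompt, scored by a reward function
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

These per-completion advantages then weight the policy-gradient update, so completions that score above their group's average are reinforced and below-average ones are suppressed.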

Use Cases

This model is particularly well-suited for applications requiring improved mathematical and logical reasoning, especially within the constraints of a 1.5 billion parameter model. Developers can leverage its specialized training for tasks where the base Qwen2.5-1.5B-Instruct might fall short in complex reasoning scenarios.
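As a Qwen2.5-Instruct derivative, the model expects prompts in Qwen's ChatML-style format, with `<|im_start|>`/`<|im_end|>` markers around each turn. A minimal sketch of how such a prompt is assembled (in practice, `tokenizer.apply_chat_template` from the `transformers` library handles this for you):

```python
def build_chatml_prompt(messages):
    """Assemble a ChatML-style prompt from a list of chat messages.

    `messages` is a list of {"role": ..., "content": ...} dicts. The
    trailing assistant header cues the model to generate its reply.
    """
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

# Example: a math query in the format the instruct model was tuned on
prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful math assistant."},
    {"role": "user", "content": "What is 12 * 13?"},
])
```

Using the tokenizer's built-in chat template is preferred for real inference, since it guarantees the exact special tokens and formatting the checkpoint was trained with.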