thangvip/qwen3-1.7b-dspo-sft-base

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:2BQuant:BF16Ctx Length:32kPublished:Jan 31, 2026Architecture:Transformer Warm

The thangvip/qwen3-1.7b-dspo-sft-base is a 1.7 billion parameter Qwen3-based language model, fine-tuned from thangvip/qwen3-1.7b-base-sft-math-1500. This model was trained using the GRPO method, which is designed to enhance mathematical reasoning capabilities. It is optimized for tasks requiring advanced mathematical problem-solving and logical deduction.

Loading preview...

Model Overview

The thangvip/qwen3-1.7b-dspo-sft-base is a 1.7 billion parameter language model built upon the Qwen3 architecture. It is a fine-tuned iteration of the thangvip/qwen3-1.7b-base-sft-math-1500 model, specifically enhanced through a training process utilizing the TRL framework.

Key Capabilities

  • Mathematical Reasoning: The model's training incorporates the GRPO method, as introduced in the "DeepSeekMath" paper, indicating a strong focus on improving mathematical problem-solving and reasoning skills.
  • Instruction Following: As an instruction-tuned model, it is designed to respond effectively to user prompts and questions, as demonstrated by the quick start example.

Training Details

This model was trained using the TRL library, a framework for Transformer Reinforcement Learning. The application of the GRPO method, detailed in the DeepSeekMath paper, suggests an emphasis on advanced mathematical and logical processing during its fine-tuning phase.

When to Use This Model

This model is particularly suitable for applications requiring robust mathematical reasoning and accurate responses to complex, instruction-based queries. Its specialized training makes it a strong candidate for tasks where precise logical and numerical understanding is critical.