thangvip/qwen3-1.7b-dspo-no-sft-exp2

Hugging Face
Text Generation · Model Size: 2B · Quant: BF16 · Context Length: 32k · Published: Feb 3, 2026 · Architecture: Transformer

thangvip/qwen3-1.7b-dspo-no-sft-exp2 is a 1.7-billion-parameter language model fine-tuned from Qwen/Qwen3-1.7B. It was trained with GRPO, the reinforcement-learning method introduced in the DeepSeekMath paper for improving mathematical reasoning. The model is intended for tasks that benefit from stronger step-by-step reasoning.


Model Overview

This model, thangvip/qwen3-1.7b-dspo-no-sft-exp2, is a 1.7-billion-parameter language model derived from the Qwen3-1.7B architecture. It was fine-tuned with GRPO (Group Relative Policy Optimization), a reinforcement-learning technique introduced in the DeepSeekMath research paper that estimates advantages from groups of sampled completions rather than from a separate value model. This training approach aims to enhance the model's reasoning abilities, particularly in multi-step problem solving.
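Since the checkpoint follows the standard Qwen3 chat format, it can be loaded with the Hugging Face transformers library. A minimal inference sketch, assuming a chat-style prompt and illustrative generation settings (the card does not document recommended sampling parameters):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "thangvip/qwen3-1.7b-dspo-no-sft-exp2"


def build_messages(question: str) -> list[dict]:
    # Single-turn chat payload in the format apply_chat_template expects.
    return [{"role": "user", "content": question}]


def main() -> None:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # BF16 matches the quantization listed on the card.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    input_ids = tokenizer.apply_chat_template(
        build_messages("If 3x + 5 = 20, what is x?"),
        add_generation_prompt=True,
        return_tensors="pt",
    )
    outputs = model.generate(input_ids, max_new_tokens=256)
    # Decode only the newly generated tokens, not the prompt.
    print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))


if __name__ == "__main__":
    main()
```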

Key Capabilities

  • Enhanced Reasoning: Leverages the GRPO training method to improve logical and analytical processing.
  • Qwen3-1.7B Base: Built upon the robust Qwen3-1.7B foundation, providing a strong general language understanding.
  • TRL Framework: Developed with the TRL (Transformer Reinforcement Learning) library, Hugging Face's toolkit for post-training language models with reinforcement-learning methods such as GRPO.

Training Details

The model's training procedure used GRPO, as detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This indicates an emphasis on tasks that demand structured, multi-step problem solving, such as mathematical or scientific reasoning. Training was conducted with TRL 0.28.0.dev0, Transformers 4.57.6, and PyTorch 2.9.0.