Name: thangvip/qwen3-1.7b-dspo-sft-base API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: thangvip

Model Overview

The thangvip/qwen3-1.7b-dspo-sft-base is a 1.7 billion parameter language model built upon the Qwen3 architecture. It is a fine-tuned iteration of the thangvip/qwen3-1.7b-base-sft-math-1500 model, specifically enhanced through a training process utilizing the TRL framework.

Key Capabilities

Mathematical Reasoning: The model's training incorporates the GRPO method, as introduced in the "DeepSeekMath" paper, indicating a strong focus on improving mathematical problem-solving and reasoning skills.
Instruction Following: As an instruction-tuned model, it is designed to respond effectively to user prompts and questions, as demonstrated by the quick start example.

Training Details

This model was trained using the TRL library, a framework for Transformer Reinforcement Learning. The application of the GRPO method, detailed in the DeepSeekMath paper, suggests an emphasis on advanced mathematical and logical processing during its fine-tuning phase.

When to Use This Model

This model is particularly suitable for applications requiring robust mathematical reasoning and accurate responses to complex, instruction-based queries. Its specialized training makes it a strong candidate for tasks where precise logical and numerical understanding is critical.

Overview

Model Overview

Key Capabilities

Training Details

When to Use This Model

Full Model Card (README)