Model Overview
thangvip/qwen3-1.7b-dspo-no-sft-sgd-linear is a 1.7 billion parameter language model, fine-tuned from the Qwen/Qwen3-1.7B base model. It distinguishes itself through its training methodology, which employs GRPO (Group Relative Policy Optimization). GRPO was introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), which suggests an optimization focus on reasoning-intensive tasks.
Key Characteristics
- Base Model: Qwen3-1.7B, a robust foundation for language understanding.
- Training Method: Fine-tuned using GRPO, a specialized technique for improving reasoning capabilities, implemented via the TRL library.
- Context Length: Supports a substantial context window of 40960 tokens, enabling the processing of longer and more complex inputs.
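To make the GRPO method concrete: instead of training a separate value network, GRPO samples a group of completions per prompt and normalizes each completion's reward against the group's mean and standard deviation to obtain advantages. Below is a minimal sketch of that group-relative normalization step; the function name, epsilon value, and example rewards are illustrative, not taken from this model's actual training run.

```python
# Sketch of GRPO's group-relative advantage computation: each sampled
# completion's advantage is its reward, normalized by the mean and
# standard deviation of the rewards within its group (no learned
# value function). Names and numbers here are illustrative.
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-4):
    """Normalize one prompt's group of rewards into advantages."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    # eps guards against division by zero when all rewards are equal.
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four sampled completions for one prompt, scored by a reward model.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print([round(a, 3) for a in advantages])  # → [1.224, -1.224, 0.0, 0.0]
```

Note that the advantages within a group always sum to zero, so completions are scored only relative to their siblings, which is what makes the method work without a critic.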
Potential Use Cases
Given its GRPO-based training, this model is likely well-suited for applications demanding:
- Complex Reasoning: Tasks that require logical deduction, problem-solving, or structured thinking.
- Mathematical Applications: While not explicitly positioned as a math model, the origin of its training method in DeepSeekMath suggests potential benefits for mathematical reasoning tasks.
- Advanced Language Understanding: Leveraging its large context window for nuanced comprehension of extensive texts.
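Inputs longer than the 40960-token window still need to be truncated or chunked before inference. The sketch below shows one common approach, overlapping windows, using integer token IDs as a stand-in for the model's real tokenizer output; the helper name and the overlap size are illustrative choices, not requirements of this model.

```python
# Sketch: split a long token sequence into overlapping windows that
# each fit within the model's 40960-token context. The overlap value
# is an illustrative choice; a real pipeline would tokenize with the
# model's own tokenizer rather than use raw integers as stand-ins.
MAX_CONTEXT = 40960

def chunk_tokens(tokens, max_len=MAX_CONTEXT, overlap=256):
    """Return windows of at most max_len tokens, overlapping by `overlap`."""
    if max_len <= overlap:
        raise ValueError("max_len must exceed overlap")
    step = max_len - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break
    return chunks

# A 100,000-token document needs three overlapping windows at this size.
print(len(chunk_tokens(list(range(100_000)))))  # → 3
```

Documents that fit within the window need no chunking at all, which is the practical benefit of the large context: most inputs can be processed in a single pass.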