Name: thangvip/qwen3-1.7b-dspo-no-sft-sgd-linear-6500 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: thangvip

Model Overview

This model, thangvip/qwen3-1.7b-dspo-no-sft-sgd-linear-6500, is a specialized fine-tuned variant of the Qwen/Qwen3-1.7B base model. It has been developed by thangvip and leverages the TRL (Transformers Reinforcement Learning) framework for its training process.

Key Training Details

Fine-tuning Method: The model was trained using GRPO (Gradient-based Reward Policy Optimization). This method is notably introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300).
Frameworks Used: Key frameworks involved in its development include TRL (version 0.28.0.dev0), Transformers (version 4.57.6), PyTorch (version 2.9.0), Datasets (version 4.5.0), and Tokenizers (version 0.22.2).

Intended Use Cases

Given its training with the GRPO method, which focuses on mathematical reasoning, this model is likely optimized for:

Mathematical problem-solving
Reasoning tasks that benefit from enhanced logical and quantitative understanding.

Developers can quickly integrate this model using the provided transformers pipeline for text generation tasks.

Overview

Model Overview

Key Training Details

Intended Use Cases

Full Model Card (README)