thangvip/qwen3-1.7b-dspo-no-sft-sgd-linear
Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Feb 4, 2026 · Architecture: Transformer

The thangvip/qwen3-1.7b-dspo-no-sft-sgd-linear model is a 1.7-billion-parameter language model (rounded to 2B in the listing above), fine-tuned from Qwen/Qwen3-1.7B. It was trained with GRPO, the reinforcement-learning method introduced in the DeepSeekMath paper, and is specifically adapted for tasks that require advanced reasoning. Its 40,960-token context length supports processing extensive inputs for complex problem-solving.
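Since the model is fine-tuned from Qwen/Qwen3-1.7B, it should load through the standard Hugging Face `transformers` causal-LM interface. The sketch below is an assumed usage pattern, not an official snippet from the model authors; the prompt and generation parameters are illustrative, and it assumes the base model's chat template was preserved through fine-tuning.

```python
MODEL_ID = "thangvip/qwen3-1.7b-dspo-no-sft-sgd-linear"

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Generate a chat completion from the model (downloads weights on first call)."""
    # Import lazily so the module can be inspected without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")

    # Format the prompt with the model's chat template (assumed inherited from Qwen3).
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)

    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("Solve step by step: what is 17 * 23?"))
```

Because the model was trained for reasoning, step-by-step prompts like the one above are the intended use case; a BF16 checkpoint of this size fits comfortably on a single consumer GPU.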
