thangvip/qwen2.5-1.5b-dspo-no-sft-sgd-linear
Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Feb 13, 2026 · Architecture: Transformer · Warm

thangvip/qwen2.5-1.5b-dspo-no-sft-sgd-linear is a 1.5-billion-parameter language model fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It was trained with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath paper to strengthen mathematical reasoning. The model is intended for tasks that benefit from improved reasoning, particularly in mathematical contexts, and supports a context length of 131,072 tokens.
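The core idea of GRPO is to replace a learned value-function baseline with a group-relative one: several responses are sampled per prompt, and each response's reward is normalized by the group's mean and standard deviation to form its advantage. A minimal sketch of that step (the function name and the binary correctness reward are illustrative, not from this model's training code):

```python
from statistics import mean, stdev

def grpo_advantages(rewards):
    """Group-relative advantages as used in GRPO: each sampled response's
    reward is normalized by the mean and standard deviation of the rewards
    within its group (all responses sampled for the same prompt)."""
    mu = mean(rewards)
    sigma = stdev(rewards)
    return [(r - mu) / sigma for r in rewards]

# Example: four sampled answers to one math problem, scored 1.0 if the
# final answer is correct and 0.0 otherwise.
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0])
print(advs)
```

Correct responses receive positive advantages and incorrect ones negative advantages, and the advantages in each group sum to zero, so the policy is pushed toward the better responses within the group without training a separate critic model.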
