thangvip/qwen2.5-1.5b-seq-dspo-sgd-linear
TEXT GENERATION · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Feb 24, 2026 · Architecture: Transformer · Warm

thangvip/qwen2.5-1.5b-seq-dspo-sgd-linear is a 1.5-billion-parameter causal language model fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It was trained with the GRPO method introduced in the DeepSeekMath paper, with the aim of improving the model's language understanding and generation over the base instruct checkpoint.
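A minimal usage sketch with the Hugging Face transformers library is shown below; the prompt and generation settings are illustrative assumptions, not values recommended by the model authors.

```python
# Minimal sketch: load the checkpoint and run chat-style generation.
# The prompt and max_new_tokens are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "thangvip/qwen2.5-1.5b-seq-dspo-sgd-linear"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

messages = [{"role": "user", "content": "What is 17 * 24?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```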
