thangvip/qwen2.5-1.5b-gspo-sgd-linear
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:Mar 1, 2026Architecture:Transformer Warm
thangvip/qwen2.5-1.5b-gspo-sgd-linear is a 1.5 billion parameter causal language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. This model was trained using the GRPO method, as introduced in the DeepSeekMath paper, to enhance mathematical reasoning capabilities. It is optimized for tasks requiring improved logical and mathematical problem-solving, building upon the base Qwen2.5 architecture. The model is suitable for applications where efficient mathematical reasoning in a smaller parameter footprint is beneficial.
Loading preview...