Jason-hu/Qwen2.5-3B-GSM8K-SFT
Text generation
- Model size: 3.1B parameters
- Quantization: BF16
- Context length: 32k
- Published: Mar 25, 2026
- License: apache-2.0
- Architecture: Transformer (open weights)
Jason-hu/Qwen2.5-3B-GSM8K-SFT is a 3.1-billion-parameter language model built on Qwen2.5-3B-Instruct. It was fine-tuned with LoRA-based supervised fine-tuning (SFT) on the GSM8K dataset, optimizing it for mathematical reasoning. With a context length of 32,768 tokens, it is designed to excel at solving grade-school math problems.
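As an illustration of how a checkpoint like this is typically queried, the sketch below builds a GSM8K-style chat prompt. The `build_messages` helper and the system instruction are assumptions for illustration, not part of this repository; the Hugging Face `transformers` generation call is shown only in comments, since it requires downloading the model weights.

```python
# Hypothetical usage sketch: constructing a chat prompt for a GSM8K-style
# question. The message format mirrors common Qwen2.5-Instruct conventions;
# the helper name and system instruction are illustrative assumptions.

def build_messages(question: str) -> list[dict]:
    """Wrap a grade-school math question in a chat-style message list."""
    return [
        {"role": "system",
         "content": "Solve the math problem step by step and state the final answer."},
        {"role": "user", "content": question},
    ]

# First problem from the GSM8K test split, used here as a sample input.
question = (
    "Natalia sold clips to 48 of her friends in April, and then she sold "
    "half as many clips in May. How many clips did Natalia sell altogether "
    "in April and May?"
)
messages = build_messages(question)

# With Hugging Face transformers, generation would then look roughly like:
#   from transformers import AutoModelForCausalLM, AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("Jason-hu/Qwen2.5-3B-GSM8K-SFT")
#   model = AutoModelForCausalLM.from_pretrained("Jason-hu/Qwen2.5-3B-GSM8K-SFT")
#   text = tok.apply_chat_template(messages, tokenize=False,
#                                  add_generation_prompt=True)
#   out = model.generate(**tok(text, return_tensors="pt"), max_new_tokens=512)

print(messages[1]["role"])
```

Because the model was SFT-tuned for step-by-step math solutions, a plain question in the user turn is usually enough to elicit a worked chain of reasoning ending in the final numeric answer.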