Ilia2003Mah/qwen2.5-1.5b-gsm8k-train-step500
Text generation · Concurrency cost: 1 · Model size: 1.5B · Quant: BF16 · Context length: 32k · Published: Mar 23, 2026 · Architecture: Transformer

Ilia2003Mah/qwen2.5-1.5b-gsm8k-train-step500 is a 1.5-billion-parameter language model, likely based on the Qwen2.5 architecture, fine-tuned for mathematical reasoning. It supports a context length of 32,768 tokens. Its primary differentiator is specialized training on the GSM8K dataset, indicating a focus on arithmetic and grade-school mathematical word problems.
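Because GSM8K solutions conventionally end with a final-answer marker of the form `#### <number>`, evaluating a checkpoint like this one usually involves parsing that marker out of the model's completion. Below is a minimal sketch of such an answer extractor; the function name and the sample completion are illustrative, not part of this model's repository.

```python
import re

def extract_gsm8k_answer(text: str):
    """Extract the final numeric answer from a GSM8K-style completion.

    GSM8K reference solutions end with a line of the form '#### <answer>',
    so we search for that marker and strip thousands separators.
    """
    match = re.search(r"####\s*(-?[\d,]+(?:\.\d+)?)", text)
    if match:
        return match.group(1).replace(",", "")
    return None  # no final-answer marker found

# Hypothetical model completion in the GSM8K answer format:
completion = (
    "Natalia sold 48 clips in April and half as many in May, "
    "so she sold 48 / 2 = 24 clips in May.\n"
    "In total she sold 48 + 24 = 72 clips.\n"
    "#### 72"
)
print(extract_gsm8k_answer(completion))  # -> 72
```

Comparing the extracted string against the dataset's reference answer (after the same normalization) is the standard exact-match accuracy metric reported for GSM8K.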
