Ilia2003Mah/qwen2.5-1.5b-gsm8k-train-step4500
Text generation · Model size: 1.5B · Quant: BF16 · Context length: 32k · Architecture: Transformer · Published: Mar 24, 2026

Ilia2003Mah/qwen2.5-1.5b-gsm8k-train-step4500 is a 1.5-billion-parameter language model, likely based on the Qwen2.5 architecture, with a context length of 32768 tokens. The "gsm8k-train-step4500" suffix suggests fine-tuning on the GSM8K dataset of grade-school math word problems, with this checkpoint taken at training step 4500. Its primary strength is numerical and logical reasoning, making it suitable for applications requiring accurate arithmetic and step-by-step problem-solving.
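A model fine-tuned on GSM8K typically emits solutions in the dataset's convention, where the final answer follows a `####` marker. Assuming the model follows that convention (not confirmed by this card), a minimal sketch of extracting the final numeric answer from a completion might look like this; the sample text below is illustrative, not actual model output:

```python
import re

def extract_gsm8k_answer(completion: str):
    """Pull the final numeric answer from a GSM8K-style solution,
    which conventionally ends with '#### <answer>'."""
    match = re.search(r"####\s*([-+]?[\d,]*\.?\d+)", completion)
    if match is None:
        return None
    # Strip thousands separators so '1,234' compares equal to '1234'.
    return match.group(1).replace(",", "")

# Hypothetical completion in GSM8K style, for illustration only.
sample = (
    "Natalia sold 48 clips in April and half as many in May.\n"
    "48 / 2 = 24 clips in May, so 48 + 24 = 72 in total.\n"
    "#### 72"
)
print(extract_gsm8k_answer(sample))  # → 72
```

In practice the completion would come from the model via the standard Hugging Face `transformers` text-generation pipeline, with the GSM8K question as the prompt.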
