Ilia2003Mah/qwen2.5-1.5b-gsm8k-train-step8500
Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Mar 24, 2026 · Architecture: Transformer · Warm

Ilia2003Mah/qwen2.5-1.5b-gsm8k-train-step8500 is a 1.5 billion parameter language model based on the Qwen2.5 architecture, saved as a training checkpoint at step 8500. It is fine-tuned for mathematical reasoning on the GSM8K dataset, which optimizes it for arithmetic and step-by-step problem solving. With a context length of 32,768 tokens, it can handle problems that require extensive context. Its primary strength is numerical and logical reasoning, making it suitable for applications that demand precise quantitative analysis.
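
Below is a minimal usage sketch, assuming the checkpoint loads through the standard Hugging Face `transformers` API like other Qwen2.5 causal language models; the prompt and generation settings are illustrative, not taken from the model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Ilia2003Mah/qwen2.5-1.5b-gsm8k-train-step8500"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the listed BF16 precision
    device_map="auto",
)

# A GSM8K-style word problem; the checkpoint is tuned for this kind of
# step-by-step arithmetic reasoning.
prompt = (
    "Natalia sold clips to 48 of her friends in April, and then she sold "
    "half as many clips in May. How many clips did Natalia sell altogether?"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, not the echoed prompt.
generated = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
```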
