Ilia2003Mah/qwen2.5-1.5b-gsm8k-train-step6500
Text generation · Model size: 1.5B · Quantization: BF16 · Context length: 32k · Concurrency cost: 1 · Architecture: Transformer · Published: Mar 24, 2026
Ilia2003Mah/qwen2.5-1.5b-gsm8k-train-step6500 is a 1.5-billion-parameter language model based on the Qwen2.5 architecture, fine-tuned by Ilia2003Mah on the GSM8K dataset of grade-school math word problems (the checkpoint name indicates training step 6500). With a 32,768-token context window, the model specializes in mathematical reasoning and step-by-step problem solving, making it suited to applications that need reliable performance on arithmetic and logical word problems.
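As a minimal sketch of how one might use a GSM8K-tuned checkpoint like this, the helpers below build a step-by-step math prompt and extract the final answer. GSM8K reference solutions conventionally end with a `#### <number>` marker, which makes answer extraction straightforward; the prompt wording and the Hugging Face `transformers` loading code shown in comments are assumptions, not settings published for this model.

```python
# Prompting sketch for a GSM8K-style math model. The prompt format and
# generation settings are assumptions; check the base Qwen2.5 model card
# for recommended usage.

def build_gsm8k_prompt(question: str) -> str:
    """Wrap a math word problem in a simple instruction prompt.

    GSM8K answers conventionally end with '#### <number>', so we ask
    the model to follow that format to make answer extraction easy.
    """
    return (
        "Solve the following math problem step by step. "
        "End your answer with '#### <final number>'.\n\n"
        f"Question: {question}\nAnswer:"
    )


def extract_final_answer(completion: str) -> str:
    """Pull the text after the GSM8K '####' marker, if present."""
    marker = "####"
    if marker not in completion:
        return ""
    return completion.split(marker)[-1].strip()


if __name__ == "__main__":
    # Loading and generating via Hugging Face transformers (assumed usage;
    # requires `pip install transformers torch` and downloads the weights):
    #
    # from transformers import AutoModelForCausalLM, AutoTokenizer
    # name = "Ilia2003Mah/qwen2.5-1.5b-gsm8k-train-step6500"
    # tok = AutoTokenizer.from_pretrained(name)
    # model = AutoModelForCausalLM.from_pretrained(name, torch_dtype="bfloat16")
    # inputs = tok(build_gsm8k_prompt("..."), return_tensors="pt")
    # out = model.generate(**inputs, max_new_tokens=256)
    # print(extract_final_answer(tok.decode(out[0], skip_special_tokens=True)))
    print(build_gsm8k_prompt("Tom has 3 apples and buys 2 more. How many?"))
```

The `####` convention mirrors how GSM8K ground-truth answers are written, so the same extraction logic works for both model output and reference answers when scoring.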