Jason-hu/Qwen2.5-3B-GSM8K-SFT

Text Generation · Model Size: 3.1B · Quantization: BF16 · Context Length: 32k · Published: Mar 25, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

Jason-hu/Qwen2.5-3B-GSM8K-SFT is a 3.1 billion parameter language model built on Qwen2.5-3B-Instruct. It was fine-tuned with LoRA-based supervised fine-tuning (SFT) on the GSM8K dataset to improve performance on mathematical reasoning tasks. With a context length of 32,768 tokens, it is aimed at solving grade school math problems.
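A minimal inference sketch with the Hugging Face transformers library is shown below. It assumes the repository contains full merged weights loadable directly via AutoModelForCausalLM (rather than a LoRA adapter alone) and that the model inherits Qwen2.5's chat template; the sample question is the first problem in GSM8K's train split.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Jason-hu/Qwen2.5-3B-GSM8K-SFT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists BF16 weights
    device_map="auto",
)

# A GSM8K-style word problem as a single user turn.
messages = [
    {
        "role": "user",
        "content": "Natalia sold clips to 48 of her friends in April, and then "
                   "she sold half as many clips in May. How many clips did "
                   "Natalia sell altogether in April and May?",
    }
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```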


Model Overview

Jason-hu/Qwen2.5-3B-GSM8K-SFT is a specialized 3.1 billion parameter language model derived from Qwen2.5-3B-Instruct and fine-tuned specifically to strengthen mathematical reasoning.

Key Capabilities

  • Mathematical Reasoning: Optimized for solving grade school mathematics problems.
  • Fine-tuned Performance: Trained with LoRA SFT (supervised fine-tuning) on the GSM8K dataset to improve accuracy on math word problems; a loading sketch follows this list.
  • Base Architecture: Built on the robust Qwen2.5-3B-Instruct model, providing a strong foundation for language understanding.
  • Context Length: Supports a substantial 32,768-token context window, allowing it to process longer problem descriptions and multi-step reasoning chains.
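Because the fine-tune used LoRA, it is possible the repository publishes adapter weights rather than a merged checkpoint; the card does not say which. Under that assumption, here is a sketch of attaching the adapter to the base model with the peft library:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen2.5-3B-Instruct"
adapter_id = "Jason-hu/Qwen2.5-3B-GSM8K-SFT"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Attach the GSM8K LoRA adapter on top of the instruct base model.
model = PeftModel.from_pretrained(base, adapter_id)

# Optionally fold the adapter into the base weights for faster inference.
model = model.merge_and_unload()
```

merge_and_unload() removes the separate LoRA matrices after merging them in, which avoids the small per-forward overhead of keeping the adapter attached.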

Good For

  • Educational Applications: Ideal for systems requiring automated solutions or explanations for grade school math problems.
  • Research in Mathematical LLMs: Useful for researchers exploring the effectiveness of fine-tuning smaller models for specific reasoning domains.
  • Benchmarking: Can serve as a baseline or comparison model when evaluating performance on mathematical datasets such as GSM8K; a minimal evaluation sketch follows.
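The sketch below scores a small slice of the GSM8K test split. It reuses the `model` and `tokenizer` from the inference sketch above, and it assumes the fine-tuned model emits GSM8K's "#### <answer>" final-answer format, which is plausible after SFT on that dataset but not confirmed here.

```python
import re
from datasets import load_dataset

# Assumes `model` and `tokenizer` are already loaded as in the sketch above.
gsm8k = load_dataset("openai/gsm8k", "main", split="test")

def solve(question: str) -> str:
    """Generate a solution for one GSM8K question (greedy decoding)."""
    messages = [{"role": "user", "content": question}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

def extract_answer(text: str) -> str | None:
    """GSM8K references end with '#### <number>'; pull out that final number."""
    match = re.search(r"####\s*(-?[\d,\.]+)", text)
    return match.group(1).replace(",", "") if match else None

sample = gsm8k.select(range(100))  # small slice for a quick smoke test
correct = sum(
    extract_answer(solve(ex["question"])) == extract_answer(ex["answer"])
    for ex in sample
)
print(f"Accuracy on {len(sample)} problems: {correct / len(sample):.2%}")
```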