Jason-hu/Qwen2.5-3B-GSM8K-GRPO-H200

Text Generation · Concurrency Cost: 1 · Model Size: 3.1B · Quant: BF16 · Ctx Length: 32k · Published: Mar 25, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

Qwen2.5-3B-GSM8K-GRPO-H200 is a 3.1-billion-parameter language model by Jason-hu, fine-tuned for mathematical reasoning. Built on Qwen2.5-3B-Instruct, it was adapted with LoRA-based supervised fine-tuning (SFT) on the GSM8K dataset. The model is optimized for mathematical problem solving, with improved performance on quantitative reasoning, and supports a context length of 32,768 tokens.


Overview

Jason-hu/Qwen2.5-3B-GSM8K-GRPO-H200 is a specialized language model with 3.1 billion parameters, derived from the Qwen2.5-3B-Instruct base. It was fine-tuned with LoRA (Low-Rank Adaptation) supervised fine-tuning (SFT) in the verl framework.
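
As a quick illustration, the snippet below shows one way to run the model on a GSM8K-style word problem. It is a minimal sketch, assuming the repository loads through the standard transformers API and ships the Qwen2.5 chat template (typical for Qwen2.5 fine-tunes, but not confirmed on this page).

```python
# Minimal inference sketch; assumes the checkpoint loads with the standard
# transformers API and includes the Qwen2.5 chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Jason-hu/Qwen2.5-3B-GSM8K-GRPO-H200"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 quantization listed above
    device_map="auto",
)

# A GSM8K-style grade-school word problem as the user turn.
messages = [
    {
        "role": "user",
        "content": "Natalia sold clips to 48 of her friends in April, and then "
                   "she sold half as many clips in May. How many clips did "
                   "Natalia sell altogether in April and May?",
    }
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```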

Key Capabilities

  • Mathematical Reasoning: The model is trained on the GSM8K dataset of grade-school math word problems, improving its ability to understand and solve quantitative tasks.
  • Instruction Following: Inherits strong instruction-following capabilities from its Qwen2.5-3B-Instruct foundation.
  • Efficient Fine-tuning: Uses LoRA for parameter-efficient adaptation, making it a resource-effective choice for specialized tasks (see the sketch after this list).
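
The exact adapter configuration is not published on this page, so the following is only a hypothetical sketch of a LoRA setup over the same base model using the peft library. The rank, alpha, dropout, and target modules are illustrative assumptions, not the author's actual values.

```python
# Hypothetical LoRA configuration sketch; the published card does not state
# the actual ranks or target modules, so these values are illustrative only.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-3B-Instruct")
lora_config = LoraConfig(
    r=16,                      # assumed adapter rank
    lora_alpha=32,             # assumed scaling factor
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed: attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(base, lora_config)
peft_model.print_trainable_parameters()  # only the low-rank adapters are trained
```

Because only the low-rank adapter matrices receive gradients, this style of fine-tuning updates a small fraction of the 3.1B parameters, which is what makes the adaptation resource-effective.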

Good For

  • Mathematical Problem Solving: Ideal for applications requiring accurate solutions to arithmetic and word problems.
  • Educational Tools: Can be integrated into educational platforms for generating explanations or solving math homework.
  • Quantitative Analysis: Suitable for tasks where precise numerical reasoning is critical.