BounharAbdelaziz/Qwen2.5-3B-GRPO-Math-GSM8K

TEXT GENERATIONConcurrency Cost:1Model Size:3.1BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Jun 25, 2025Architecture:Transformer0.0K Cold

BounharAbdelaziz/Qwen2.5-3B-GRPO-Math-GSM8K is a 3.1 billion parameter Qwen2.5 model, fine-tuned using Group-Relative Policy Optimization (GRPO) specifically on the GSM8K grade-school math dataset. This model is designed to be a lightweight yet highly capable step-by-step math reasoner, optimized for efficient execution on a single consumer GPU. Its primary strength lies in mathematical problem-solving and detailed reasoning.

Loading preview...

Model Overview

This model, BounharAbdelaziz/Qwen2.5-3B-GRPO-Math-GSM8K, is a compact 3.1 billion parameter variant of the Qwen2.5 architecture. It has undergone specialized fine-tuning using Group-Relative Policy Optimization (GRPO), a technique aimed at enhancing its reasoning capabilities.

Key Capabilities

  • Specialized Math Reasoning: The model is specifically trained on the GSM8K grade-school math dataset, making it highly proficient in solving mathematical problems step-by-step.
  • Efficient Performance: Designed to be lightweight, it can run effectively on a single consumer GPU, offering accessibility for various applications.
  • Step-by-Step Tutoring: It functions as a "step-by-step math tutor," providing detailed reasoning for solutions rather than just final answers.

Use Cases

  • Educational Tools: Ideal for applications requiring automated math tutoring or problem-solving assistance.
  • Resource-Constrained Environments: Suitable for deployment where computational resources, such as GPU memory, are limited.
  • Mathematical Problem Solving: Excels at tasks involving arithmetic, algebra, and other grade-school level mathematical challenges, providing transparent reasoning.