Harsha901/Qwen3_4B-GRPO-Math

Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Ctx Length: 32k · Published: Dec 17, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

Harsha901/Qwen3_4B-GRPO-Math is a 4-billion-parameter Qwen3 model developed by Harsha901 and fine-tuned from unsloth/Qwen3-4B-Base. The model is optimized for mathematical tasks, leveraging Unsloth and Hugging Face's TRL library for faster training. It features a 40,960-token context length, making it suitable for complex problem-solving and detailed mathematical reasoning.


Harsha901/Qwen3_4B-GRPO-Math Overview

This model is a 4-billion-parameter Qwen3 variant, developed by Harsha901 and fine-tuned from the unsloth/Qwen3-4B-Base model. It was trained specifically to enhance its mathematical reasoning and problem-solving capabilities. The fine-tuning process used Unsloth and Hugging Face's TRL library, which enabled 2x faster training.
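The "GRPO" in the model name refers to Group Relative Policy Optimization, a reinforcement-learning method (available in TRL as `GRPOTrainer`) that scores groups of sampled completions with programmatic reward functions. The exact rewards used for this model are not published; the sketch below shows the kind of verifiable-correctness reward commonly used for math fine-tuning, with the function name and number-matching rule being illustrative assumptions:

```python
import re


def correctness_reward(completions, answers):
    """Score 1.0 when a completion's final number matches the gold answer.

    A toy example of a verifiable reward for GRPO-style math training;
    the actual reward functions behind this checkpoint are unknown.
    """
    rewards = []
    for completion, gold in zip(completions, answers):
        # Take the last number in the completion as the model's answer.
        nums = re.findall(r"-?\d+(?:\.\d+)?", completion)
        rewards.append(1.0 if nums and nums[-1] == str(gold) else 0.0)
    return rewards
```

A list of such functions would be passed as `reward_funcs` to `GRPOTrainer` along with a dataset of problems and gold answers.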

Key Capabilities

  • Mathematical Reasoning: Optimized for handling mathematical tasks and complex numerical problems.
  • Efficient Training: Benefits from Unsloth's accelerated training techniques.
  • Extended Context: Features a 40,960-token context length, allowing it to process lengthy mathematical problems or detailed instructions.
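The checkpoint can be loaded like any other Qwen3 model via the Transformers library. The sketch below is illustrative: the chat-control tokens follow the standard Qwen format (in practice, prefer `tokenizer.apply_chat_template`), and the generation settings are assumptions rather than values from the model card:

```python
def build_prompt(problem: str) -> str:
    """Wrap a math problem in Qwen-style chat markup.

    Hand-rolled here for illustration; in practice use
    tokenizer.apply_chat_template on the repo's tokenizer.
    """
    return (
        f"<|im_start|>user\n{problem}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


def solve(problem: str, max_new_tokens: int = 512) -> str:
    """Generate a solution with the fine-tuned checkpoint."""
    # Heavy dependencies imported lazily so build_prompt stays standalone.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Harsha901/Qwen3_4B-GRPO-Math"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16
    )

    inputs = tokenizer(build_prompt(problem), return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens, keeping only the generated answer.
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

Note that calling `solve(...)` downloads roughly 8 GB of BF16 weights on first use, so a GPU with sufficient memory is advisable.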

Good For

  • Applications requiring strong mathematical problem-solving.
  • Tasks that benefit from a large context window for detailed input or output.
  • Developers looking for a Qwen3-based model with enhanced numerical capabilities.