Fardan/Qwen2.5-1.5B-Instruct-Math-Reasoning-GRPO-Tuned

Text generation · Model size: 1.5B · Quantization: BF16 · Context length: 32K · Published: Apr 20, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

Fardan/Qwen2.5-1.5B-Instruct-Math-Reasoning-GRPO-Tuned is a 1.5 billion parameter Qwen2.5-based instruction-tuned language model developed by Fardan. This model is specifically fine-tuned for mathematical and reasoning tasks, building upon its predecessor Fardan/Qwen2.5-1.5B-Instruct-Math-Reasoning-SFT-v1. It leverages the Unsloth framework for accelerated training, making it an efficient choice for specialized analytical applications. With a 32K context length, it is designed to handle complex problem-solving scenarios.


Model Overview

Fardan/Qwen2.5-1.5B-Instruct-Math-Reasoning-GRPO-Tuned is a 1.5 billion parameter instruction-tuned model developed by Fardan. It is based on the Qwen2.5 architecture and represents a further fine-tuned iteration of the Fardan/Qwen2.5-1.5B-Instruct-Math-Reasoning-SFT-v1 model. This model is specifically optimized for tasks requiring strong mathematical and reasoning capabilities.

Key Characteristics

  • Specialized Fine-tuning: This model has undergone specific fine-tuning to enhance its performance in mathematical problem-solving and general reasoning tasks.
  • Efficient Training: The model was trained using the Unsloth framework, which enabled a 2x faster training process compared to standard methods.
  • Context Length: It supports a context length of 32768 tokens, allowing it to process and understand longer and more complex inputs relevant to its specialized domain.
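Qwen2.5-Instruct models are conventionally prompted in the ChatML format. The sketch below assembles such a prompt by hand to make the structure visible; it assumes this GRPO-tuned variant keeps the base model's chat template (in practice, `tokenizer.apply_chat_template` should be preferred over hand-built strings):

```python
# Minimal sketch of the ChatML prompt layout used by Qwen2.5-Instruct models.
# Assumption: this fine-tune keeps the base model's chat template.

def build_chatml_prompt(system: str, user: str) -> str:
    """Assemble a single-turn ChatML prompt ending with an open assistant turn."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "You are a helpful math assistant. Reason step by step.",
    "If 3x + 5 = 20, what is x?",
)
print(prompt)
```

The trailing open `<|im_start|>assistant` turn is what cues the model to generate its answer; the 32K context budget bounds the total length of such prompts plus the generated tokens.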

Use Cases

This model is particularly well-suited for applications that require:

  • Solving mathematical problems and equations.
  • Logical deduction and reasoning from given information.
  • Tasks where understanding and generating structured, analytical responses are crucial.

It is a strong choice for developers who want a compact yet capable model focused on numerical and logical intelligence, with accelerated training that supports quick iteration and deployment.
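A minimal inference sketch for the use cases above, assuming a standard Hugging Face `transformers` setup (the system prompt and generation parameters are illustrative, not taken from the model card):

```python
# Hypothetical inference sketch using Hugging Face transformers.
# Assumptions: the repo ships a standard Qwen2.5 tokenizer with a chat
# template, and BF16 weights fit on the target device. Imports are deferred
# into solve() so the constants below are usable without transformers installed.

MODEL_ID = "Fardan/Qwen2.5-1.5B-Instruct-Math-Reasoning-GRPO-Tuned"
MAX_CONTEXT = 32768  # 32K context length, per the model card

def solve(problem: str, max_new_tokens: int = 512) -> str:
    """Load the model and generate a step-by-step answer for one math problem."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    messages = [
        {"role": "system", "content": "Reason step by step, then state the final answer."},
        {"role": "user", "content": problem},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens, keeping only the newly generated answer.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)

# Example (requires downloading the weights):
#   print(solve("Solve for x: 2x - 7 = 11."))
```

For longer analytical inputs, keep the tokenized prompt plus `max_new_tokens` within the 32K context budget.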