rohitcraftsyt/soc-grpo-tier1

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Apr 24, 2026 · Architecture: Transformer

The rohitcraftsyt/soc-grpo-tier1 model is a 1.5 billion parameter language model fine-tuned from unsloth/qwen2.5-1.5b-instruct-unsloth-bnb-4bit. It was trained using the GRPO method, which is designed to enhance mathematical reasoning capabilities. This model is optimized for tasks requiring improved logical and mathematical problem-solving, building upon its Qwen2.5 base.


Overview

rohitcraftsyt/soc-grpo-tier1 is a 1.5 billion parameter language model derived from the unsloth/qwen2.5-1.5b-instruct-unsloth-bnb-4bit base. Its key differentiator lies in its training methodology: it has been fine-tuned using GRPO (Group Relative Policy Optimization). This method, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models," aims to significantly improve the model's ability to handle complex mathematical and logical reasoning tasks.
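At GRPO's core is a group-relative advantage: for each prompt, a group of completions is sampled, each is scored by a reward function, and every reward is normalized against the group's own mean and standard deviation. The sketch below illustrates that normalization step only; it is not this model's training code.

```python
# Illustrative sketch of GRPO's group-relative advantage computation.
# Rewards for one prompt's group of completions are standardized
# against the group's own mean and standard deviation.
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Standardize a group of per-completion rewards (GRPO-style)."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    if sigma == 0.0:
        # All completions scored equally: no learning signal for this group.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Four completions for one math prompt, scored 1.0 if correct, 0.0 if not:
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Correct completions receive positive advantages and incorrect ones negative, so the policy is pushed toward the better members of each group without a separately trained value model.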

Key Capabilities

  • Enhanced Mathematical Reasoning: The primary focus of this model's training was to boost its performance on mathematical problem-solving, making it suitable for applications where numerical and logical accuracy are critical.
  • Instruction Following: As it is fine-tuned from an instruction-tuned base model, it retains strong instruction-following capabilities.
  • Efficient Deployment: Building on a 1.5B parameter model, it offers a balance between performance and computational efficiency, making it viable for resource-constrained environments.
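The capabilities above can be exercised with a standard Transformers generation loop. This is a minimal sketch, assuming the model is available on the Hugging Face Hub under this id and uses the Qwen2.5 chat template inherited from its base; it is not an official usage snippet.

```python
# Minimal inference sketch (assumes the model id resolves on the
# Hugging Face Hub and that the Qwen2.5 chat template applies).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "rohitcraftsyt/soc-grpo-tier1"

def solve(prompt: str, max_new_tokens: int = 256) -> str:
    """Generate a response for a math/reasoning prompt."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16
    )
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    output = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        output[0][input_ids.shape[-1]:], skip_special_tokens=True
    )

if __name__ == "__main__":
    print(solve("A train travels 120 km in 1.5 hours. What is its average speed?"))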

Training Details

The model was trained with the TRL (Transformer Reinforcement Learning) framework, version 0.24.0, alongside Transformers 4.57.6 and PyTorch 2.10.0+cu128. GRPO optimizes the policy against rewards normalized within groups of sampled completions, which removes the need for a separately trained value model and pairs naturally with verifiable reward signals such as mathematical correctness.
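Given that stack, a training run might look roughly like the following. This is a hypothetical sketch of a TRL GRPO setup, not the author's actual script: the dataset and the reward function are placeholders invented for illustration.

```python
# Hypothetical GRPO fine-tuning sketch with TRL (placeholder dataset
# and reward function; not the actual training script for this model).
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def correctness_reward(completions, **kwargs):
    # Placeholder reward: favor completions that state a final answer.
    return [1.0 if "answer" in c.lower() else 0.0 for c in completions]

dataset = load_dataset("trl-lib/tldr", split="train")  # placeholder dataset

args = GRPOConfig(
    output_dir="soc-grpo-tier1",
    num_generations=8,  # group size used for group-relative advantages
    per_device_train_batch_size=8,
    bf16=True,
)

trainer = GRPOTrainer(
    model="unsloth/qwen2.5-1.5b-instruct-unsloth-bnb-4bit",
    reward_funcs=correctness_reward,
    args=args,
    train_dataset=dataset,
)
trainer.train()
```

In practice a math-focused run would replace the placeholder reward with a verifier that checks the extracted final answer against a ground-truth solution.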

Good For

  • Applications requiring robust mathematical problem-solving.
  • Tasks that benefit from improved logical deduction and reasoning.
  • Scenarios where a smaller, efficient model with specialized reasoning capabilities is preferred over larger, general-purpose models.