yuerxin/DeepSeek-R1-Distill-Qwen-1.5B-GRPO

TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Sep 20, 2025Architecture:Transformer Cold

The yuerxin/DeepSeek-R1-Distill-Qwen-1.5B-GRPO model is a 1.5 billion parameter language model, fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. It was specifically trained on the OpenR1-Math-220k dataset using the GRPO method, which is designed to enhance mathematical reasoning capabilities. This model is optimized for tasks requiring strong mathematical problem-solving and logical deduction, leveraging its 32768 token context length.

Loading preview...

Model Overview

This model, yuerxin/DeepSeek-R1-Distill-Qwen-1.5B-GRPO, is a specialized 1.5 billion parameter language model. It is a fine-tuned variant of the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B base model, specifically enhanced for mathematical reasoning tasks.

Key Capabilities

  • Mathematical Reasoning: The model has been fine-tuned on the OpenR1-Math-220k dataset, making it proficient in handling mathematical problems and logical deductions.
  • GRPO Training: It utilizes the GRPO (Gradient-based Reward Policy Optimization) method, as introduced in the DeepSeekMath paper, to push the limits of mathematical reasoning in open language models.
  • Context Length: Features a substantial context window of 32768 tokens, allowing for processing longer and more complex mathematical problems or discussions.

Training Details

The model was trained using the TRL library, a framework for Transformer Reinforcement Learning. The training procedure leveraged specific versions of TRL (0.18.0), Transformers (4.52.3), PyTorch (2.6.0), Datasets (4.1.1), and Tokenizers (0.21.4).

Good for

  • Applications requiring strong mathematical problem-solving.
  • Research and development in enhancing AI's mathematical reasoning abilities.
  • Tasks that benefit from a model specifically optimized for numerical and logical challenges.