sunmengjie/DeepSeek-R1-Distill-Qwen-1.5B-GRPO
Text Generation · Model size: 1.5B · Quantization: BF16 · Context length: 32k · Published: Oct 9, 2025 · Architecture: Transformer

sunmengjie/DeepSeek-R1-Distill-Qwen-1.5B-GRPO is a 1.5 billion parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. It was trained with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper, and is optimized for mathematical reasoning tasks, building on its base model's foundation. The model is intended for applications that require robust mathematical problem-solving and reasoning.


Model Overview

This model, sunmengjie/DeepSeek-R1-Distill-Qwen-1.5B-GRPO, is a 1.5 billion parameter language model fine-tuned from the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B base model. It was trained using the TRL library with the GRPO (Group Relative Policy Optimization) method.

Key Capabilities & Training

  • Mathematical Reasoning: The primary differentiator of this model is its fine-tuning with GRPO, the method detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), indicating an enhanced focus on mathematical problem-solving.
  • Efficient Fine-tuning: Built upon a 1.5B parameter model, it offers a relatively compact size while aiming for specialized performance.
  • Frameworks Used: The training utilized TRL (version 0.23.0), Transformers (version 4.57.0), PyTorch (version 2.6.0+cu124), Datasets (version 4.1.1), and Tokenizers (version 0.22.1).
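The core idea behind GRPO is to score each sampled completion relative to the other completions for the same prompt, rather than against a learned value function. A minimal sketch of that group-relative advantage, with illustrative names not taken from this repository or the TRL source:

```python
# Hedged sketch of GRPO's group-relative advantage computation.
# For each prompt, G completions are sampled; each completion's advantage
# is its reward standardized within that group (reward minus group mean,
# divided by group standard deviation). Function name and eps are
# illustrative assumptions, not part of this model's training code.
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize one group's rewards: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four sampled answers to one math prompt, scored 1.0 if the
# final answer is correct, 0.0 otherwise.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct answers receive a positive advantage and incorrect ones a negative advantage, and the advantages within each group sum to zero; this is what removes the need for a separate critic model.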

Use Cases

This model is particularly well-suited for applications requiring:

  • Mathematical problem-solving: Leveraging its GRPO-enhanced training for complex calculations and logical reasoning in mathematical contexts.
  • Research and development: As a base for further experimentation or fine-tuning on specific mathematical or reasoning-intensive datasets.
  • Educational tools: Potentially assisting in generating explanations or solutions for mathematical queries.