leonMW/DeepSeek-R1-Distill-Qwen-1.5B-GSPO-Basic

Text generation · Model size: 1.5B parameters · Quantization: BF16 · Context length: 32k tokens · Published: Sep 1, 2025 · Architecture: Transformer

The leonMW/DeepSeek-R1-Distill-Qwen-1.5B-GSPO-Basic model is a 1.5 billion parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, with a 32768-token context length. It was trained with GRPO, a reinforcement learning method introduced in the DeepSeekMath paper, to strengthen mathematical reasoning. The model is intended for tasks that require robust, structured reasoning, particularly in mathematical contexts.


Model Overview

This model, leonMW/DeepSeek-R1-Distill-Qwen-1.5B-GSPO-Basic, is a 1.5 billion parameter language model derived from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. It features a substantial context length of 32768 tokens, making it suitable for processing longer inputs.
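A minimal usage sketch with the Transformers library is shown below. It assumes the standard chat-template workflow inherited from the R1-Distill base model; the prompt is purely illustrative.

```python
# Minimal inference sketch; the prompt is illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "leonMW/DeepSeek-R1-Distill-Qwen-1.5B-GSPO-Basic"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
    device_map="auto",
)

messages = [{"role": "user", "content": "What is the sum of the first 100 positive integers?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=1024)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```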

Key Training Details

The primary differentiator for this model is its training methodology. It was fine-tuned using GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This training aims to improve the model's performance on complex reasoning tasks, particularly those involving mathematics.
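The exact dataset, reward function, and hyperparameters used for this checkpoint are not documented here. As a rough illustration only, the sketch below shows how a GRPO run is typically set up with TRL's GRPOTrainer, using a placeholder public dataset and a toy length-based reward rather than the math-specific reward a model like this would normally use.

```python
# Illustrative GRPO fine-tuning sketch with TRL's GRPOTrainer.
# The dataset and reward function are placeholders, not the ones used for this checkpoint.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Any dataset with a "prompt" column works; this one is a small public example.
dataset = load_dataset("trl-lib/tldr", split="train")

# Toy reward: GRPO only needs a function that scores each completion.
# A math-reasoning setup would instead verify the final answer for correctness.
def reward_len(completions, **kwargs):
    return [-abs(200 - len(completion)) for completion in completions]

training_args = GRPOConfig(
    output_dir="DeepSeek-R1-Distill-Qwen-1.5B-GRPO",
    num_generations=8,          # completions sampled per prompt (the "group")
    max_completion_length=512,  # cap generation length during rollouts
)

trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",  # the base model named above
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```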

Use Cases

Given its GRPO-based training, this model is particularly well-suited for:

  • Mathematical reasoning tasks: Benefiting from the DeepSeekMath-derived training approach.
  • Complex problem-solving: Where logical deduction and structured thinking are required.
  • Applications requiring deep understanding of context: Due to its large 32768 token context window.

Frameworks Used

The model's training used several key frameworks: TRL (version 0.23.1), Transformers (version 4.57.1), PyTorch (version 2.8.0), Datasets (version 4.4.1), and Tokenizers (version 0.22.1).
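To check that a local environment matches the pinned training stack, the small sketch below compares installed package versions against the numbers listed above. It assumes the packages are installed under their usual PyPI names (torch for PyTorch).

```python
# Compare the local stack against the versions listed on this model card.
import importlib.metadata as md

expected = {
    "trl": "0.23.1",
    "transformers": "4.57.1",
    "torch": "2.8.0",
    "datasets": "4.4.1",
    "tokenizers": "0.22.1",
}
for pkg, want in expected.items():
    have = md.version(pkg)
    status = "OK" if have == want else f"differs (trained with {want})"
    print(f"{pkg}: {have} {status}")
```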