Model Overview
This model, leonMW/DeepSeek-R1-Distill-Qwen-1.5B-GSPO-Basic, is a 1.5 billion parameter language model derived from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. It features a substantial context length of 32768 tokens, making it suitable for processing longer inputs.
Key Training Details
The primary differentiator for this model is its training methodology. It was fine-tuned using GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). GRPO scores groups of sampled completions against each other instead of relying on a learned value function, and this training aims to improve the model's performance on complex reasoning tasks, particularly those involving mathematics.
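The core idea behind GRPO can be illustrated with a short sketch: for each prompt, several completions are sampled and scored, and each completion's advantage is its reward normalized against the group's mean and standard deviation. The rewards below are made-up numbers for illustration, not values from this model's actual training run.

```python
# Minimal sketch of GRPO's group-relative advantage computation.
# Rewards here are hypothetical; real training scores completions
# with a reward function or verifier.
from statistics import mean, pstdev

def group_relative_advantages(rewards):
    """A_i = (r_i - mean(r)) / std(r), computed within one prompt's
    group of sampled completions. The group mean replaces the learned
    value baseline used by methods like PPO."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All completions scored identically: no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Example: 4 completions sampled for one prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Completions scoring above the group mean get positive advantages (their tokens are reinforced); those below get negative advantages.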
Use Cases
Given its GRPO-based training, this model is particularly well-suited for:
- Mathematical reasoning tasks: Benefiting from the DeepSeekMath-derived training approach.
- Complex problem-solving: Where logical deduction and structured thinking are required.
- Applications requiring deep understanding of context: Due to its large 32768 token context window.
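For these use cases, the model can be loaded with the standard Transformers text-generation API. This is a hedged sketch, not an official usage snippet from the model authors; it downloads the weights from the Hugging Face Hub on first run, and the prompt is an arbitrary example.

```python
# Hypothetical usage sketch using the standard Transformers API.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "leonMW/DeepSeek-R1-Distill-Qwen-1.5B-GSPO-Basic"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Build a chat-formatted prompt (example question).
messages = [{"role": "user", "content": "What is 17 * 23? Show your reasoning."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

# Generate; a generous max_new_tokens leaves room for the model's
# chain-of-thought within its 32768-token context window.
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```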
Frameworks Used
The model's training leveraged several key frameworks:
- TRL 0.23.1
- Transformers 4.57.1
- PyTorch 2.8.0
- Datasets 4.4.1
- Tokenizers 0.22.1