yuerxin/DeepSeek-R1-Distill-Qwen-1.5B-GRPO
Hugging Face
Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Sep 20, 2025 · Architecture: Transformer

Sayram/DeepSeek-R1-Distill-Qwen-1.5B-GRPO is a 1.5-billion-parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. It specializes in mathematical reasoning, having been trained with the GRPO method on the OpenR1-Math-220k dataset, and is optimized for complex mathematical problem-solving and quantitative tasks. Its 131,072-token context length supports processing extensive mathematical problems and related textual information.
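A minimal usage sketch with the Hugging Face transformers library, assuming the standard AutoModel/chat-template API; the prompt wording and generation settings are illustrative, not documented recommendations for this checkpoint:

```python
# Minimal sketch of loading the model for math problem solving.
# Prompt wording and generation settings are illustrative assumptions.
MODEL_ID = "Sayram/DeepSeek-R1-Distill-Qwen-1.5B-GRPO"

def build_math_prompt(problem: str) -> str:
    """Wrap a raw math problem in a simple step-by-step instruction."""
    return (
        "Solve the following problem step by step, "
        "then state the final answer.\n\n" + problem
    )

def solve(problem: str, max_new_tokens: int = 512) -> str:
    # Imported lazily so the prompt helper above stays usable
    # without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
    messages = [{"role": "user", "content": build_math_prompt(problem)}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
```

For long multi-part problems, the large context window means the full problem statement can usually be passed in a single prompt rather than chunked.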


Model Overview

Sayram/DeepSeek-R1-Distill-Qwen-1.5B-GRPO is a 1.5 billion parameter language model derived from the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B base model. Its primary distinction lies in its specialized training for mathematical reasoning, utilizing the open-r1/OpenR1-Math-220k dataset.

Key Capabilities

  • Enhanced Mathematical Reasoning: The model was fine-tuned using GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath paper, which is designed to push the limits of mathematical problem-solving in language models.
  • Specialized Training: Its training on a dedicated mathematical dataset makes it particularly adept at understanding and generating responses for quantitative problems.
  • High Context Length: Features a substantial 131,072-token context window, allowing it to process and reason over lengthy mathematical descriptions and complex problem statements.
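The GRPO fine-tuning described above can be sketched with the TRL library's GRPOTrainer. This is not the authors' actual training script: the dataset field names, hyperparameters, and reward function below are assumptions for illustration.

```python
# Illustrative GRPO fine-tuning sketch using TRL's GRPOTrainer.
# NOT the authors' recipe; field names and settings are assumptions.
def accuracy_reward(completions, answer, **kwargs):
    """Toy reward: 1.0 if the reference answer appears in the completion."""
    rewards = []
    for completion, ref in zip(completions, answer):
        # TRL may pass completions as plain strings or chat-style message lists.
        text = completion if isinstance(completion, str) else completion[0]["content"]
        rewards.append(1.0 if str(ref) in text else 0.0)
    return rewards

def main():
    # Imported lazily so the reward function stays importable
    # without trl/datasets installed.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    dataset = load_dataset("open-r1/OpenR1-Math-220k", split="train")
    args = GRPOConfig(output_dir="qwen-1.5b-grpo", num_generations=8)
    trainer = GRPOTrainer(
        model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
        reward_funcs=accuracy_reward,
        args=args,
        train_dataset=dataset,
    )
    trainer.train()
```

GRPO scores groups of sampled completions against reward functions like the one above and normalizes rewards within each group, which avoids training a separate value model.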

When to Use This Model

This model is particularly well-suited for applications requiring strong mathematical reasoning capabilities, such as:

  • Solving complex math problems.
  • Generating explanations for mathematical concepts.
  • Assisting in educational tools focused on mathematics.
  • Any use case where robust quantitative analysis and logical deduction are paramount.
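For use cases like these, note that DeepSeek-R1-style models typically emit their chain of thought inside `<think>...</think>` tags before the final answer. A minimal helper for separating the two in a downstream application (the tag convention comes from the base R1 models; verify it against this checkpoint's actual output):

```python
import re

def split_reasoning(output: str) -> tuple[str, str]:
    """Split a DeepSeek-R1-style completion into (reasoning, answer).

    Assumes the base model's convention of wrapping chain-of-thought in
    <think>...</think>; returns empty reasoning if no tags are found.
    """
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if match is None:
        return "", output.strip()
    reasoning = match.group(1).strip()
    answer = output[match.end():].strip()
    return reasoning, answer
```

An educational tool could show only the answer by default and expose the reasoning on demand as a worked solution.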