kangdawei/DRA-DR_GRPO

Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Sep 29, 2025 · Architecture: Transformer

kangdawei/DRA-DR_GRPO is a 1.5-billion-parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B using the GRPO training method. Trained on the knoveleng/open-rs dataset, the model is optimized for stronger reasoning, particularly in mathematical contexts. It offers a 32768-token context length, making it suitable for tasks that require deep contextual understanding and complex problem-solving.


Model Overview

kangdawei/DRA-DR_GRPO is a 1.5-billion-parameter language model derived from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. It was fine-tuned with the TRL library on the knoveleng/open-rs dataset.

Key Training Methodology

The model's distinctiveness stems from its training with GRPO (Group Relative Policy Optimization). This method, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), is designed to significantly enhance a model's mathematical reasoning abilities.
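The core idea of GRPO is to sample a group of completions per prompt and standardize each completion's reward against the group's own statistics, replacing the learned value (critic) network of PPO. A minimal sketch of that group-relative advantage step (the function name and reward values are illustrative, not from the model card or the paper's code):

```python
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward within its own group of samples.

    In GRPO, several completions are drawn for the same prompt and each
    is scored; a completion's advantage is its reward minus the group
    mean, divided by the group standard deviation. Completions better
    than their siblings get positive advantage, worse ones negative.
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four sampled answers to one math prompt, rewarded 1.0 if the
# final answer is correct and 0.0 otherwise (a common rule-based scheme).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Because advantages are centered within each group, they sum to (approximately) zero per prompt, which is what makes a separate baseline model unnecessary.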

Capabilities and Use Cases

  • Enhanced Reasoning: The GRPO training suggests improved performance on tasks requiring logical deduction and problem-solving, particularly in mathematical domains.
  • Contextual Understanding: With a 32768 token context length, the model can process and generate responses based on extensive input, beneficial for complex queries.
  • Fine-tuned on Specific Data: Its training on the knoveleng/open-rs dataset suggests particular strength in that dataset's domain of mathematical reasoning problems.

This model is well-suited for applications where robust reasoning, especially mathematical or logical, is critical, and where processing long contexts is necessary.
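When serving long inputs, the prompt and the generated tokens must together fit within the 32768-token window. A minimal sketch of budgeting for that (the helper and the 1024-token generation budget are hypothetical, not part of the model card):

```python
CTX_LENGTH = 32768  # context window stated on the model card

def fit_to_context(token_ids, max_new_tokens=1024, ctx=CTX_LENGTH):
    """Trim a tokenized prompt so prompt + generation fit the window.

    Keeps the most recent tokens, since truncating the tail would cut
    off the question the model is actually being asked to answer.
    """
    budget = ctx - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens exceeds the context window")
    return token_ids[-budget:]

ids = list(range(40000))        # an over-long tokenized prompt
trimmed = fit_to_context(ids)   # keeps the last 32768 - 1024 tokens
```

Inputs already within budget pass through unchanged; only over-long prompts are trimmed from the front.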