kangdawei/DRA-GRPO-7B

Text Generation | Concurrency cost: 1 | Model size: 7.6B | Quant: FP8 | Context length: 32k | Published: Nov 20, 2025 | Architecture: Transformer | Cold

The kangdawei/DRA-GRPO-7B model is a 7.6 billion parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-7B. It was trained with the GRPO method, originally introduced for mathematical reasoning, on the knoveleng/open-rs dataset. The model targets general text generation, with particular strength expected on reasoning-style tasks close to its training data.


Model Overview

kangdawei/DRA-GRPO-7B is a 7.6 billion parameter language model, fine-tuned from the deepseek-ai/DeepSeek-R1-Distill-Qwen-7B base model. It was trained using GRPO (Group Relative Policy Optimization), a reinforcement-learning method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models." This suggests an emphasis on robust and structured reasoning capabilities, even though the fine-tuning dataset, knoveleng/open-rs, is not explicitly described as mathematical.
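At its core, GRPO (as described in the DeepSeekMath paper) samples a group of completions per prompt, scores each with a reward function, and normalizes rewards within the group to form advantages, dispensing with a separate value network. A minimal sketch of that group-relative advantage step, with illustrative names not taken from any training code:

```python
# Sketch of GRPO's group-relative advantage computation: per prompt, sample
# G completions, score each, and normalize rewards to zero mean / unit std
# within the group. The eps term guards against zero variance.
from statistics import mean, pstdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Map a group of per-completion rewards to normalized advantages."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four sampled completions for one prompt, scored 0/1 for correctness.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions scoring above the group mean receive positive advantages and are reinforced; those below are penalized, all relative to siblings from the same prompt.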

Key Capabilities

  • Fine-tuned Performance: Built upon a strong base model and further optimized with the GRPO method.
  • GRPO Training: Utilizes a sophisticated training procedure known for enhancing reasoning, as detailed in the DeepSeekMath paper.
  • General Text Generation: Capable of generating human-like text in response to prompts through the standard causal-LM interface.
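A hedged quick-start sketch for the capabilities above, assuming the weights load through the standard Hugging Face `transformers` causal-LM API; the sampling settings are common DeepSeek-R1-Distill recommendations, not values taken from this model card:

```python
# Quick-start sketch (assumptions: standard transformers loading works for
# this repository; sampling settings are illustrative, not from the card).
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "kangdawei/DRA-GRPO-7B"

def build_messages(question: str) -> list[dict]:
    # Single-turn chat message; the tokenizer's chat template adds the
    # model-specific special tokens.
    return [{"role": "user", "content": question}]

def generate(question: str, max_new_tokens: int = 512) -> str:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer.apply_chat_template(
        build_messages(question), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(
        inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.6,
        top_p=0.95,
    )
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)
```

Calling `generate("...")` downloads the full weights on first use; the FP8 quantization listed in the card header would typically be served through an inference server rather than this eager-mode path.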

Good For

  • Developers looking for a 7B parameter model with enhanced reasoning potential due to its GRPO training.
  • Applications requiring general text generation, especially those that might benefit from a model fine-tuned on the knoveleng/open-rs dataset.
  • Experimentation with models trained using advanced policy optimization techniques like GRPO.