kangdawei/DRA-DR_GRPO
kangdawei/DRA-DR_GRPO is a 1.5 billion parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, utilizing the GRPO training method. Trained on the knoveleng/open-rs dataset, this model is optimized for enhanced reasoning capabilities, particularly in mathematical contexts. It features a 32768 token context length, making it suitable for tasks requiring deep contextual understanding and complex problem-solving.
Model Overview
kangdawei/DRA-DR_GRPO is a 1.5 billion parameter language model derived from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. It has been fine-tuned using the TRL library on the knoveleng/open-rs dataset.
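Since this is a standard causal language model on the Hub, it should load with the usual Hugging Face transformers APIs. The sketch below is illustrative, not from the model card: the prompt template in build_prompt is an assumption (the tokenizer's chat template is the canonical format), and the generation settings are placeholders.

```python
MODEL_ID = "kangdawei/DRA-DR_GRPO"  # repo id from the model card

def build_prompt(question: str) -> str:
    # Simple instruction wrapper; this template is an assumption, not the
    # model's official format -- prefer tokenizer.apply_chat_template in practice.
    return f"Please reason step by step.\nQuestion: {question}\nAnswer:"

def generate_answer(question: str, max_new_tokens: int = 512) -> str:
    # Imported lazily so the prompt helper above works without transformers
    # installed; the first call downloads the ~1.5B-parameter weights.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

For long-context use, the same pattern applies; inputs up to the 32768-token window can be passed directly to the tokenizer.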
Key Training Methodology
The model's distinctiveness stems from its training with GRPO (Group Relative Policy Optimization). This method, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), is designed to significantly enhance a model's mathematical reasoning abilities while avoiding the cost of training a separate value (critic) model.
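The core idea of GRPO is to replace a learned value baseline with a group baseline: for each prompt, several completions are sampled and scored, and each completion's reward is normalized against its group's mean and standard deviation. A minimal sketch of that normalization step (the surrounding policy-gradient machinery and the epsilon guard are implementation assumptions, not details from this model card):

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """Compute group-relative advantages as in GRPO (arXiv:2402.03300):
    each completion's reward is standardized against the mean and standard
    deviation of its sampling group. eps avoids division by zero when all
    rewards in the group are identical (an assumed implementation detail)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

Completions scoring above their group's mean receive positive advantages and are reinforced; the group itself serves as the baseline, which is what lets GRPO drop the critic network used by PPO-style methods.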
Capabilities and Use Cases
- Enhanced Reasoning: The GRPO training suggests improved performance on tasks requiring logical deduction and problem-solving, particularly in mathematical domains.
- Contextual Understanding: With a 32768 token context length, the model can process and generate responses based on extensive input, beneficial for complex queries.
- Fine-tuned for Specific Data: Its training on the knoveleng/open-rs dataset implies potential strengths in areas related to that dataset's content.
This model is well-suited for applications where robust reasoning, especially mathematical or logical, is critical, and where processing long contexts is necessary.