kangdawei/DRA-DR_GRPO
kangdawei/DRA-DR_GRPO is a 1.5 billion parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, utilizing the GRPO training method. Trained on the knoveleng/open-rs dataset, this model is optimized for enhanced reasoning capabilities, particularly in mathematical contexts. It features a 32768 token context length, making it suitable for tasks requiring deep contextual understanding and complex problem-solving.
Model Overview
kangdawei/DRA-DR_GRPO is a 1.5 billion parameter language model derived from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. It has been fine-tuned using the TRL library on the knoveleng/open-rs dataset.
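Since this is a standard causal language model on the Hub, it should load with the usual Hugging Face transformers APIs. The sketch below is illustrative, not from the model card: the prompt template in build_prompt is an assumption (the tokenizer's chat template is the canonical format), and the generation settings are placeholders.

```python
MODEL_ID = "kangdawei/DRA-DR_GRPO"  # repo id from the model card

def build_prompt(question: str) -> str:
    # Simple instruction wrapper; this template is an assumption, not the
    # model's official format -- prefer tokenizer.apply_chat_template in practice.
    return f"Please reason step by step.\nQuestion: {question}\nAnswer:"

def generate_answer(question: str, max_new_tokens: int = 512) -> str:
    # Imported lazily so the prompt helper above works without transformers
    # installed; the first call downloads the ~1.5B-parameter weights.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

For long-context use, the same pattern applies; inputs up to the 32768-token window can be passed directly to the tokenizer.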
Key Training Methodology
The model's distinctiveness stems from its training with GRPO (Group Relative Policy Optimization). This method, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), is designed to significantly enhance a model's mathematical reasoning abilities while avoiding the cost of training a separate value (critic) model.
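The core idea of GRPO is to replace a learned value baseline with a group baseline: for each prompt, several completions are sampled and scored, and each completion's reward is normalized against its group's mean and standard deviation. A minimal sketch of that normalization step (the surrounding policy-gradient machinery and the epsilon guard are implementation assumptions, not details from this model card):

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """Compute group-relative advantages as in GRPO (arXiv:2402.03300):
    each completion's reward is standardized against the mean and standard
    deviation of its sampling group. eps avoids division by zero when all
    rewards in the group are identical (an assumed implementation detail)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

Completions scoring above their group's mean receive positive advantages and are reinforced; the group itself serves as the baseline, which is what lets GRPO drop the critic network used by PPO-style methods.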
Capabilities and Use Cases
- Enhanced Reasoning: The GRPO training suggests improved performance on tasks requiring logical deduction and problem-solving, particularly in mathematical domains.
- Contextual Understanding: With a 32768 token context length, the model can process and generate responses based on extensive input, beneficial for complex queries.
- Fine-tuned for Specific Data: Its training on the knoveleng/open-rs dataset implies potential strengths in areas related to that dataset's content.
This model is well-suited for applications where robust reasoning, especially mathematical or logical, is critical, and where processing long contexts is necessary.