Model Overview
kangdawei/DRA-GRPO-7B is a 7.6-billion-parameter language model fine-tuned from the deepseek-ai/DeepSeek-R1-Distill-Qwen-7B base model. It was trained with GRPO (Group Relative Policy Optimization), a technique introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models." This suggests an emphasis on robust, structured reasoning, even though the fine-tuning dataset, knoveleng/open-rs, is not explicitly described as mathematical.
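The core idea of GRPO can be illustrated with its group-relative advantage. The following is a minimal sketch, assuming the standard formulation from the DeepSeekMath paper: for each prompt, a group of completions is sampled and scored, and each completion's advantage is its reward normalized by the group's mean and standard deviation (no learned value function). The function name and reward values here are illustrative, not from this model's training code.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each completion's reward against its sampling group.

    This is the baseline-free advantage estimate GRPO uses in place of
    a learned critic: A_i = (r_i - mean(group)) / std(group).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All completions scored equally; no learning signal for this group.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Example: binary correctness rewards for one group of 4 sampled completions.
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))  # → [1.0, -1.0, 1.0, -1.0]
```

Completions scoring above the group mean receive positive advantages and are reinforced; those below are discouraged, which is what drives the reasoning-focused optimization described above.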
Key Capabilities
- Fine-tuned Performance: Built upon a strong base model and further optimized with the GRPO method.
- GRPO Training: Utilizes a sophisticated training procedure known for enhancing reasoning, as detailed in the DeepSeekMath paper.
- General Text Generation: Capable of generating human-like text based on prompts, as demonstrated by the quick start example.
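A quick-start sketch for the text-generation use case, using the standard Hugging Face transformers loading and generation APIs. This assumes the checkpoint is hosted under the id shown and that `device_map="auto"` can place it on available hardware; the prompt and helper function are illustrative, not part of the official model card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "kangdawei/DRA-GRPO-7B"  # assumed Hub id for this model

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model and return the completion for a single prompt."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("Solve step by step: what is 17 * 24?"))
```

Loading a 7.6B-parameter checkpoint requires substantial memory; on constrained hardware, quantized loading (e.g. via bitsandbytes) is a common workaround.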
Good For
- Developers looking for a 7B parameter model with enhanced reasoning potential due to its GRPO training.
- Applications requiring general text generation, especially those that might benefit from a model fine-tuned on the knoveleng/open-rs dataset.
- Experimentation with models trained using advanced policy optimization techniques like GRPO.