kangdawei/DRA-GRPO-8B
The kangdawei/DRA-GRPO-8B model is an 8 billion parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Llama-8B. It was trained using the GRPO method on the knoveleng/open-rs dataset, specializing in mathematical reasoning tasks. With a 32768 token context length, this model is optimized for complex problem-solving and advanced reasoning capabilities.
Model Overview
The kangdawei/DRA-GRPO-8B is an 8 billion parameter language model derived from the deepseek-ai/DeepSeek-R1-Distill-Llama-8B architecture. It has been specifically fine-tuned using the GRPO (Group Relative Policy Optimization) method, introduced in the DeepSeekMath paper, to enhance its mathematical reasoning capabilities.
Key Capabilities
- Mathematical Reasoning: Optimized for complex mathematical problem-solving through the GRPO training approach.
- Large Context Window: Supports a substantial context length of 32768 tokens, enabling processing of extensive inputs for reasoning tasks.
- Fine-tuned Performance: Leverages the knoveleng/open-rs dataset for specialized training, improving performance in its target domain.
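The model card does not include usage code, but as a standard causal LM on the Hub it should load with the Hugging Face `transformers` library. The sketch below is illustrative: the generation parameters and the example question are assumptions, not values from the card, and the chat template is inherited from the DeepSeek-R1 distill base model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "kangdawei/DRA-GRPO-8B"


def build_prompt(tokenizer, question: str) -> str:
    # Use the tokenizer's chat template so the prompt matches the
    # format the DeepSeek-R1-Distill-Llama-8B base was trained on.
    messages = [{"role": "user", "content": question}]
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )


if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    prompt = build_prompt(tokenizer, "What is 17 * 24? Show your reasoning.")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # max_new_tokens is an illustrative choice; reasoning traces can be long.
    outputs = model.generate(**inputs, max_new_tokens=1024)
    print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:]))
```

Because the model emits extended reasoning traces, leaving generous room in `max_new_tokens` (within the 32768-token context) is advisable.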
Training Details
The model's training procedure utilized the TRL framework (version 0.16.0.dev0) and incorporated the GRPO method, which is designed to push the limits of mathematical reasoning in open language models. This makes DRA-GRPO-8B a strong candidate for applications requiring advanced analytical and problem-solving skills.