kangdawei/DRA-GRPO-8B

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Nov 23, 2025 · Architecture: Transformer

The kangdawei/DRA-GRPO-8B model is an 8 billion parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Llama-8B. It was trained using the GRPO method on the knoveleng/open-rs dataset, specializing in mathematical reasoning tasks. With a 32768 token context length, this model is optimized for complex problem-solving and advanced reasoning capabilities.


Model Overview

The kangdawei/DRA-GRPO-8B is an 8 billion parameter language model derived from the deepseek-ai/DeepSeek-R1-Distill-Llama-8B architecture. It has been fine-tuned using GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath paper, to enhance its mathematical reasoning capabilities.
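The core idea of GRPO is that instead of training a separate value model as a baseline, each sampled completion's reward is normalized against the statistics of its own group of completions for the same prompt. A minimal sketch of that normalization step (illustrative only; the actual training code lives in TRL):

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against its own group's mean and std.

    GRPO samples G completions per prompt and uses the group
    statistics as the baseline instead of a learned value model.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled answers to one math problem, rewarded 1.0 if correct.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct completions end up with positive advantages and incorrect ones with negative advantages, so the policy gradient pushes probability mass toward the group's better answers.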

Key Capabilities

  • Mathematical Reasoning: Optimized for complex mathematical problem-solving through the GRPO training approach.
  • Large Context Window: Supports a substantial context length of 32768 tokens, enabling processing of extensive inputs for reasoning tasks.
  • Fine-tuned Performance: Leverages the knoveleng/open-rs dataset for specialized training, improving performance in its target domain.
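A typical way to query the model is through the Hugging Face `transformers` text-generation pipeline. The sketch below is an assumption about standard usage, not an official recipe from the model card; the system prompt and generation settings are illustrative.

```python
def build_messages(problem: str) -> list[dict]:
    """Wrap a math problem in the chat format used by most
    instruction-tuned checkpoints; the system prompt is illustrative."""
    return [
        {"role": "system", "content": "Reason step by step, then give the final answer."},
        {"role": "user", "content": problem},
    ]

def generate(problem: str) -> str:
    # Imported lazily so build_messages stays usable without transformers.
    from transformers import pipeline
    pipe = pipeline("text-generation", model="kangdawei/DRA-GRPO-8B")
    out = pipe(build_messages(problem), max_new_tokens=1024)
    # Chat-style inputs return the full conversation; take the last turn.
    return out[0]["generated_text"][-1]["content"]

messages = build_messages("What is 17 * 24?")
```

The 32k context window leaves ample room for long chain-of-thought outputs, so a generous `max_new_tokens` is usually appropriate for reasoning prompts.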

Training Details

The model's training procedure utilized the TRL framework (version 0.16.0.dev0) and incorporated the GRPO method, which is designed to push the limits of mathematical reasoning in open language models. This makes DRA-GRPO-8B a strong candidate for applications requiring advanced analytical and problem-solving skills.