xiwenc1/OpenRS-DR_GRPO_dra-qwen2
The xiwenc1/OpenRS-DR_GRPO_dra-qwen2 model is a 3.1-billion-parameter instruction-tuned language model based on the Qwen2.5-3B-Instruct architecture. It has been fine-tuned with the GRPO method on the knoveleng/open-rs dataset, specializing it for mathematical reasoning. The model is designed for tasks requiring robust problem-solving, particularly in mathematical contexts, and supports a 32,768-token context length.
Model Overview
xiwenc1/OpenRS-DR_GRPO_dra-qwen2 is a 3.1-billion-parameter language model fine-tuned from the Qwen2.5-3B-Instruct base model. It leverages the GRPO (Group Relative Policy Optimization) method, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models", to enhance its reasoning capabilities.
Key Capabilities
- Mathematical Reasoning: Specialized through GRPO training, making it suitable for tasks requiring logical and mathematical problem-solving.
- Instruction Following: Inherits strong instruction-following abilities from its Qwen2.5-3B-Instruct base.
- Extended Context: Supports a 32,768-token context window, allowing it to process longer inputs and more complex problem descriptions.
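The capabilities above can be exercised through the standard Hugging Face transformers chat workflow. The sketch below is illustrative, not published with the model: the model ID comes from this card, but the system prompt, generation settings, and the `build_messages`/`solve` helper names are assumptions.

```python
# Illustrative usage sketch for a Qwen2.5-Instruct-family model via transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "xiwenc1/OpenRS-DR_GRPO_dra-qwen2"  # model ID from this card


def build_messages(problem: str) -> list:
    """Wrap a math problem in the chat format Qwen2.5-Instruct models expect."""
    return [
        # Assumed system prompt; not a setting published with the model.
        {"role": "system", "content": "You are a helpful assistant. Reason step by step."},
        {"role": "user", "content": problem},
    ]


def solve(problem: str, max_new_tokens: int = 512) -> str:
    """Generate a reasoning-style answer to a single problem."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    # Render the chat messages with the model's built-in chat template.
    text = tokenizer.apply_chat_template(
        build_messages(problem), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )


if __name__ == "__main__":
    print(solve("What is the sum of the first 100 positive integers?"))
```

Because the model supports a long context, full problem statements (including multi-part derivations) can be passed in a single user turn.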
Training Details
The model was fine-tuned on the knoveleng/open-rs dataset using the TRL (Transformer Reinforcement Learning) framework. The application of the GRPO method specifically targets improvements in mathematical reasoning, differentiating it from general-purpose instruction-tuned models.
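The core idea behind GRPO is that each completion is scored relative to the other completions sampled for the same prompt, so no separate value network is needed. A minimal conceptual sketch of this group-relative normalization (not the actual TRL training code; the function name and reward values are illustrative):

```python
# Conceptual sketch of GRPO's group-relative advantage computation.
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for a single prompt.

    Each completion's advantage is its reward minus the group mean,
    divided by the group standard deviation (eps avoids division by zero
    when all rewards in the group are identical).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four sampled answers to one math problem,
# rewarded 1.0 if the final answer is correct, 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)  # correct answers get positive advantage, wrong ones negative
```

In training, these per-completion advantages weight the policy-gradient update, pushing the model toward completions that outperform their group peers; in practice this is handled by TRL rather than hand-written code.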
Good For
- Applications requiring advanced mathematical problem-solving.
- Tasks where robust reasoning and logical deduction are critical.
- Developers looking for a compact yet capable model for specialized reasoning tasks.