# Model Overview
This model, hdong0/Qwen3-8B-base-Open-R1-GRPO_dapo_acc_16384_nokl, is an 8-billion-parameter language model fine-tuned from Qwen3-8B-Base, with the specific goal of strengthening its mathematical reasoning abilities.
## Key Capabilities & Training
- Mathematical Reasoning: The model's primary strength lies in mathematical problem-solving, achieved through fine-tuning on the open-r1/DAPO-Math-17k-Processed dataset.
- GRPO Method: Training used GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), which is designed to improve mathematical reasoning performance.
- Context Length: It supports a context length of 32768 tokens, allowing it to process long, multi-step mathematical problems and discussions.
- Framework: The fine-tuning process utilized the TRL library from Hugging Face.
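The core idea of GRPO, as described in the DeepSeekMath paper, is to score a group of sampled completions for the same prompt and normalize each reward against the group's mean and standard deviation, avoiding a separate value model. The sketch below illustrates only that group-relative advantage computation; the reward values are hypothetical, and this is not the training code used for this model.

```python
# Minimal sketch of GRPO's group-relative advantage computation.
# For one prompt, several completions are sampled and each receives a scalar
# reward (e.g. 1.0 for a correct final answer, 0.0 otherwise); advantages are
# the rewards standardized within the group.
from statistics import mean, stdev


def group_relative_advantages(rewards, eps=1e-6):
    """Advantage of each completion relative to its sampling group."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four sampled solutions, two of which reached the correct answer.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# Correct completions receive positive advantage, incorrect ones negative.
```

Completions that beat their group's average are reinforced and those below it are penalized, which is what makes a simple binary correctness reward effective for math training.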
## Use Cases
This model is particularly well-suited for applications requiring robust mathematical reasoning, such as:
- Solving complex math problems.
- Generating step-by-step mathematical explanations in educational contexts.
- Developing tools for scientific computation and analysis.
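For these use cases, the model can be loaded with the standard Hugging Face `transformers` API. The sketch below is illustrative: the prompt template and generation settings are assumptions, not the exact ones used in training.

```python
# Hypothetical usage sketch with Hugging Face transformers.
# MODEL_ID comes from this card; build_prompt's template is an assumption.
MODEL_ID = "hdong0/Qwen3-8B-base-Open-R1-GRPO_dapo_acc_16384_nokl"


def build_prompt(problem: str) -> str:
    # Simple instruction-style prompt; the template actually used during
    # fine-tuning may differ.
    return (
        "Solve the following math problem step by step.\n\n"
        f"Problem: {problem}\n\nSolution:"
    )


def solve(problem: str, max_new_tokens: int = 512) -> str:
    # Imported lazily so the helper above stays usable without the heavy deps.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(build_prompt(problem), return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

Note that an 8B model requires substantial memory; `device_map="auto"` lets `transformers` place weights on available GPUs or fall back to CPU.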