hdong0/Qwen3-8B-base-Open-R1-GRPO_dapo_acc_16384_nokl

Text generation · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Oct 7, 2025 · Architecture: Transformer

hdong0/Qwen3-8B-base-Open-R1-GRPO_dapo_acc_16384_nokl is an 8-billion-parameter language model fine-tuned from Qwen3-8B-Base. Developed by hdong0, it specializes in mathematical reasoning, having been trained with GRPO on the open-r1/DAPO-Math-17k-Processed dataset. With a 32,768-token context length, it is optimized for tasks requiring advanced mathematical problem solving.


Model Overview

This model, hdong0/Qwen3-8B-base-Open-R1-GRPO_dapo_acc_16384_nokl, is an 8-billion-parameter language model fine-tuned from the Qwen3-8B-Base model, with training focused specifically on strengthening its mathematical reasoning abilities.
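
The model card does not include a usage snippet, so here is a minimal inference sketch using the Hugging Face transformers library. The prompt and generation settings are illustrative assumptions, not settings prescribed by the model author; since this is a base-model fine-tune, plain-text prompting (no chat template) is assumed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hdong0/Qwen3-8B-base-Open-R1-GRPO_dapo_acc_16384_nokl"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # place weights on available GPU(s)
)

# Illustrative math prompt; phrasing is an assumption, not a documented format.
prompt = "Solve step by step: what is the sum of the first 100 positive integers?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```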

Key Capabilities & Training

  • Mathematical Reasoning: The model's primary strength lies in mathematical problem-solving, achieved through fine-tuning on the open-r1/DAPO-Math-17k-Processed dataset.
  • GRPO Method: Training used GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300) to improve mathematical reasoning performance.
  • Context Length: It supports a substantial context length of 32,768 tokens, allowing it to process longer and more complex mathematical problems or discussions.
  • Framework: The fine-tuning process used Hugging Face's TRL library, which provides a GRPOTrainer; a sketch of such a setup follows this list.
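
The actual training script is not published on the model card, so the following is only a minimal sketch of a TRL GRPO setup on the named dataset. The reward function is a placeholder, and the hyperparameters are guesses read off the model name ("acc" suggesting an accuracy reward, "16384" a completion-length budget, "nokl" a zero KL coefficient), not the author's confirmed configuration.

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Dataset named in the model card; its exact column layout is assumed here.
dataset = load_dataset("open-r1/DAPO-Math-17k-Processed", split="train")

def accuracy_reward(completions, **kwargs):
    """Placeholder reward. The real run presumably scored answer correctness
    ("acc" in the model name), which would require extracting the final answer
    from each completion and comparing it to the dataset's reference solution."""
    return [1.0 if "\\boxed" in c else 0.0 for c in completions]  # illustrative only

training_args = GRPOConfig(
    output_dir="Qwen3-8B-base-GRPO-math",
    max_completion_length=16384,  # assumption: the "16384" in the model name
    beta=0.0,                     # assumption: "nokl" = no KL penalty
)

trainer = GRPOTrainer(
    model="Qwen/Qwen3-8B-Base",   # base checkpoint the fine-tune starts from
    reward_funcs=accuracy_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

In GRPO, several completions are sampled per prompt and each completion's advantage is computed relative to the group's mean reward, which removes the need for a separate value model; setting the KL coefficient to zero lets the policy drift further from the reference model in exchange for stronger reward optimization.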

Use Cases

This model is particularly well-suited for applications requiring robust mathematical reasoning, such as:

  • Solving complex math problems.
  • Assisting in educational contexts for mathematical explanations.
  • Developing tools for scientific computation and analysis.