zhaohq/PureRL-7B-v5-07-brierG

Text Generation | Concurrency Cost: 1 | Model Size: 7.6B | Quant: FP8 | Ctx Length: 32k | Published: May 15, 2026 | Architecture: Transformer | Warm

zhaohq/PureRL-7B-v5-07-brierG is a 7.6 billion parameter language model fine-tuned from Qwen/Qwen2.5-Math-7B by zhaohq. It was trained with GRPO, the method introduced in the DeepSeekMath paper, to improve its mathematical reasoning. With a context length of 32768 tokens, the model is aimed at mathematical problem-solving and multi-step reasoning tasks.

PureRL-7B-v5-07-brierG: Enhanced Mathematical Reasoning

This model, developed by zhaohq, is a 7.6 billion parameter language model fine-tuned from the Qwen/Qwen2.5-Math-7B base model. It was trained with GRPO (Group Relative Policy Optimization), the method introduced in the DeepSeekMath paper, to improve its mathematical reasoning. With a context length of 32768 tokens, it is designed to handle problems that require long, multi-step reasoning.
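
As a rough illustration of the idea behind GRPO, the sketch below computes group-relative advantages: several completions are sampled for one prompt, each is scored, and each score is normalized against the mean and standard deviation of its own group. The function name, the example rewards, and the choice of population standard deviation are assumptions made for illustration. This card does not describe the reward function or the training code actually used for this model.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards):
    """Normalize each reward against the group it was sampled with.

    `rewards` holds one scalar score per completion sampled for the
    same prompt. Returns one advantage per completion.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    if sigma == 0:
        # Every completion scored the same, so none is preferred.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


# Hypothetical rewards for four sampled answers to one math problem
# (1.0 = correct final answer, 0.0 = incorrect).
example_rewards = [1.0, 0.0, 0.0, 1.0]
example_advantages = group_relative_advantages(example_rewards)
```

Completions that score above their group's mean receive a positive advantage and are reinforced; those below receive a negative one. Because the baseline comes from the group itself, no separate value model is needed.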

Key Capabilities

  • Advanced Mathematical Reasoning: Fine-tuned with GRPO to target performance on complex mathematical tasks.
  • Large Context Window: Supports inputs up to 32768 tokens, allowing for detailed problem descriptions and multi-step reasoning.
  • Fine-tuned from Qwen2.5-Math-7B: Builds upon a strong foundation already optimized for mathematical understanding.
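
In a decoder-only model the prompt and the generated tokens share one window, so the 32768-token figure above bounds their sum. A minimal sketch of that budget check follows. The helper name and the example token count are hypothetical, and real token counts depend on the model's tokenizer.

```python
CONTEXT_LENGTH = 32768  # context length stated on this card


def remaining_generation_budget(prompt_tokens, context_length=CONTEXT_LENGTH):
    """Return how many tokens can still be generated after the prompt."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # A prompt that already fills the window leaves nothing to generate.
    return max(0, context_length - prompt_tokens)


# Hypothetical case: a 30000-token problem statement leaves 2768 tokens
# for the model's reasoning and answer.
budget = remaining_generation_budget(30000)
```

A check like this is useful before submitting long problem descriptions, since multi-step solutions can themselves run to thousands of tokens.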

Good for

  • Solving challenging mathematical problems.
  • Applications requiring robust logical and analytical reasoning.
  • Research and development in AI for mathematical domains.