zhaohq/PureRL-1.5B-v6d1-baseline-acc10

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:May 18, 2026Architecture:Transformer Warm

The zhaohq/PureRL-1.5B-v6d1-baseline-acc10 model is a 1.5 billion parameter language model fine-tuned from Qwen/Qwen2.5-Math-1.5B. It was trained using the TRL framework with GRPO, a method designed to enhance mathematical reasoning. This model is optimized for tasks requiring advanced mathematical problem-solving and logical deduction, leveraging its 32768 token context length. Its specialized training makes it particularly suitable for applications in scientific computing and quantitative analysis.

Loading preview...

Model Overview

zhaohq/PureRL-1.5B-v6d1-baseline-acc10 is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-Math-1.5B base model. It leverages a substantial 32768 token context length, making it capable of processing extensive inputs for complex tasks.

Key Capabilities

  • Enhanced Mathematical Reasoning: This model was specifically trained using GRPO (Generalized Reinforcement Learning for Policy Optimization), a method introduced in the DeepSeekMath paper, to push the limits of mathematical reasoning.
  • Fine-tuned with TRL: The model's training utilized the TRL (Transformer Reinforcement Learning) framework, indicating a focus on optimizing its performance through reinforcement learning techniques.
  • Qwen2.5-Math Base: Built upon the Qwen2.5-Math-1.5B architecture, it inherits a strong foundation for numerical and logical tasks.

Training Details

The training procedure for this model involved GRPO, as detailed in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This method is designed to improve the model's ability to handle complex mathematical problems. The training process was tracked and can be visualized via Weights & Biases.

Good For

  • Mathematical Problem Solving: Ideal for applications requiring robust mathematical reasoning and computation.
  • Scientific and Quantitative Analysis: Suitable for tasks in fields that demand precise numerical understanding and logical deduction.