zhaohq/PureRL-1.5B-v6b3-bare-fmt03

Text Generation | Concurrency Cost: 1 | Model Size: 1.5B | Quant: BF16 | Ctx Length: 32k | Published: May 17, 2026 | Architecture: Transformer | Warm

zhaohq/PureRL-1.5B-v6b3-bare-fmt03 is a 1.5-billion-parameter language model fine-tuned by zhaohq from Qwen/Qwen2.5-Math-1.5B. It was trained with the TRL framework using GRPO, the method introduced in the DeepSeekMath paper. The model targets mathematical reasoning, building on its Qwen2.5-Math base, and supports a context length of 32,768 tokens.


Model Overview

zhaohq/PureRL-1.5B-v6b3-bare-fmt03 is a 1.5-billion-parameter language model fine-tuned by zhaohq from the base model Qwen/Qwen2.5-Math-1.5B. It is intended for mathematical reasoning tasks and builds on the capabilities of its math-focused base model.
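As a rough illustration of how a model like this is typically loaded, the sketch below uses the Hugging Face transformers library with the model ID given above and the BF16 precision listed in the page metadata. The prompt format in `build_prompt` and the function names are hypothetical, not something the model card specifies; the format the model was actually trained on may differ.

```python
MODEL_ID = "zhaohq/PureRL-1.5B-v6b3-bare-fmt03"


def build_prompt(question: str) -> str:
    """Wrap a math question in a plain-text prompt (hypothetical format)."""
    return f"Question: {question.strip()}\nAnswer:"


def generate_answer(question: str, max_new_tokens: int = 512) -> str:
    """Load the model and generate a completion for a single question."""
    # Imported lazily so build_prompt works without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_answer("What is the sum of the first 10 positive integers?")` would download the weights on first use, so it is best run on a machine with enough memory for a 1.5B-parameter model in BF16.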

Key Training Details

This model was trained with the TRL (Transformer Reinforcement Learning) framework using GRPO (Group Relative Policy Optimization), the method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). GRPO is a reinforcement learning method: for each prompt it samples a group of responses, scores them with a reward signal, and updates the policy using each response's reward relative to the rest of its group, which removes the need for a separate value model. The aim is to improve mathematical problem-solving ability.
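The core idea of GRPO can be sketched in a few lines: the advantage of each sampled response is its reward normalized by the mean and standard deviation of the rewards in its own group. This is a minimal illustration of that normalization only, not the training code used for this model; the function name, the `eps` stabilizer, the use of the population standard deviation, and the example rewards are all assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    # eps avoids division by zero when every response gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt; 1.0 marks a correct answer (hypothetical scoring).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Responses that score above the group average receive a positive advantage and are reinforced, while below-average responses are discouraged. When all responses in a group score the same, every advantage is zero and that prompt contributes no learning signal.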

Capabilities and Use Cases

Given its foundation and specialized training, this model is particularly suited for:

  • Mathematical Reasoning: Solving complex mathematical problems and generating logical steps.
  • Instruction Following: Responding to user prompts in a structured and coherent manner, especially for analytical questions.
  • Research and Development: Serving as a base for further experimentation in reinforcement learning for mathematical domains.

With a context length of 32,768 tokens, shared between the prompt and the generated output, the model can accept long prompts and produce long, detailed responses, which helps with multi-step mathematical derivations and explanations.
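Because the prompt and the generated tokens share one context window, it can help to compute how much room is left for generation before calling the model. The helper below is a hypothetical sketch of that arithmetic using the 32768-token figure stated above; the function and constant names are illustrative.

```python
CONTEXT_LENGTH = 32768  # context length stated for this model


def max_new_tokens_for(prompt_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many tokens can still be generated after the prompt."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # A prompt longer than the window leaves no room for generation.
    return max(context_length - prompt_tokens, 0)


remaining = max_new_tokens_for(1000)
```

With a 1000-token prompt, `remaining` is 31768, which is the upper bound to pass as a generation limit.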