zhaohq/PureRL-1.5B-v12A-lam002

Text Generation | Concurrency Cost: 1 | Model Size: 1.5B | Quant: BF16 | Ctx Length: 32k | Published: May 19, 2026 | Architecture: Transformer | Warm

zhaohq/PureRL-1.5B-v12A-lam002 is a 1.5-billion-parameter language model developed by zhaohq and fine-tuned from Qwen/Qwen2.5-Math-1.5B. It was trained with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper. With a context length of 32768 tokens, it is primarily optimized for mathematical reasoning and complex problem solving.


PureRL-1.5B-v12A-lam002 Overview

This model, developed by zhaohq, is a 1.5-billion-parameter language model fine-tuned from the Qwen/Qwen2.5-Math-1.5B base. It was trained with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper. GRPO samples a group of responses for each prompt and scores every response relative to the others in its group, so it does not need a separate value model. The model supports a context length of 32768 tokens, which leaves room for long problem statements and long step-by-step solutions.
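To make the group-relative idea concrete, here is a minimal sketch of the advantage computation GRPO is built around: each sampled response's reward is normalized by the mean and standard deviation of its group. This is an illustration only, not this model's training code. The 0/1 correctness rewards, the group size of four, the `eps` term, and the use of the population standard deviation are all assumptions; the card does not state the reward function or hyperparameters.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the other samples drawn for the same prompt.

    `eps` is a hypothetical stabilizer that avoids division by zero when
    every response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group: four sampled answers to one prompt,
# scored 1.0 if the final answer is correct and 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Correct answers receive positive advantages and incorrect ones negative, while a group whose responses all score the same yields zero advantage everywhere and therefore contributes no learning signal.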

Key Capabilities

  • Enhanced Mathematical Reasoning: Trained with GRPO, the method DeepSeekMath introduced to improve mathematical reasoning in open language models.
  • Long Context Understanding: Capable of handling inputs up to 32768 tokens, useful for complex problems requiring extensive context.
  • Fine-tuned from Qwen2.5-Math-1.5B: Builds upon a strong mathematical foundation.
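As a rough illustration of what the 32768-token limit means in practice, the sketch below computes how many tokens remain for generation once a prompt is in the window. It assumes, as is usual for decoder-only models, that the prompt and the generated tokens share the same context window; the function name is hypothetical.

```python
CTX_LENGTH = 32768  # context length stated on this card


def max_new_tokens_for(prompt_tokens, ctx_length=CTX_LENGTH):
    """Return how many tokens can still be generated after the prompt.

    Assumes the prompt and the generated tokens share one context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(ctx_length - prompt_tokens, 0)


# A 30000-token prompt leaves a small budget for the solution.
print(max_new_tokens_for(30000))
```

A prompt that already fills or exceeds the window leaves a budget of zero, so very long inputs need to be trimmed before generation.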

Good for

  • Mathematical Problem Solving: Suited to tasks that require multi-step mathematical reasoning and computation.
  • Research and Development: Useful for exploring and applying reinforcement learning techniques in language model fine-tuning.
  • Complex Query Handling: Its long context window makes it suitable for detailed questions or scenarios.
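
For readers who want to try the model on a math problem, the following is a minimal sketch using the Hugging Face `transformers` library. It assumes the weights are available on the Hugging Face Hub under the ID shown on this card and that the model responds to a plain-text prompt. The step-by-step instruction follows the convention commonly used with Qwen2.5-Math models and is an assumption here, as are the helper names `build_prompt` and `solve` and the 512-token generation budget.

```python
MODEL_ID = "zhaohq/PureRL-1.5B-v12A-lam002"


def build_prompt(problem):
    """Wrap a problem in a step-by-step instruction (assumed prompt format)."""
    return (
        f"{problem.strip()}\n"
        "Please reason step by step, and put your final answer within \\boxed{}."
    )


def solve(problem, max_new_tokens=512):
    """Generate a solution; downloads the model weights on first use."""
    # Imported lazily so build_prompt can be used without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    inputs = tokenizer(build_prompt(problem), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example call (not executed here because it downloads the model):
# print(solve("What is the sum of the first 100 positive integers?"))
```

If the repository ships a chat template, applying it through the tokenizer may work better than the plain prompt used in this sketch.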