zhaohq/PureRL-1.5B-v13C-lam010

Task: Text Generation | Concurrency Cost: 1 | Model Size: 1.5B | Quant: BF16 | Ctx Length: 32k | Published: May 19, 2026 | Architecture: Transformer

PureRL-1.5B-v13C-lam010 is a 1.5-billion-parameter language model developed by zhaohq and fine-tuned from Qwen/Qwen2.5-Math-1.5B. It was trained with the TRL library using GRPO, a reinforcement learning method introduced to improve mathematical reasoning. Building on its Qwen2.5-Math base, the model targets mathematical problem-solving and logical deduction. It supports a context length of 32768 tokens, which leaves room for long, multi-step reasoning.


Model Overview

zhaohq/PureRL-1.5B-v13C-lam010 is a 1.5-billion-parameter language model fine-tuned from the Qwen/Qwen2.5-Math-1.5B base model. It was trained with the TRL (Transformer Reinforcement Learning) library.
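A minimal sketch of how the model could be loaded and queried with the Hugging Face transformers library. It assumes the repository ships standard causal-LM weights and a tokenizer. The prompt wording in `build_prompt` and the helper names are hypothetical, since no prompt format is specified here.

```python
MODEL_ID = "zhaohq/PureRL-1.5B-v13C-lam010"


def build_prompt(question: str) -> str:
    """Wrap a question in a plain instruction (hypothetical format, not from the model card)."""
    return f"Solve the following problem step by step.\n\n{question}\n"


def generate_answer(question: str, max_new_tokens: int = 512) -> str:
    """Load the model and return its continuation for one question."""
    # Imported lazily so build_prompt can be used without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # BF16 matches the quantization listed for this model.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_answer("What is the sum of the first 10 positive integers?")` downloads the weights on first use and returns the generated text as a string.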

Key Differentiator: GRPO Training

A significant aspect of this model is its training methodology. It was trained using GRPO (Group Relative Policy Optimization), a method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). The choice of method points to a focus on improving the model's mathematical reasoning and problem-solving abilities.
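The core idea of GRPO is to score each sampled completion relative to the other completions drawn for the same prompt, rather than against a learned value function. The sketch below illustrates only that advantage computation. It is not this model's training code: the function name, the use of the population standard deviation, and the `eps` value are assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one group of completions sampled for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    eps is a hypothetical stabilizer that avoids division by zero
    when every completion in the group receives the same reward.
    """
    if not rewards:
        raise ValueError("rewards must contain at least one value")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation (an assumption)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one math problem, rewarded 1.0 if the final answer is correct.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct completions receive positive advantages and incorrect ones negative advantages. When all completions in a group earn the same reward, every advantage is zero and that group contributes no learning signal.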

Capabilities & Use Cases

  • Enhanced Mathematical Reasoning: Due to its GRPO training on a math-focused base model, PureRL-1.5B-v13C-lam010 is particularly well-suited for tasks that require complex mathematical understanding and logical deduction.
  • Instruction Following: The fine-tuning is intended to make the model follow instructions, so it can be prompted directly with a problem statement and a request for a step-by-step solution.
  • Long Context Processing: With a context length of 32768 tokens, it can handle and process extensive inputs, which is beneficial for multi-step reasoning problems or detailed queries.
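The 32768-token context length covers the prompt and the generated tokens together, so a long input leaves less room for the answer. A small sketch of that budget arithmetic, with a hypothetical helper name:

```python
CONTEXT_LENGTH = 32768  # context length stated for this model


def max_new_tokens_allowed(prompt_token_count: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many tokens can still be generated after a prompt of the given size."""
    if prompt_token_count < 0:
        raise ValueError("prompt_token_count must be non-negative")
    # A prompt at or beyond the limit leaves no room for generation.
    return max(context_length - prompt_token_count, 0)
```

For example, a 30000-token prompt leaves 2768 tokens for the model's reasoning and answer.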

When to Use This Model

Consider using PureRL-1.5B-v13C-lam010 if your application involves:

  • Solving mathematical problems or equations.
  • Generating logical explanations or proofs.
  • Tasks requiring robust reasoning capabilities, especially in quantitative domains.
  • Applications where a smaller, efficient model with specialized mathematical prowess is preferred over larger, general-purpose models.