zhaohq/PureRL-1.5B-v7-s2-l2-kl-w1-b1

Text Generation | Concurrency Cost: 1 | Model Size: 1.5B | Quant: BF16 | Ctx Length: 32k | Published: May 21, 2026 | Architecture: Transformer

The zhaohq/PureRL-1.5B-v7-s2-l2-kl-w1-b1 model is a 1.5 billion parameter language model developed by zhaohq, fine-tuned from zhaohq/PureRL-1.5B-v7-stage1-reasoning. It was trained with the TRL framework using GRPO, a reinforcement learning method introduced to improve mathematical reasoning in open language models. With a context length of 32768 tokens, the model targets multi-step reasoning tasks, particularly mathematical ones.


Model Overview

zhaohq/PureRL-1.5B-v7-s2-l2-kl-w1-b1 is a 1.5 billion parameter language model developed by zhaohq. It is a further fine-tuned iteration of the zhaohq/PureRL-1.5B-v7-stage1-reasoning checkpoint, trained to strengthen reasoning. Its context length of 32768 tokens leaves room for long prompts and long step-by-step generations.

Training Methodology

This model was trained using the TRL (Transformer Reinforcement Learning) framework. A key aspect of its training procedure is GRPO (Group Relative Policy Optimization), a method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". GRPO samples a group of completions for each prompt and scores every completion relative to the rest of its group, which removes the need for a separately learned value model. The choice of this method indicates a focus on improving the model's ability to solve mathematical reasoning problems.
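To make the group-relative idea concrete, the sketch below computes advantages for one prompt by normalizing each completion's reward against the mean and standard deviation of its group, following the outcome-supervision formulation in the DeepSeekMath paper. The function name, the `eps` term, and the use of the population standard deviation are assumptions for illustration; they are not taken from this model's training code.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt.

    Assumption: population standard deviation plus a small eps to avoid
    division by zero when every completion receives the same reward.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled completions for one prompt: two correct (1.0), two wrong (0.0).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
# Correct completions get a positive advantage, wrong ones a negative one.
```

When all completions in a group receive the same reward, every advantage is zero, so that prompt contributes no policy-gradient signal for the step.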

Key Features

  • Parameter Count: 1.5 billion parameters.
  • Context Length: 32768 tokens.
  • Fine-tuned for Reasoning: Built upon a reasoning-focused base model.
  • GRPO Integration: Utilizes the GRPO method for enhanced mathematical reasoning capabilities.
  • TRL Framework: Trained with the TRL library, which supports reinforcement learning from human feedback or other reward signals.
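As a rough illustration of how TRL and GRPO fit together, the sketch below wires a rule-based reward into TRL's `GRPOTrainer`. Everything specific here is an assumption: the reward function `boxed_answer_reward`, the `answer` dataset column, the output directory, and the `num_generations` and `beta` values are hypothetical and do not reflect the settings actually used to train this model.

```python
import re

# Matches the contents of a LaTeX \boxed{...} final answer (no nested braces).
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def boxed_answer_reward(completions, answer, **kwargs):
    """Hypothetical reward: 1.0 if the last \\boxed{...} equals the reference.

    TRL passes completions plus extra dataset columns (here an assumed
    `answer` column) as keyword arguments, one entry per completion.
    """
    rewards = []
    for completion, reference in zip(completions, answer):
        found = BOXED.findall(completion)
        correct = bool(found) and found[-1].strip() == str(reference).strip()
        rewards.append(1.0 if correct else 0.0)
    return rewards


def build_trainer(train_dataset):
    """Construct a GRPO trainer; hyperparameters are illustrative only."""
    from trl import GRPOConfig, GRPOTrainer  # imported lazily; requires trl

    args = GRPOConfig(output_dir="grpo-sketch", num_generations=4, beta=0.04)
    return GRPOTrainer(
        model="zhaohq/PureRL-1.5B-v7-stage1-reasoning",
        reward_funcs=boxed_answer_reward,
        args=args,
        train_dataset=train_dataset,  # expects a "prompt" column
    )
```

Calling `build_trainer(dataset).train()` would start training, assuming a dataset with `prompt` and `answer` columns of plain strings and sufficient GPU memory.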

Potential Use Cases

Given its training methodology and focus, this model is particularly well-suited for applications requiring:

  • Mathematical Problem Solving: Tasks involving complex mathematical reasoning and calculations.
  • Logical Deduction: Scenarios where structured logical thinking is required.
  • Advanced Question Answering: Answering questions that demand more than simple factual recall, especially those with a mathematical or logical component.
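For trying the model on such tasks, a minimal inference sketch with the Hugging Face `transformers` library might look like the following. The prompt wording in `build_prompt`, the `max_new_tokens` default, and the helper names are assumptions; this page does not document a prompt format, so adjust them to whatever the model was actually trained with.

```python
MODEL_ID = "zhaohq/PureRL-1.5B-v7-s2-l2-kl-w1-b1"
MAX_CONTEXT_TOKENS = 32768  # context length stated on this page


def build_prompt(question: str) -> str:
    """Assumed prompt format: the question plus a step-by-step instruction."""
    return (
        f"{question.strip()}\n"
        "Please reason step by step, and put your final answer within \\boxed{}."
    )


def generate_answer(question: str, max_new_tokens: int = 1024) -> str:
    """Load the model in BF16 and generate one completion for the question."""
    import torch  # imported lazily; requires torch and transformers
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[1]
    if prompt_len + max_new_tokens > MAX_CONTEXT_TOKENS:
        raise ValueError("prompt plus generation budget exceeds the context length")

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output_ids[0, prompt_len:], skip_special_tokens=True)
```

A call such as `generate_answer("What is the sum of the first 100 positive integers?")` downloads the weights on first use and returns the generated reasoning text.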