zhaohq/PureRL-1.5B-v7-s2-l2-kl-w1-b2

Text Generation | Concurrency Cost: 1 | Model Size: 1.5B | Quant: BF16 | Ctx Length: 32k | Published: May 21, 2026 | Architecture: Transformer | Warm

zhaohq/PureRL-1.5B-v7-s2-l2-kl-w1-b2 is a 1.5 billion parameter language model developed by zhaohq and fine-tuned with the TRL framework. It was trained with GRPO, the reinforcement learning method introduced in the DeepSeekMath paper, which suggests (but does not confirm) that the fine-tuning targeted mathematical reasoning. The model is intended for text generation, particularly tasks that benefit from step-by-step reasoning, and supports a context length of 32,768 tokens.


Model Overview

zhaohq/PureRL-1.5B-v7-s2-l2-kl-w1-b2 is a 1.5 billion parameter language model developed by zhaohq. It was fine-tuned with the TRL (Transformer Reinforcement Learning) framework using GRPO (Group Relative Policy Optimization). GRPO was introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models", where it is used to improve mathematical reasoning. Its use here points to a focus on reasoning ability, although the training data and reward design are not documented.
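To make the GRPO reference concrete, the sketch below shows the group-relative advantage that gives the method its name: several completions are sampled for one prompt, each is scored, and every reward is normalized against the mean and standard deviation of its own group. This is only an illustration of the outcome-supervised form described in the DeepSeekMath paper, not the TRL implementation or this model's actual training setup. The function name, the `eps` value, and the example rewards are hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each reward against its own sampling group.

    `rewards` holds one scalar reward per completion sampled for the
    same prompt. `eps` is an illustrative constant (an assumption) that
    avoids division by zero when every completion gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four completions: two judged correct (1.0), two wrong (0.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Completions that score above their group's average receive a positive advantage and the rest a negative one, so no separate value network is needed to provide a baseline.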

Key Characteristics

  • Parameter Count: 1.5 billion.
  • Context Length: 32,768 tokens.
  • Training Method: GRPO, a reinforcement learning method introduced in the DeepSeekMath paper to improve mathematical reasoning in language models.
  • Frameworks: Trained with TRL, Transformers, PyTorch, Datasets, and Tokenizers.
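Because prompt and generated tokens share the same 32,768-token window, it can help to cap the generation length by what the prompt leaves over. The helper below is a minimal sketch of that arithmetic; the function name is hypothetical, and it assumes the prompt's token count has already been obtained from the model's tokenizer.

```python
CONTEXT_LENGTH = 32768  # context window stated for this model


def generation_budget(prompt_tokens, requested_new_tokens):
    """Return how many new tokens fit alongside a prompt of the given size.

    Hypothetical helper: `prompt_tokens` is assumed to come from the
    model's own tokenizer, since token counts are tokenizer-specific.
    """
    if prompt_tokens < 0 or requested_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    remaining = CONTEXT_LENGTH - prompt_tokens
    # Never return a negative budget when the prompt alone exceeds the window.
    return max(0, min(requested_new_tokens, remaining))


print(generation_budget(32000, 1024))
```

A result of zero means the prompt already fills or exceeds the window and needs to be shortened before generation.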

Use Cases

This model is suited to text generation tasks that benefit from stronger reasoning, for example mathematical or logical problems. Because it was trained with GRPO, it may handle prompts that require multi-step, structured reasoning better than a model without such training, though no benchmark results are reported here. Developers can load it through the Hugging Face transformers pipeline for quick deployment.
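A minimal sketch of the pipeline integration follows. It assumes the `transformers` library and PyTorch are installed and that the model weights can be downloaded from the Hugging Face Hub. The helper name `generate`, the plain-string prompt, and the default of 256 new tokens are illustrative choices, not documented settings; the model may respond better to a chat-formatted prompt if its tokenizer defines a chat template.

```python
MODEL_ID = "zhaohq/PureRL-1.5B-v7-s2-l2-kl-w1-b2"


def generate(prompt, max_new_tokens=256):
    """Generate a completion for `prompt` with the text-generation pipeline.

    Hypothetical convenience wrapper. The import is deferred so that
    defining the function does not require transformers to be installed.
    """
    from transformers import pipeline

    # Downloads and loads the model on first use; this can take a while.
    generator = pipeline("text-generation", model=MODEL_ID)
    outputs = generator(
        prompt,
        max_new_tokens=max_new_tokens,
        return_full_text=False,  # return only the newly generated text
    )
    return outputs[0]["generated_text"]
```

Calling `generate("What is 17 * 24? Show your reasoning.")` returns the model's continuation as a string. For repeated calls, it would be more efficient to build the pipeline once and reuse it than to reload the model each time.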