zhaohq/PureRL-1.5B-v7-s2-corr-maskoff

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:May 20, 2026Architecture:Transformer Warm

The zhaohq/PureRL-1.5B-v7-s2-corr-maskoff is a 1.5 billion parameter language model with a 32768 token context length, fine-tuned using the TRL framework. This model was trained with GRPO, a method specifically designed to enhance mathematical reasoning capabilities. It is suitable for tasks requiring advanced mathematical problem-solving and logical deduction.

Loading preview...

Model Overview

The zhaohq/PureRL-1.5B-v7-s2-corr-maskoff is a 1.5 billion parameter language model, fine-tuned using the TRL (Transformer Reinforcement Learning) framework. It leverages a substantial context length of 32768 tokens, making it capable of processing extensive inputs.

Key Differentiator: GRPO Training

A core aspect of this model is its training methodology. It was specifically trained using GRPO (Gradient-based Reinforcement Learning with Policy Optimization), a method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This training approach is designed to significantly enhance the model's capabilities in mathematical reasoning.

Capabilities

  • Enhanced Mathematical Reasoning: Optimized for complex mathematical problem-solving due to its GRPO training.
  • Long Context Understanding: Benefits from a 32768 token context window, allowing for detailed analysis of longer prompts and documents.
  • TRL Framework: Built upon the TRL framework, indicating a reinforcement learning approach to fine-tuning.

Recommended Use Cases

This model is particularly well-suited for applications requiring:

  • Solving mathematical problems and equations.
  • Logical deduction and reasoning tasks.
  • Processing and generating text where mathematical understanding is crucial.

Training Environment

The model was developed using specific versions of key frameworks:

  • TRL: 0.16.0.dev0
  • Transformers: 4.48.3
  • Pytorch: 2.5.1
  • Datasets: 4.0.0
  • Tokenizers: 0.21.1