zhaohq/PureRL-1.5B-v7-s2-l1-maskon-fixed

Text generation | Concurrency cost: 1 | Model size: 1.5B | Quant: BF16 | Context length: 32k | Published: May 20, 2026 | Architecture: Transformer

The zhaohq/PureRL-1.5B-v7-s2-l1-maskon-fixed model is a 1.5-billion-parameter language model fine-tuned by zhaohq. It was trained with the TRL framework using GRPO, the reinforcement learning method introduced in the DeepSeekMath paper to improve mathematical reasoning. The model targets mathematical problem-solving and supports a context length of 32768 tokens.


Model Overview

zhaohq/PureRL-1.5B-v7-s2-l1-maskon-fixed is a 1.5-billion-parameter language model developed by zhaohq. It is a fine-tuned model whose base model is not specified, and it was trained with the TRL (Transformer Reinforcement Learning) library.

Key Differentiator: GRPO Training

The model was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). GRPO samples a group of completions for each prompt and scores each completion relative to the rest of its group, which removes the need for a separate value model. The use of GRPO suggests a focus on improving the model's mathematical reasoning.
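To make the group-relative idea concrete, here is a minimal sketch of the advantage computation used in GRPO's outcome-supervision setting: each reward is normalized by the mean and standard deviation of its own group. The function name, the `eps` term, and the choice of population standard deviation are assumptions made for illustration; this is not the training code used for this model.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions for the same prompt.

    Assumption: population standard deviation plus a small eps to avoid
    division by zero when every completion receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group: four sampled answers to one math prompt,
# rewarded 1.0 when the final answer is correct and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Correct answers end up with advantages close to +1 and incorrect ones close to -1, so the policy update favors completions that beat their own group's average. When all completions in a group receive the same reward, every advantage is zero and that prompt contributes no learning signal.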

Technical Details

  • Parameters: 1.5 billion
  • Context length: 32768 tokens
  • Training frameworks: TRL 0.16.0.dev0, Transformers 4.48.3, PyTorch 2.5.1
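As a rough illustration of how these details translate into usage, the sketch below loads the model in BF16 with the Transformers library and keeps generation within the 32768-token context window. The helper names `generation_budget` and `solve` are hypothetical, and the plain-text prompt is an assumption: the prompt format the model expects is not documented here.

```python
MODEL_ID = "zhaohq/PureRL-1.5B-v7-s2-l1-maskon-fixed"
CONTEXT_LENGTH = 32768  # context length listed in the technical details


def generation_budget(prompt_tokens, context_length=CONTEXT_LENGTH):
    """Return how many new tokens still fit after a prompt of the given length."""
    if prompt_tokens >= context_length:
        raise ValueError("prompt does not fit in the context window")
    return context_length - prompt_tokens


def solve(question, max_new_tokens=1024):
    """Generate a completion for one question (downloads weights on first use)."""
    # Imported here so generation_budget() works without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    # Assumption: a plain-text prompt, with no chat template applied.
    inputs = tokenizer(question, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[1]
    limit = min(max_new_tokens, generation_budget(prompt_len))
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=limit)
    # Drop the prompt tokens so only the newly generated text is returned.
    return tokenizer.decode(output_ids[0, prompt_len:], skip_special_tokens=True)
```

Calling `solve("What is the sum of the first 100 positive integers?")` would return the model's generated text. Capping `max_new_tokens` with `generation_budget` prevents a long prompt plus a long completion from exceeding the context window.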

Intended Use Cases

Because it was trained with GRPO, the model is aimed at applications that require:

  • Mathematical problem-solving: tasks that demand step-by-step logical and mathematical reasoning.
  • Complex numerical analysis: intricate calculations and quantitative queries.
  • Research and development: a starting point for further fine-tuning on specific mathematical or scientific domains.