zhaohq/PureRL-1.5B-v7-s2-margin-maskon-afew

Text Generation | Concurrency Cost: 1 | Model Size: 1.5B | Quant: BF16 | Ctx Length: 32k | Published: May 20, 2026 | Architecture: Transformer | Warm

The zhaohq/PureRL-1.5B-v7-s2-margin-maskon-afew model is a 1.5 billion parameter language model, fine-tuned from zhaohq/PureRL-1.5B-v7-stage1-A-fewshot. It was trained with the TRL framework using GRPO, a reinforcement learning method introduced to improve mathematical reasoning. The model targets mathematical problem-solving and reasoning tasks and supports a context length of 32768 tokens.

Overview

zhaohq/PureRL-1.5B-v7-s2-margin-maskon-afew is a 1.5 billion parameter language model built on the zhaohq/PureRL-1.5B-v7-stage1-A-fewshot base. It was fine-tuned with TRL (Transformer Reinforcement Learning), a library for training transformer language models with reinforcement learning.

Key Differentiator: GRPO Training

A significant aspect of this model's development is its training with GRPO (Group Relative Policy Optimization). This method, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models", was designed to improve mathematical reasoning in language models. The model is therefore oriented toward mathematical problems and logical deduction.
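The core idea behind GRPO can be illustrated with a small sketch: several completions are sampled for the same prompt, each receives a scalar reward, and each completion's advantage is its reward normalized against the group's mean and standard deviation, so no separate value model is needed. The function name `group_relative_advantages`, the `eps` stabilizer, and the use of the population standard deviation are assumptions made for illustration; this is not the training code used for this model, and the rewards shown are made up.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt.

    Hypothetical helper for illustration only. `eps` avoids division by zero
    when every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative rewards for four sampled answers to one math prompt
# (1.0 = correct final answer, 0.0 = incorrect).
example_rewards = [1.0, 0.0, 0.0, 1.0]
example_advantages = group_relative_advantages(example_rewards)
```

Completions that score above the group average get a positive advantage and are reinforced; those below get a negative one. When all rewards in a group are equal, every advantage is zero and that group contributes no learning signal.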

Technical Specifications

  • Parameter Count: 1.5 Billion
  • Context Length: 32768 tokens
  • Training Frameworks: TRL (0.16.0.dev0), Transformers (4.48.3), PyTorch (2.5.1+cu124), Datasets (4.0.0), Tokenizers (0.21.1)
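A minimal loading sketch follows, assuming the checkpoint is published on the Hugging Face Hub under the model ID above and loads through the standard Transformers `AutoModelForCausalLM` and `AutoTokenizer` classes; the page does not confirm either point. The `generate` helper, its `max_new_tokens` default, and the context-length guard are illustrative choices, not part of the model's documentation.

```python
MODEL_ID = "zhaohq/PureRL-1.5B-v7-s2-margin-maskon-afew"
MAX_CONTEXT_TOKENS = 32768  # context length listed in the specifications


def generate(prompt: str, max_new_tokens: int = 512) -> str:
    """Hypothetical helper: load the model and complete a single prompt."""
    # Imported lazily so the constants above are usable without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # BF16 matches the quantization listed for this model.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    inputs = tokenizer(prompt, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[1]
    if prompt_len + max_new_tokens > MAX_CONTEXT_TOKENS:
        raise ValueError("prompt plus requested output exceeds the context length")

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens and decode only the newly generated text.
    return tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
```

Calling `generate("What is 12 * 17? Show your reasoning.")` would download the weights on first use. If the tokenizer ships a chat template, formatting the prompt with it may give better results than raw text.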

Use Cases

Given its specialized training with GRPO, this model is well-suited for applications requiring:

  • Mathematical Reasoning: Solving complex math problems, generating mathematical explanations, or assisting in scientific computations.
  • Logical Deduction: Tasks that benefit from structured reasoning and problem-solving.
  • Advanced NLP: Scenarios where a strong understanding of numerical and logical relationships is crucial.