Lucien520/Qwen2.5-1.5B-Open-R1-GRPO

Text generation · Concurrency cost: 1 · Model size: 1.5B · Quant: BF16 · Ctx length: 32k · Published: Dec 5, 2025 · Architecture: Transformer

Lucien520/Qwen2.5-1.5B-Open-R1-GRPO is a 1.5-billion-parameter language model fine-tuned with GRPO (Group Relative Policy Optimization), a method introduced in the DeepSeekMath research to enhance mathematical reasoning. Built on the Qwen2.5 architecture, it is optimized for tasks that require robust mathematical problem-solving and is suitable for applications where strong numerical and logical reasoning is critical. The model has a context length of 131072 tokens.


Model Overview

Lucien520/Qwen2.5-1.5B-Open-R1-GRPO is a 1.5-billion-parameter language model based on the Qwen2.5 architecture. It has been fine-tuned using GRPO (Group Relative Policy Optimization), a reinforcement-learning method introduced in the research paper DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models.
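As a sketch of how GRPO works (the notation below follows the DeepSeekMath paper, not anything specific to this model card): for each prompt, a group of $G$ completions $o_1,\dots,o_G$ is sampled, each completion receives a scalar reward $r_i$, and the advantage is computed by normalizing rewards within the group rather than with a learned value model:

```latex
\hat{A}_i = \frac{r_i - \operatorname{mean}(\{r_1,\dots,r_G\})}{\operatorname{std}(\{r_1,\dots,r_G\})}
```

```latex
\mathcal{J}_{\mathrm{GRPO}}(\theta) =
\mathbb{E}\!\left[
\frac{1}{G}\sum_{i=1}^{G}\frac{1}{|o_i|}\sum_{t=1}^{|o_i|}
\min\!\Big(\rho_{i,t}\,\hat{A}_i,\;
\operatorname{clip}(\rho_{i,t},\,1-\epsilon,\,1+\epsilon)\,\hat{A}_i\Big)
\right]
- \beta\, \mathbb{D}_{\mathrm{KL}}\!\left[\pi_\theta \,\|\, \pi_{\mathrm{ref}}\right],
\quad
\rho_{i,t} = \frac{\pi_\theta(o_{i,t}\mid q, o_{i,<t})}{\pi_{\theta_{\mathrm{old}}}(o_{i,t}\mid q, o_{i,<t})}
```

Dropping the value model is what makes GRPO cheaper than PPO for small models like this one, since only the policy and a frozen reference need to be held in memory.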

Key Characteristics

  • Parameter Count: 1.5 billion parameters.
  • Training Method: Fine-tuned with GRPO (Group Relative Policy Optimization), a reinforcement-learning technique aimed at improving mathematical reasoning.
  • Frameworks: Trained with TRL (Transformer Reinforcement Learning) version 0.18.0, Transformers 4.52.3, PyTorch 2.6.0, Datasets 4.4.1, and Tokenizers 0.21.4.
  • Context Length: Supports a substantial context length of 131072 tokens.
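GRPO training in TRL is driven by reward functions scored per completion. The card does not say which rewards were used here; the sketch below shows two hypothetical rewards in the style of Open-R1 recipes (a format reward for `<think>...</think>` reasoning plus a `\boxed{}` answer, and an accuracy reward against a reference answer). The tag and answer conventions are assumptions, not taken from this card.

```python
import re

# Hypothetical format reward: 1.0 if the completion contains a
# <think>...</think> reasoning block followed by a \boxed{...} answer.
# (Assumed convention, in the style of Open-R1 GRPO recipes.)
FORMAT_PATTERN = re.compile(r"<think>.*?</think>.*?\\boxed\{.*?\}", re.DOTALL)

def format_reward(completions: list[str]) -> list[float]:
    """Score each completion 1.0/0.0 for matching the assumed output format."""
    return [1.0 if FORMAT_PATTERN.search(c) else 0.0 for c in completions]

def accuracy_reward(completions: list[str], answers: list[str]) -> list[float]:
    """Score 1.0 when the \\boxed{...} content matches the reference answer."""
    rewards = []
    for completion, ref in zip(completions, answers):
        match = re.search(r"\\boxed\{([^}]*)\}", completion)
        ok = match is not None and match.group(1).strip() == ref.strip()
        rewards.append(1.0 if ok else 0.0)
    return rewards
```

Functions like these would be passed to TRL's `GRPOTrainer` as `reward_funcs`; within each sampled group, their scores are normalized into the advantages described above.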

Intended Use Cases

This model is particularly well-suited for applications that demand strong mathematical and logical reasoning. Its fine-tuning with GRPO suggests an optimization for tasks such as:

  • Solving mathematical problems.
  • Generating logical explanations for numerical concepts.
  • Assisting in scientific or engineering calculations where reasoning is paramount.
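For the use cases above, the model can be queried through the standard Transformers chat-template path. A minimal sketch, assuming the model ships with the usual Qwen2.5 chat template and that the system prompt below is appropriate (neither is confirmed by this card); running `solve` downloads the model weights.

```python
MODEL_ID = "Lucien520/Qwen2.5-1.5B-Open-R1-GRPO"

def build_messages(problem: str) -> list[dict]:
    """Wrap a math problem in a chat-style message list."""
    return [
        # Hypothetical system prompt; the card does not specify one.
        {"role": "system", "content": "You are a careful math assistant. Reason step by step."},
        {"role": "user", "content": problem},
    ]

def solve(problem: str, max_new_tokens: int = 512) -> str:
    """Generate a solution; imports are local so the helper above stays importable."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="bfloat16")
    prompt = tokenizer.apply_chat_template(
        build_messages(problem), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

if __name__ == "__main__":
    print(solve("What is 17 * 23?"))
```

BF16 weights for a 1.5B model fit in roughly 3 GB, so this should run on a single consumer GPU or, slowly, on CPU.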