Thomas-Chou/Qwen2.5-1.5B-Open-R1-GRPO

Hugging Face · Text Generation
Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Feb 10, 2025 · Architecture: Transformer · Concurrency Cost: 1

Thomas-Chou/Qwen2.5-1.5B-Open-R1-GRPO is a 1.5 billion parameter Qwen2.5-Instruct model fine-tuned by Thomas-Chou. It specializes in mathematical reasoning, having been trained on the OpenR1-Math-220k dataset using the GRPO method. The model is optimized for tasks requiring strong mathematical problem-solving capabilities, and its 131,072-token context length supports long, multi-step derivations.


Model Overview

Thomas-Chou/Qwen2.5-1.5B-Open-R1-GRPO is a 1.5 billion parameter language model derived from the Qwen/Qwen2.5-1.5B-Instruct architecture. Its primary distinction lies in its specialized fine-tuning for mathematical reasoning tasks.

Key Capabilities

  • Enhanced Mathematical Reasoning: The model has been fine-tuned on the OpenR1-Math-220k dataset, specifically targeting mathematical problem-solving.
  • GRPO Training Method: It was trained with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), to improve its mathematical capabilities.
  • Large Context Window: Inherits a substantial context length of 131,072 tokens, beneficial for complex multi-step reasoning.
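The core idea of GRPO can be illustrated with a short sketch: for each prompt, a group of completions is sampled and scored, and each completion's advantage is its reward normalized against the group's statistics rather than a learned value baseline. This is a simplified illustration, not the actual training code used for this model.

```python
# Group-relative advantage computation, the central step of GRPO
# (Group Relative Policy Optimization). Illustrative sketch only.
from statistics import mean, pstdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each completion's reward against its group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:  # all completions scored equally: no learning signal
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Example: four completions sampled for one math prompt, scored 1.0 if
# the final answer is correct and 0.0 otherwise (a common GRPO reward).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)  # → [1.0, -1.0, -1.0, 1.0]
```

Correct completions receive positive advantages and incorrect ones negative, so the policy update pushes probability mass toward answers that beat the group average.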

Good For

  • Mathematical Problem Solving: Ideal for applications requiring robust mathematical reasoning, calculations, and understanding of mathematical concepts.
  • Research and Development: Useful for researchers exploring advanced fine-tuning techniques like GRPO for domain-specific performance enhancement.
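Qwen2.5-Instruct models expect prompts in the ChatML format. The sketch below builds such a prompt by hand for a math question; in practice you would call `tokenizer.apply_chat_template` from the transformers library, which produces the same layout. The helper name and system message here are illustrative, not part of the model card.

```python
# Hand-rolled ChatML prompt for a Qwen2.5-Instruct-style model.
# Illustrative only; use tokenizer.apply_chat_template in real code.
def build_chatml_prompt(question: str,
                        system: str = "You are a helpful assistant.") -> str:
    """Format a single-turn math query in ChatML, ready for generation."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt("Solve for x: 2x + 6 = 14.")
print(prompt)
```

The trailing `<|im_start|>assistant\n` leaves the prompt open for the model to continue with its answer.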

This model was developed using the TRL framework (version 0.18.0) and is a focused adaptation of the base Qwen2.5-1.5B-Instruct model.