Lansechen/Qwen2.5-7B-Open-R1-GRPO-math-lighteval-1epochstop-withformat

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:7.6BQuant:FP8Ctx Length:32kPublished:Apr 1, 2025Architecture:Transformer Warm

Lansechen/Qwen2.5-7B-Open-R1-GRPO-math-lighteval-1epochstop-withformat is a 7.6 billion parameter language model fine-tuned from Qwen/Qwen2.5-7B. This model is specifically optimized for mathematical reasoning tasks, leveraging the GRPO training method introduced in the DeepSeekMath paper. It is designed to enhance performance in complex mathematical problem-solving, making it suitable for applications requiring advanced numerical and logical deduction.

Loading preview...

Overview

This model, Lansechen/Qwen2.5-7B-Open-R1-GRPO-math-lighteval-1epochstop-withformat, is a 7.6 billion parameter language model derived from the Qwen/Qwen2.5-7B base. It has been fine-tuned using the TRL framework, with a particular focus on improving mathematical reasoning capabilities.

Key Capabilities

  • Enhanced Mathematical Reasoning: Specifically trained with the GRPO (Gradient-based Reward Policy Optimization) method, as detailed in the DeepSeekMath paper, to excel in complex mathematical problem-solving.
  • Qwen2.5-7B Foundation: Benefits from the robust architecture and general language understanding of the Qwen2.5-7B base model.

Good For

  • Applications requiring strong mathematical reasoning and problem-solving.
  • Tasks involving numerical analysis, logical deduction, and scientific computation.
  • Developers looking for a specialized model to handle math-intensive queries and generate accurate mathematical responses.