shawntzx/Qwen2.5-3B-GRPO-3_13_math

Text Generation · Hosted on Hugging Face

  • Model size: 3.1B parameters
  • Quantization: BF16
  • Context length: 32k
  • Concurrency cost: 1
  • Published: Mar 13, 2025
  • Architecture: Transformer

shawntzx/Qwen2.5-3B-GRPO-3_13_math is a 3.1 billion parameter language model fine-tuned from Qwen/Qwen2.5-3B-Instruct. It was trained with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath paper, to enhance its mathematical reasoning capabilities. The model is optimized for complex mathematical problem-solving and logical deduction, making it suitable for applications requiring numerical and symbolic reasoning.


Overview

shawntzx/Qwen2.5-3B-GRPO-3_13_math is a 3.1 billion parameter language model fine-tuned from the base Qwen/Qwen2.5-3B-Instruct model. It was trained with Group Relative Policy Optimization (GRPO), the technique detailed in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models." The fine-tuning was conducted using the TRL framework.
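Since the model is a standard Qwen2.5 fine-tune, it should load with the usual Hugging Face `transformers` chat-template workflow. The sketch below is illustrative, not taken from this repository: the system prompt and sampling parameters are assumptions, and `main()` is a hypothetical entry point that downloads the checkpoint when called.

```python
def build_messages(problem: str) -> list[dict]:
    """Wrap a math problem in the chat format Qwen2.5-Instruct models expect."""
    return [
        {"role": "system",
         "content": "You are a helpful math assistant. Reason step by step."},
        {"role": "user", "content": problem},
    ]


def main() -> None:
    # Heavy imports are kept inside main() so the prompt helper above
    # can be used without transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "shawntzx/Qwen2.5-3B-GRPO-3_13_math"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    messages = build_messages("What is the sum of the first 100 positive integers?")
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=512, temperature=0.7, top_p=0.9)
    # Decode only the newly generated tokens, not the echoed prompt.
    print(tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    ))
```

Calling `main()` downloads the ~6 GB BF16 checkpoint and prints the model's answer; BF16 matches the published quantization, so no extra conversion is needed.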

Key Capabilities

  • Enhanced Mathematical Reasoning: Specifically optimized for handling complex mathematical problems and logical deductions.
  • GRPO Training: Benefits from a training methodology designed to improve performance in mathematical contexts.
  • Instruction-Following: Inherits strong instruction-following capabilities from its base Qwen2.5-3B-Instruct model.

Good For

  • Applications requiring robust mathematical problem-solving.
  • Tasks involving symbolic reasoning and numerical analysis.
  • Research and development in advanced AI for mathematics.