WSX/Qwen2.5-1.5B-Open-R1-GRPO-FC

Hugging Face
Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Feb 17, 2025 · Architecture: Transformer

WSX/Qwen2.5-1.5B-Open-R1-GRPO-FC is a 1.5 billion parameter language model fine-tuned by WSX. It is based on the Qwen2.5 architecture and was trained with the GRPO method on the AI-MO/NuminaMath-TIR dataset. The model is optimized for mathematical reasoning, drawing on techniques from DeepSeekMath, and its primary strength is strong mathematical problem solving at a compact parameter count.


Model Overview

WSX/Qwen2.5-1.5B-Open-R1-GRPO-FC is a 1.5 billion parameter language model developed by WSX. It is a fine-tuned variant of the Qwen2.5 architecture, specifically optimized for mathematical reasoning. The model's training incorporated GRPO (Group Relative Policy Optimization), a reinforcement-learning method introduced in the DeepSeekMath paper, which aims to push the limits of mathematical reasoning in open language models.
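The central idea of GRPO is that, instead of training a separate value (critic) network, the advantage of each sampled completion is computed relative to the other completions drawn for the same prompt. A minimal sketch of that group-relative normalization (the helper name `grpo_advantages` is illustrative, not taken from the DeepSeekMath codebase):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages: normalize each sampled completion's
    reward against the mean and population std of its own group,
    removing the need for a learned value baseline."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All completions scored identically: no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

# Example: binary correctness rewards for 4 completions of one math problem
print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))  # → [1.0, -1.0, 1.0, -1.0]
```

Correct completions receive positive advantages and incorrect ones negative, so the policy update pushes probability mass toward answers that outscore their own sampling group.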

Key Capabilities

  • Enhanced Mathematical Reasoning: Fine-tuned on the AI-MO/NuminaMath-TIR dataset, this model is designed to excel at complex mathematical problems and logical deduction.
  • GRPO Training: Utilizes the GRPO method, known for improving mathematical problem-solving performance.
  • Compact Size: At 1.5 billion parameters, it offers mathematical reasoning capabilities in a relatively small footprint.
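To exercise these capabilities, a math problem can be formatted for the model. A minimal sketch, assuming the fine-tune inherits the standard ChatML-style chat template used by Qwen2.5 chat models (the system-prompt text below is an illustrative convention, not confirmed by the model card):

```python
def build_prompt(problem,
                 system=("Please reason step by step, and put your "
                         "final answer within \\boxed{}.")):
    """Format a math problem with the ChatML-style template used by
    Qwen2.5 chat models; the trailing assistant tag cues generation."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{problem}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_prompt("What is the sum of the first 10 positive integers?")
print(prompt)
```

In practice this formatting is best delegated to `tokenizer.apply_chat_template` from the `transformers` library so the template always matches the checkpoint's own configuration.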

Good For

  • Applications requiring strong mathematical problem-solving.
  • Research and development in improving LLM performance on quantitative tasks.
  • Scenarios where a smaller, specialized model for mathematical reasoning is preferred over larger, general-purpose models.