WSX/Qwen2.5-1.5B-Open-R1-GRPO-FC
WSX/Qwen2.5-1.5B-Open-R1-GRPO-FC is a 1.5-billion-parameter language model fine-tuned by WSX. It is based on the Qwen2.5 architecture and was trained with the GRPO method on the AI-MO/NuminaMath-TIR dataset. The model is optimized for mathematical reasoning, drawing on techniques from DeepSeekMath; its primary strength is mathematical problem solving at a compact parameter count.
Model Overview
WSX/Qwen2.5-1.5B-Open-R1-GRPO-FC is a 1.5-billion-parameter language model developed by WSX. It is a fine-tuned variant of the Qwen2.5 architecture, specifically optimized for mathematical reasoning. Training used GRPO (Group Relative Policy Optimization), a reinforcement-learning method introduced in the DeepSeekMath paper to push the limits of mathematical reasoning in open language models.
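The core idea of GRPO, as described in the DeepSeekMath paper, can be sketched in a few lines: instead of learning a separate value network, each sampled completion's reward is normalized against the mean and standard deviation of its sampling group to form an advantage. This is an illustrative sketch, not this model's actual training code:

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each reward against its group's mean/std (GRPO-style baseline)."""
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mu) / sigma for r in rewards]

# Completions that beat the group average get positive advantages,
# those below it get negative ones.
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))  # [1.0, -1.0, 1.0, -1.0]
```

These advantages then weight a clipped policy-gradient update, much like PPO but with the group statistics standing in for a learned baseline.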
Key Capabilities
- Enhanced Mathematical Reasoning: Fine-tuned on the AI-MO/NuminaMath-TIR dataset, this model is designed to excel at complex mathematical problems and logical deduction.
- GRPO Training: Utilizes the GRPO method, known for improving mathematical problem-solving performance.
- Compact Size: At 1.5 billion parameters, it offers mathematical reasoning capabilities in a relatively small footprint.
Good For
- Applications requiring strong mathematical problem-solving.
- Research and development in improving LLM performance on quantitative tasks.
- Scenarios where a smaller, specialized model for mathematical reasoning is preferred over larger, general-purpose models.
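A minimal usage sketch with Hugging Face transformers. The Hub ID is taken from this card; the chat-template flow follows the generic Qwen2.5 pattern and has not been verified against this specific checkpoint:

```python
MODEL_ID = "WSX/Qwen2.5-1.5B-Open-R1-GRPO-FC"  # Hub ID from this card

def build_messages(problem: str) -> list[dict]:
    """Wrap a math problem as a single-turn chat message list."""
    return [{"role": "user", "content": problem}]

def solve(problem: str, max_new_tokens: int = 512) -> str:
    # Lazy import so the helper above stays usable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    prompt = tokenizer.apply_chat_template(
        build_messages(problem), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

For math-heavy prompts, greedy or low-temperature decoding is usually a reasonable starting point with small reasoning models.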