Lucien520/Qwen2.5-1.5B-Open-R1-GRPO
Lucien520/Qwen2.5-1.5B-Open-R1-GRPO is a 1.5-billion-parameter language model fine-tuned with the GRPO method, which is designed to enhance mathematical reasoning. Built on the Qwen2.5 architecture and following the DeepSeekMath research, it is optimized for robust mathematical problem-solving and suited to applications where strong numerical and logical reasoning is critical. The model supports a context length of 131072 tokens.
Model Overview
Lucien520/Qwen2.5-1.5B-Open-R1-GRPO is a 1.5-billion-parameter language model based on the Qwen2.5 architecture. This model has been fine-tuned using the GRPO (Group Relative Policy Optimization) method, as introduced in the research paper DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models.
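The core idea of GRPO, as described in the DeepSeekMath paper, is to sample a group of completions per prompt and score each one against the group's mean reward instead of a learned value model. A minimal sketch of the group-relative advantage computation (a toy illustration only, not the actual training code behind this checkpoint):

```python
import statistics

def group_relative_advantages(rewards):
    """Normalize each completion's reward against the group mean and
    standard deviation, as in GRPO's group-relative baseline."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)  # population std of the group
    if std == 0:
        # All completions scored identically: no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

# Example: four completions sampled for one math problem, scored
# 1.0 for a correct final answer and 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# Correct completions receive positive advantage, incorrect ones negative.
```

These per-completion advantages then weight the policy-gradient update, so the model is pushed toward completions that outscore their own sampling group.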
Key Characteristics
- Parameter Count: 1.5 billion parameters.
- Training Method: Utilizes GRPO, a technique aimed at improving mathematical reasoning.
- Frameworks: Trained with TRL (Transformer Reinforcement Learning) version 0.18.0, Transformers 4.52.3, PyTorch 2.6.0, Datasets 4.4.1, and Tokenizers 0.21.4.
- Context Length: Supports a substantial context length of 131072 tokens.
Intended Use Cases
This model is particularly well-suited for applications that demand strong mathematical and logical reasoning. Its fine-tuning with GRPO suggests an optimization for tasks such as:
- Solving mathematical problems.
- Generating logical explanations for numerical concepts.
- Assisting in scientific or engineering calculations where reasoning is paramount.
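For inference on tasks like these, prompts would typically be rendered with the model's chat template (via `tokenizer.apply_chat_template` in Transformers). Below is a minimal sketch of the underlying ChatML-style layout used by Qwen2.5-family models, assuming this fine-tune keeps the base model's template:

```python
def to_chatml(messages):
    """Render a list of {"role", "content"} messages in the ChatML-style
    format used by Qwen2.5, ending with an open assistant turn so the
    model's generation continues from there."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful math assistant."},
    {"role": "user", "content": "What is 17 * 23? Reason step by step."},
])
```

In practice, prefer `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` so the exact template shipped with the checkpoint is used rather than this hand-written approximation.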