xiaoni611/qwen-2.5-3b-r1-countdown
The xiaoni611/qwen-2.5-3b-r1-countdown model is a 3.1 billion parameter language model fine-tuned from Qwen/Qwen2.5-3B. It was trained using the TRL framework and incorporates the GRPO method, which is designed to enhance mathematical reasoning capabilities. This model is optimized for tasks requiring advanced reasoning, particularly in mathematical contexts, and supports a 32768 token context length.
Model Overview
The xiaoni611/qwen-2.5-3b-r1-countdown is a 3.1 billion parameter language model, fine-tuned from the base Qwen/Qwen2.5-3B architecture. It leverages the TRL (Transformer Reinforcement Learning) framework for its training process.
Key Differentiator: GRPO Training
A significant aspect of this model is its training methodology, which incorporates GRPO (Group Relative Policy Optimization). This method, introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), is specifically designed to improve a model's capabilities in mathematical reasoning tasks.
Technical Specifications
- Base Model: Qwen/Qwen2.5-3B
- Parameter Count: 3.1 billion
- Context Length: 32,768 tokens
- Training Frameworks: TRL (version 0.14.0), Transformers (version 4.48.1), PyTorch (version 2.5.1)
Use Cases
This model is particularly well-suited for applications requiring:
- Mathematical Problem Solving: Due to its GRPO-enhanced training, it is expected to perform strongly in tasks involving complex mathematical reasoning.
- General Language Understanding and Generation: As a fine-tuned Qwen2.5-3B variant, it retains strong capabilities in various natural language processing tasks.
Developers can get started quickly by loading the model with the Transformers `pipeline` API for text generation.
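A minimal sketch of that usage is shown below. The prompt is an illustrative countdown-style arithmetic task (an assumption based on the model's name, not taken from the card), and generation parameters such as `max_new_tokens` are placeholder choices:

```python
# Minimal sketch: text generation with the transformers pipeline.
# The prompt and generation settings are illustrative assumptions.
from transformers import pipeline

model_id = "xiaoni611/qwen-2.5-3b-r1-countdown"

# Example countdown-style prompt (hypothetical; adjust to your task).
prompt = "Using the numbers [19, 36, 55, 7], create an equation that equals 65."

if __name__ == "__main__":
    # device_map="auto" places the model on GPU if one is available.
    generator = pipeline("text-generation", model=model_id, device_map="auto")
    result = generator(prompt, max_new_tokens=256)
    print(result[0]["generated_text"])
```

Running the script downloads the model weights on first use; for reasoning-style outputs, a larger `max_new_tokens` budget may be needed to accommodate the model's intermediate steps.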