cameronphchen/Qwen2.5-1.5B-Open-R1-GRPO
cameronphchen/Qwen2.5-1.5B-Open-R1-GRPO is a 1.5-billion-parameter language model fine-tuned from Qwen/Qwen2.5-1.5B-Instruct using the GRPO training method introduced in the DeepSeekMath paper. The fine-tuning targets improved reasoning, particularly in mathematical contexts. With a context length of 32768 tokens, the model suits tasks that require robust logical processing and extended conversational or analytical interactions.
Overview
cameronphchen/Qwen2.5-1.5B-Open-R1-GRPO is a 1.5-billion-parameter language model fine-tuned from the base Qwen/Qwen2.5-1.5B-Instruct model using GRPO (Group Relative Policy Optimization), a reinforcement-learning method introduced in the DeepSeekMath paper to improve mathematical and general reasoning.
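A minimal loading sketch, assuming the standard Hugging Face transformers API (the repository id comes from this card; the dtype and device-placement flags are illustrative, and device_map="auto" additionally assumes the accelerate package is installed):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cameronphchen/Qwen2.5-1.5B-Open-R1-GRPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # load in the checkpoint's native precision
    device_map="auto",   # place weights on an available GPU, else CPU
)
```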
Key Capabilities
- Enhanced Reasoning: Benefits from the GRPO training method, which is designed to push the limits of mathematical reasoning in open language models (see the prompt sketch after this list).
- Instruction Following: Inherits instruction-following capabilities from its base model, Qwen2.5-1.5B-Instruct.
- Extended Context: Supports a context length of 32768 tokens, allowing for processing longer inputs and maintaining coherence over extended interactions.
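Continuing from the loading sketch above, a hedged generation example: it assumes the chat template inherited from the Qwen2.5 instruct base model, and the math prompt and max_new_tokens value are purely illustrative.

```python
# Build a chat-formatted prompt for a simple math question.
messages = [
    {"role": "user", "content": "What is the sum of the first 50 positive integers?"},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant-turn marker
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)

# Decode only the newly generated tokens, dropping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Because of the 32768-token context window, the same pattern extends to multi-turn conversations or long problem statements without code changes.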
Good for
- Mathematical Reasoning Tasks: Ideal for applications requiring strong logical and mathematical problem-solving.
- Complex Instruction Following: Suitable for scenarios where precise adherence to instructions is critical.
- Research and Experimentation: Provides a fine-tuned model for exploring the impact of GRPO on smaller, open-source architectures.