Name: movefast/Qwen2.5-7B-Open-R1-GRPO API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: movefast

Overview

movefast/Qwen2.5-7B-Open-R1-GRPO is a 7.6 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-7B-Instruct base model. It leverages the Qwen2.5 architecture, known for its strong general-purpose capabilities, and extends it with specialized training.

Key Capabilities

Enhanced Mathematical Reasoning: The primary differentiator of this model is its training with GRPO (Gradient-based Reward Policy Optimization), a method introduced in the DeepSeekMath paper. This technique is specifically designed to push the limits of mathematical reasoning in open language models.
Instruction Following: As a fine-tuned version of an instruct model, it is adept at following user instructions and generating relevant responses.
Large Context Window: With a context length of 32768 tokens, the model can process and understand long-form inputs, which is beneficial for complex problem-solving and detailed conversations.

Training Details

The model was trained using the TRL (Transformer Reinforcement Learning) framework. The application of GRPO suggests a focus on improving performance in areas where precise, step-by-step reasoning is crucial, such as mathematics and logic. This training approach aims to refine the model's ability to generate accurate and coherent solutions to challenging problems.

Good For

Mathematical Problem Solving: Ideal for applications requiring advanced mathematical reasoning, calculations, and logical deduction.
Complex Instruction Following: Suitable for tasks where detailed and multi-step instructions need to be accurately interpreted and executed.
Research and Development: Provides a strong base for further experimentation and fine-tuning on specific reasoning-intensive tasks.

Overview

Overview

Key Capabilities

Training Details

Good For

Full Model Card (README)