cameronphchen/Qwen2.5-1.5B-Open-R1-GRPO
cameronphchen/Qwen2.5-1.5B-Open-R1-GRPO is a 1.5-billion-parameter language model fine-tuned from Qwen/Qwen2.5-1.5B-Instruct using the GRPO training method introduced in the DeepSeekMath paper. The fine-tuning targets improved reasoning, particularly in mathematical contexts. With a context length of 32768 tokens, the model suits tasks that require robust logical processing and extended conversational or analytical interactions.
Overview
cameronphchen/Qwen2.5-1.5B-Open-R1-GRPO is a 1.5-billion-parameter language model fine-tuned from the base Qwen/Qwen2.5-1.5B-Instruct model using GRPO (Group Relative Policy Optimization), a reinforcement-learning method introduced in the DeepSeekMath paper to improve mathematical and general reasoning.
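A minimal loading sketch, assuming the standard Hugging Face transformers API (the repository id comes from this card; the dtype and device-placement flags are illustrative, and device_map="auto" additionally assumes the accelerate package is installed):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cameronphchen/Qwen2.5-1.5B-Open-R1-GRPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # load in the checkpoint's native precision
    device_map="auto",   # place weights on an available GPU, else CPU
)
```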
Key Capabilities
- Enhanced Reasoning: Benefits from the GRPO training method, which is designed to push the limits of mathematical reasoning in open language models (see the prompt sketch after this list).
- Instruction Following: Inherits instruction-following capabilities from its base model, Qwen2.5-1.5B-Instruct.
- Extended Context: Supports a context length of 32768 tokens, allowing for processing longer inputs and maintaining coherence over extended interactions.
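Continuing from the loading sketch above, a hedged generation example: it assumes the chat template inherited from the Qwen2.5 instruct base model, and the math prompt and max_new_tokens value are purely illustrative.

```python
# Build a chat-formatted prompt for a simple math question.
messages = [
    {"role": "user", "content": "What is the sum of the first 50 positive integers?"},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant-turn marker
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)

# Decode only the newly generated tokens, dropping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Because of the 32768-token context window, the same pattern extends to multi-turn conversations or long problem statements without code changes.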
Good for
- Mathematical Reasoning Tasks: Ideal for applications requiring strong logical and mathematical problem-solving.
- Complex Instruction Following: Suitable for scenarios where precise adherence to instructions is critical.
- Research and Experimentation: Provides a fine-tuned model for exploring the impact of GRPO on smaller, open-source architectures.