AmberYifan/Qwen2.5-7B-Open-R1-Code-GRPO
AmberYifan/Qwen2.5-7B-Open-R1-Code-GRPO is a 7.6 billion parameter language model fine-tuned from Qwen/Qwen2.5-7B-Instruct. It specializes in code generation and problem-solving, specifically trained on the open-r1/verifiable-coding-problems-python dataset. This model utilizes the GRPO training method, enhancing its capabilities for mathematical reasoning and complex coding tasks, making it suitable for applications requiring robust code generation.
Model Overview
AmberYifan/Qwen2.5-7B-Open-R1-Code-GRPO is a 7.6 billion parameter language model developed by AmberYifan. It is a fine-tuned version of the Qwen/Qwen2.5-7B-Instruct base model, specifically optimized for code generation and problem-solving.
Key Capabilities
- Code Generation: Specialized in generating code, particularly for verifiable coding problems in Python.
- Mathematical Reasoning: Trained with GRPO (Group Relative Policy Optimization), a method introduced in the DeepSeekMath paper and known to enhance mathematical reasoning capabilities.
- Fine-tuned Performance: Leverages the strong foundation of Qwen2.5-7B-Instruct and further refines its performance on coding tasks through targeted training.
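The capabilities above can be exercised through the standard transformers chat API. The sketch below is illustrative and not taken from the model card: the prompt, generation settings, and the `extract_python_block` helper (which pulls a fenced code block out of the model's reply) are assumptions.

```python
# Minimal inference sketch for AmberYifan/Qwen2.5-7B-Open-R1-Code-GRPO.
# The prompt and the code-extraction helper are illustrative assumptions.
import re

def extract_python_block(text: str) -> str:
    """Return the first fenced Python code block in a model response, or ''."""
    match = re.search(r"```(?:python)?\n(.*?)```", text, re.DOTALL)
    return match.group(1).strip() if match else ""

if __name__ == "__main__":
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "AmberYifan/Qwen2.5-7B-Open-R1-Code-GRPO"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    messages = [{"role": "user",
                 "content": "Write a Python function that reverses a string."}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=512)
    # Decode only the newly generated tokens, not the prompt.
    reply = tokenizer.decode(output[0][inputs.shape[-1]:],
                             skip_special_tokens=True)
    print(extract_python_block(reply))
```

Since the model is tuned on verifiable coding problems, extracting the fenced code block makes it easy to feed the answer straight into a test harness.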
Training Details
This model was trained using the TRL library on the open-r1/verifiable-coding-problems-python dataset. The use of GRPO, introduced in the DeepSeekMath paper, reflects a focus on improving the model's ability to handle complex logical and mathematical problems in a coding context.
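A training setup along these lines can be sketched with TRL's `GRPOTrainer`. This is a hypothetical reconstruction, not the author's actual recipe: the binary pass/fail reward function, the `tests` dataset field, and all hyperparameters are assumptions, and the dataset would normally need column mapping before use.

```python
# Sketch of GRPO fine-tuning on verifiable coding problems with TRL.
# The reward function and dataset fields are illustrative assumptions,
# not the exact configuration used to train this model.
import contextlib
import io

def passes_tests(code: str, test_code: str) -> bool:
    """Run candidate code plus its test assertions in a fresh namespace."""
    namespace = {}
    try:
        with contextlib.redirect_stdout(io.StringIO()):
            exec(code, namespace)       # define the candidate solution
            exec(test_code, namespace)  # assertions raise on failure
        return True
    except Exception:
        return False

def binary_reward(completions, tests, **kwargs):
    """Verifiable reward: 1.0 if a completion passes its tests, else 0.0."""
    return [1.0 if passes_tests(c, t) else 0.0
            for c, t in zip(completions, tests)]

if __name__ == "__main__":
    # Assumes the GRPOTrainer/GRPOConfig API in recent TRL releases.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    dataset = load_dataset("open-r1/verifiable-coding-problems-python",
                           split="train")
    trainer = GRPOTrainer(
        model="Qwen/Qwen2.5-7B-Instruct",
        reward_funcs=binary_reward,
        args=GRPOConfig(output_dir="qwen2.5-7b-grpo-code"),
        train_dataset=dataset,
    )
    trainer.train()
```

The key design point of GRPO over verifiable problems is that the reward needs no learned reward model: correctness is checked by executing the completion against the problem's tests, and advantages are computed relative to the group of sampled completions.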
Good For
- Developers and researchers working on automated code generation.
- Applications requiring a language model with enhanced mathematical and logical reasoning for coding challenges.
- Tasks involving Python code generation and problem-solving based on verifiable specifications.