AmberYifan/Qwen2.5-7B-Open-R1-Code-GRPO
AmberYifan/Qwen2.5-7B-Open-R1-Code-GRPO is a 7.6 billion parameter language model fine-tuned from Qwen/Qwen2.5-7B-Instruct. It specializes in code generation and problem-solving, specifically trained on the open-r1/verifiable-coding-problems-python dataset. This model utilizes the GRPO training method, enhancing its capabilities for mathematical reasoning and complex coding tasks, making it suitable for applications requiring robust code generation.
Model Overview
AmberYifan/Qwen2.5-7B-Open-R1-Code-GRPO is a 7.6 billion parameter language model developed by AmberYifan. It is a fine-tuned version of the Qwen/Qwen2.5-7B-Instruct base model, specifically optimized for code generation and problem-solving.
Key Capabilities
- Code Generation: Specialized in generating code, particularly for verifiable coding problems in Python.
- Mathematical Reasoning: Trained with GRPO (Group Relative Policy Optimization), a method introduced in the DeepSeekMath paper and known to enhance mathematical reasoning capabilities.
- Fine-tuned Performance: Leverages the strong foundation of Qwen2.5-7B-Instruct and further refines its performance on coding tasks through targeted training.
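The capabilities above can be exercised through the standard transformers chat API. The sketch below is illustrative and not taken from the model card: the prompt, generation settings, and the `extract_python_block` helper (which pulls a fenced code block out of the model's reply) are assumptions.

```python
# Minimal inference sketch for AmberYifan/Qwen2.5-7B-Open-R1-Code-GRPO.
# The prompt and the code-extraction helper are illustrative assumptions.
import re

def extract_python_block(text: str) -> str:
    """Return the first fenced Python code block in a model response, or ''."""
    match = re.search(r"```(?:python)?\n(.*?)```", text, re.DOTALL)
    return match.group(1).strip() if match else ""

if __name__ == "__main__":
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "AmberYifan/Qwen2.5-7B-Open-R1-Code-GRPO"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    messages = [{"role": "user",
                 "content": "Write a Python function that reverses a string."}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=512)
    # Decode only the newly generated tokens, not the prompt.
    reply = tokenizer.decode(output[0][inputs.shape[-1]:],
                             skip_special_tokens=True)
    print(extract_python_block(reply))
```

Since the model is tuned on verifiable coding problems, extracting the fenced code block makes it easy to feed the answer straight into a test harness.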
Training Details
This model was trained using the TRL library on the open-r1/verifiable-coding-problems-python dataset. The use of GRPO, introduced in the DeepSeekMath paper, reflects a focus on improving the model's ability to handle complex logical and mathematical problems in a coding context.
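A training setup along these lines can be sketched with TRL's `GRPOTrainer`. This is a hypothetical reconstruction, not the author's actual recipe: the binary pass/fail reward function, the `tests` dataset field, and all hyperparameters are assumptions, and the dataset would normally need column mapping before use.

```python
# Sketch of GRPO fine-tuning on verifiable coding problems with TRL.
# The reward function and dataset fields are illustrative assumptions,
# not the exact configuration used to train this model.
import contextlib
import io

def passes_tests(code: str, test_code: str) -> bool:
    """Run candidate code plus its test assertions in a fresh namespace."""
    namespace = {}
    try:
        with contextlib.redirect_stdout(io.StringIO()):
            exec(code, namespace)       # define the candidate solution
            exec(test_code, namespace)  # assertions raise on failure
        return True
    except Exception:
        return False

def binary_reward(completions, tests, **kwargs):
    """Verifiable reward: 1.0 if a completion passes its tests, else 0.0."""
    return [1.0 if passes_tests(c, t) else 0.0
            for c, t in zip(completions, tests)]

if __name__ == "__main__":
    # Assumes the GRPOTrainer/GRPOConfig API in recent TRL releases.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    dataset = load_dataset("open-r1/verifiable-coding-problems-python",
                           split="train")
    trainer = GRPOTrainer(
        model="Qwen/Qwen2.5-7B-Instruct",
        reward_funcs=binary_reward,
        args=GRPOConfig(output_dir="qwen2.5-7b-grpo-code"),
        train_dataset=dataset,
    )
    trainer.train()
```

The key design point of GRPO over verifiable problems is that the reward needs no learned reward model: correctness is checked by executing the completion against the problem's tests, and advantages are computed relative to the group of sampled completions.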
Good For
- Developers and researchers working on automated code generation.
- Applications requiring a language model with enhanced mathematical and logical reasoning for coding challenges.
- Tasks involving Python code generation and problem-solving based on verifiable specifications.