IIGroup/X-Coder-RL-Qwen2.5-7B
IIGroup/X-Coder-RL-Qwen2.5-7B is a 7.6 billion parameter code reasoning foundation model developed by IIGroup. It is trained with Reinforcement Learning from Human Feedback (RLHF) on fully synthetic data, specifically optimized for competitive programming tasks. This model excels at generating strong reasoning performance for complex coding challenges, building upon the Qwen2.5 architecture.
Loading preview...
X-Coder-RL-Qwen2.5-7B Overview
X-Coder-RL-Qwen2.5-7B is a 7.6 billion parameter language model from IIGroup, specifically engineered for advanced code reasoning. It is built upon the IIGroup/X-Coder-SFT-Qwen2.5-7B base model and distinguishes itself through its training methodology: Reinforcement Learning with Value Regularization (RLVR) using the GRPO algorithm. This training leverages a fully synthetic dataset, IIGroup/X-Coder-RL-40k, to enhance its ability to solve competitive programming problems.
Key Capabilities
- Strong Code Reasoning: Achieves robust performance on complex coding challenges, as demonstrated by its results on LiveCodeBench v5.
- RL-Trained: Utilizes advanced reinforcement learning techniques (GRPO) on synthetic data for specialized optimization.
- Competitive Programming Focus: Designed to excel in scenarios requiring logical deduction and problem-solving within a coding context.
Recommended Use Cases
- Code Generation: Generating solutions for programming problems.
- Algorithmic Problem Solving: Assisting with or solving tasks found in competitive programming environments.
- Code Reasoning Tasks: Applications requiring deep understanding and logical manipulation of code.