X-Coder-RL-Qwen3-8B: Code Reasoning Foundation Model
IIGroup's X-Coder-RL-Qwen3-8B is an 8-billion-parameter language model engineered for advanced code reasoning. It is built on the IIGroup/X-Coder-SFT-Qwen3-8B base model and trained with GRPO (Group Relative Policy Optimization) on a fully synthetic dataset, IIGroup/X-Coder-RL-40k. This training stage, part of the X-Coder RLVR recipe, enables the model to handle complex competitive programming scenarios.
Key Capabilities
- Superior Code Reasoning: Achieves strong average performance on the LiveCodeBench v5 and v6 competitive programming benchmarks.
- Reinforcement Learning Enhanced: Leverages RLVR (Reinforcement Learning with Verifiable Rewards) on synthetic data for improved code generation and problem-solving.
- Python Code Generation: Capable of generating functional Python code, as shown in examples for common algorithmic problems.
- High Context Length: Supports a context length of up to 32,768 tokens, beneficial for handling larger codebases or complex problem descriptions.
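To illustrate the kind of algorithmic task the model targets, below is a hand-written (not model-generated) solution to a classic competitive programming problem, longest strictly increasing subsequence, in the model's target language:

```python
from bisect import bisect_left

def longest_increasing_subsequence(nums):
    """Length of the longest strictly increasing subsequence, O(n log n)."""
    # tails[i] holds the smallest possible tail value of an
    # increasing subsequence of length i + 1 seen so far.
    tails = []
    for x in nums:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)   # x extends the longest subsequence found so far
        else:
            tails[i] = x      # x gives a smaller tail for length i + 1
    return len(tails)

print(longest_increasing_subsequence([10, 9, 2, 5, 3, 7, 101, 18]))  # 4
```

Problems of this shape, where correctness can be checked automatically against test cases, are exactly what verifiable-reward RL training optimizes for.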
Good For
- Competitive Programming: Ideal for tasks requiring advanced algorithmic understanding and code generation.
- Code Generation: Generating solutions for programming challenges and general coding tasks.
- Code Reasoning Applications: Any use case demanding high-fidelity code understanding and logical problem-solving within a coding context.
For detailed training information and code, refer to the X-Coder GitHub repository.
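A minimal inference sketch using the standard Hugging Face `transformers` API; this assumes the model ships with a Qwen3-style chat template (not confirmed by this card) and requires downloading the 8B weights, so it is illustrative rather than verified:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "IIGroup/X-Coder-RL-Qwen3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Assumed prompt format: a single user turn rendered via the chat template.
messages = [
    {"role": "user",
     "content": "Write a Python function that returns the n-th Fibonacci number."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```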