Overview
IIGroup's X-Coder-SFT-Qwen3-8B is an 8-billion-parameter code generation model built on the Qwen3-8B-Base architecture. It was specialized for competitive programming via Supervised Fine-Tuning (SFT) on a fully synthetic instruction dataset, IIGroup/X-Coder-SFT-376k. The model is intended as a robust base for subsequent reinforcement learning, such as Reinforcement Learning from Human Feedback (RLHF) or from AI Feedback (RLAIF); a related RL-trained version, IIGroup/X-Coder-RL-Qwen3-8B, achieves 64.0 on LiveCodeBench.
Key Capabilities
- Code Generation: Excels at generating code for competitive programming problems.
- Long Context Handling: Supports a maximum context length of 32768 tokens, allowing for complex problem descriptions and extensive code generation.
- Foundation Model: Serves as a strong SFT base for further performance gains through Reinforcement Learning with Verifiable Rewards (RLVR) training.
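The capabilities above can be exercised with a standard Hugging Face `transformers` generation loop. The sketch below is illustrative, not from the model card: it assumes the repo id `IIGroup/X-Coder-SFT-Qwen3-8B` is loadable with `AutoModelForCausalLM` and that the tokenizer ships the usual Qwen3 chat template; the helper names and the sample problem are hypothetical.

```python
def build_messages(problem_statement: str) -> list[dict]:
    """Wrap a competitive-programming problem as a single-turn chat message list."""
    return [{"role": "user", "content": problem_statement}]

def generate_solution(problem_statement: str, max_new_tokens: int = 4096) -> str:
    # Heavy dependencies are imported lazily so build_messages stays dependency-free.
    # Loading an 8B model in bfloat16 needs roughly 16 GB of accelerator memory.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "IIGroup/X-Coder-SFT-Qwen3-8B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="bfloat16", device_map="auto"
    )
    # Render the chat template and append the assistant generation prompt.
    inputs = tokenizer.apply_chat_template(
        build_messages(problem_statement),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, dropping the prompt prefix.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate_solution("Given an integer n, print the sum 1 + 2 + ... + n."))
```

Because the SFT data targets competitive programming, plain single-turn prompts containing the full problem statement (constraints included) are the expected input shape.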
Training Details
The model was trained with ms-swift using full-parameter fine-tuning for 8 epochs. Key hyperparameters included a global batch size of 128, a learning rate of 5e-5, and bfloat16 precision. Training used DeepSpeed ZeRO-3 Offload or ZeRO-2 configurations. Sequence packing was enabled, which accelerated training by roughly 2x.
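A training setup like the one described might be launched with the ms-swift CLI roughly as follows. This is a hedged sketch, not the authors' actual command: the flag names follow recent ms-swift releases but may differ across versions (check `swift sft --help`), and the per-device batch size and gradient accumulation are an assumed split of the reported global batch size of 128 across 8 GPUs.

```shell
# Illustrative only: 8 GPUs x 2 per-device x 8 accumulation = global batch 128.
NPROC_PER_NODE=8 swift sft \
    --model Qwen/Qwen3-8B-Base \
    --dataset IIGroup/X-Coder-SFT-376k \
    --train_type full \
    --num_train_epochs 8 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --learning_rate 5e-5 \
    --torch_dtype bfloat16 \
    --max_length 32768 \
    --packing true \
    --deepspeed zero3_offload
```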
Good For
- Developers and researchers building competitive programming solutions.
- A starting point for further fine-tuning or reinforcement learning in code generation.
- Applications that require long-context code understanding and generation.