DRIVE-SFT: Supervised Fine-Tuning for Competitive Code Generation
DRIVE-SFT, developed by the Hunyuan Team at Tencent, is the Supervised Fine-Tuning (SFT) component of the larger DRIVE pipeline, designed for competitive programming code generation. Built upon Qwen2.5-32B, this model incorporates a key innovation: Difficulty-Aware Sampling. During training, competitive programming prompts are categorized by difficulty, and hard samples are duplicated to force the model to focus on more challenging problems. This SFT phase also augments training with general-purpose coding and reasoning-intensive data to enhance overall capabilities.
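The duplication-based sampling described above can be sketched as follows. This is a minimal illustration, not the team's released code: the difficulty tiers, duplication factors, and field names are assumptions chosen for clarity.

```python
import random

# Hypothetical duplication factors per difficulty tier; the actual
# tiers and factors used in DRIVE-SFT are not specified here.
DUPLICATION = {"easy": 1, "medium": 2, "hard": 3}

def difficulty_aware_sample(dataset, dup=DUPLICATION, seed=0):
    """Upsample harder problems by duplicating them before shuffling,
    so the model sees challenging prompts more often per epoch."""
    expanded = []
    for example in dataset:
        # Unknown difficulties default to a single copy.
        expanded.extend([example] * dup.get(example["difficulty"], 1))
    random.Random(seed).shuffle(expanded)
    return expanded

data = [
    {"prompt": "two-sum", "difficulty": "easy"},
    {"prompt": "segment-tree", "difficulty": "hard"},
]
train_set = difficulty_aware_sample(data)
# The hard problem now appears three times for every one copy of the easy one.
```

Duplication (rather than loss re-weighting) keeps the training loop unchanged; only the dataset construction step is modified.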
Key Capabilities
- Enhanced Code Generation: Specifically fine-tuned to improve performance on competitive programming tasks.
- Difficulty-Aware Training: Prioritizes learning from harder coding problems through strategic data sampling.
- Foundation for RL: Serves as a robust base model before undergoing a two-stage Reinforcement Learning process (DRIVE-RL) for further performance gains.
Good For
- Developers and researchers interested in advanced code generation, particularly for competitive programming.
- Serving as a strong baseline for further fine-tuning or reinforcement learning on coding tasks.
- Exploring techniques for improving model performance on challenging, complex problems through data curation.