tencent/DRIVE-RL

TEXT GENERATION · Model Size: 32.8B · Quant: FP8 · Ctx Length: 32k · Concurrency Cost: 2 · Architecture: Transformer · Published: Nov 12, 2025

DRIVE-RL is a 32.8-billion-parameter model developed by Tencent's Hunyuan Team, designed specifically for competitive code generation. It builds on a Qwen2.5-32B base model and is enhanced through a novel two-stage Reinforcement Learning with Verifiable Reward (RLVR) process that emphasizes careful data curation. The model excels at solving challenging competitive programming problems, achieving state-of-the-art performance among similarly sized models.


Overview

DRIVE-RL is a 32.8-billion-parameter model from Tencent's Hunyuan Team, specialized in competitive code generation. It builds upon a Qwen2.5-32B base model and employs a two-stage Reinforcement Learning (RL) pipeline with verifiable rewards. This approach mitigates common failure modes in code generation, such as repetitive outputs and poor performance on difficult problems.
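A "verifiable reward" for code means each sampled program can be scored automatically by executing it against test cases, with no learned reward model. A minimal sketch of such a reward function (the `solve` entry point and the in-process `exec` are illustrative assumptions, not Tencent's actual harness, which would sandbox execution and enforce time/memory limits):

```python
def verifiable_reward(program_src: str, test_cases: list[tuple]) -> float:
    """Return 1.0 if the candidate program passes every test case, else 0.0.

    Illustrative only: assumes the program defines a `solve` function and
    runs it in-process; a real RLVR pipeline would use a sandboxed judge.
    """
    namespace: dict = {}
    try:
        exec(program_src, namespace)   # load the candidate solution
        solve = namespace["solve"]     # assumed entry point
        for args, expected in test_cases:
            if solve(*args) != expected:
                return 0.0
    except Exception:
        return 0.0                     # crashes and missing entry points earn zero
    return 1.0
```

For example, a candidate `"def solve(a, b):\n    return a + b\n"` scores 1.0 against `[((1, 2), 3), ((5, 5), 10)]` and 0.0 if any case fails, giving the RL stages a binary, machine-checkable training signal.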

Key Capabilities & Training

The model's training pipeline involves:

  • Difficulty-Aware Supervised Fine-Tuning (SFT): The initial Qwen2.5-32B is fine-tuned with a dataset where hard competitive programming samples are duplicated to emphasize learning from challenging problems.
  • Two-Stage RL Process:
    • Stage 1 (Entropy Expansion): Uses a large, uniformly distributed problem set with moderate rollouts (8) and a shorter context (24k) to increase output diversity and prevent entropy collapse.
    • Stage 2 (Hard-Focus Curriculum): Trains on a small, high-quality set of challenging problems using Pre-GRPO with a large rollout budget (64-80 rollouts per problem) to master difficult cases. This stage drives the largest performance gains on hard problems.
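Pre-GRPO belongs to the GRPO family, where each problem's rollouts form a group and each rollout's reward is normalized against that group's statistics to yield an advantage. This is why the rollout budget above matters: on hard problems where most samples fail, a larger group gives more reliable statistics and a stronger signal for the rare successes. A sketch of standard GRPO-style group normalization (not the exact Pre-GRPO recipe):

```python
from statistics import mean, stdev

def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalize each rollout's reward by the mean
    and standard deviation of its own rollout group.

    With binary verifiable rewards and a large group (64-80 rollouts),
    a lone passing solution receives a large positive advantage while
    the failing majority receives small negative ones.
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]
```

For instance, with rewards `[1.0, 0.0, 0.0, 0.0]` the single success gets a strongly positive advantage and the failures get mildly negative ones, and the advantages sum to zero, so the group provides its own baseline.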

Performance & Differentiators

DRIVE-RL achieves state-of-the-art performance among models of similar scale on competitive code generation, including a +58.3% relative improvement on Codeforces OJ over its SFT baseline. Key findings highlight the importance of difficulty-aware training, entropy expansion, and large rollout budgets for tackling hard problems effectively, and the training strategy demonstrates strong scaling trends.
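Note that +58.3% is a relative gain over the SFT baseline's score, not an absolute jump of 58.3 points. The numbers below are hypothetical, chosen only to make the ratio concrete:

```python
def relative_improvement(baseline: float, score: float) -> float:
    """Relative gain over a baseline, as reported for DRIVE-RL vs. its
    SFT checkpoint on Codeforces OJ."""
    return (score - baseline) / baseline

# Hypothetical scores: a baseline of 12.0 rising to 19.0 is a
# +58.3% relative improvement (7/12).
print(f"{relative_improvement(12.0, 19.0):+.1%}")  # prints "+58.3%"
```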

Good for

  • Competitive Programming: Excels at generating correct and efficient code for complex algorithmic challenges.
  • Code Generation Tasks: Particularly for scenarios requiring high accuracy and problem-solving capabilities on difficult inputs.
  • Research in RL for Code: Provides a strong baseline and methodology for further exploration in reinforcement learning applied to code generation.