Overview
DRIVE-RL is a 32.8-billion-parameter model from Tencent's Hunyuan Team, specialized in competitive code generation. It builds on a Qwen2.5-32B base model and employs a two-stage Reinforcement Learning (RL) pipeline with verifiable rewards. This approach addresses common failure modes in code generation, such as repetitive outputs and poor performance on difficult problems.
Key Capabilities & Training
The model's training pipeline involves:
- Difficulty-Aware Supervised Fine-Tuning (SFT): The Qwen2.5-32B base is fine-tuned on a dataset in which hard competitive-programming samples are duplicated, so the model sees challenging problems more often during training.
- Two-Stage RL Process:
- Stage 1 (Entropy Expansion): Trains on a large, uniformly distributed problem set with a moderate rollout count (8 per problem) and a shorter 24k context to increase output diversity and prevent entropy collapse.
- Stage 2 (Hard-Focus Curriculum): Trains on a small, high-quality set of challenging problems using Pre-GRPO with a large rollout budget (64-80 rollouts per problem) to master difficult cases. This stage is crucial for the significant performance gains on hard problems.
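The difficulty-aware SFT step above can be sketched as a simple dataset-expansion pass. This is an illustrative sketch, not the released pipeline: the `difficulty` labels and the duplication factor of 2 are assumptions for demonstration.

```python
import random

def build_sft_dataset(samples, hard_dup_factor=2, seed=0):
    """Duplicate samples tagged as hard so SFT sees them more often.

    Illustrative sketch: the label scheme and duplication factor
    are assumptions, not values from the DRIVE-RL paper.
    """
    expanded = []
    for s in samples:
        copies = hard_dup_factor if s["difficulty"] == "hard" else 1
        expanded.extend([s] * copies)
    random.Random(seed).shuffle(expanded)  # avoid clustered duplicates
    return expanded

dataset = [
    {"id": 1, "difficulty": "easy"},
    {"id": 2, "difficulty": "hard"},
    {"id": 3, "difficulty": "hard"},
]
expanded = build_sft_dataset(dataset)
print(len(expanded))  # 5: each hard sample appears twice, the easy one once
```

The effect is a training distribution skewed toward hard problems without changing the SFT loss itself.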
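The two-stage schedule can be summarized as a pair of configurations. This is a hedged sketch: only the rollout counts (8; 64-80) and the 24k stage-1 context come from the description above, while the field names, the stage-2 context budget, and the problem-pool identifiers are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class RLStageConfig:
    name: str
    num_rollouts: int  # rollouts sampled per problem per update
    max_context: int   # token budget for each rollout
    problem_set: str   # which pool of problems the stage trains on

STAGE1 = RLStageConfig(
    name="entropy_expansion",
    num_rollouts=8,         # moderate rollouts, per the description above
    max_context=24_000,     # shorter context, per the description above
    problem_set="large_uniform",
)

STAGE2 = RLStageConfig(
    name="hard_focus_curriculum",
    num_rollouts=64,        # the text reports 64-80 rollouts per problem
    max_context=32_000,     # assumption: a longer budget for hard problems
    problem_set="small_hard",
)

for stage in (STAGE1, STAGE2):
    print(stage.name, stage.num_rollouts, stage.max_context)
```

The key design choice is the contrast between the stages: a broad, cheap stage to widen the output distribution, then a narrow, expensive stage where the large rollout budget gives the verifiable reward enough samples to find correct solutions to hard problems.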
Performance & Differentiators
DRIVE-RL achieves state-of-the-art performance among similarly sized models in competitive code generation, including a +58.3% relative improvement on Codeforces OJ over its SFT baseline. The results underscore the importance of difficulty-aware training, entropy expansion, and large rollout budgets for tackling hard problems effectively, and the approach shows strong scaling trends.
Good for
- Competitive Programming: Excels at generating correct and efficient code for complex algorithmic challenges.
- Code Generation Tasks: Particularly scenarios that demand high accuracy and strong problem-solving on difficult inputs.
- Research in RL for Code: Provides a strong baseline and methodology for further exploration in reinforcement learning applied to code generation.