DR.Kernel-14B: Specialized for GPU Kernel Optimization
hkust-nlp/drkernel-14b is a 14.7-billion-parameter model built on the Qwen3-14B architecture, developed by hkust-nlp. Its core specialization is generating and iteratively optimizing GPU kernels, with a particular focus on Triton kernels within the DR.Kernel framework. Unlike general-purpose code generation models, DR.Kernel-14B is trained for multi-turn iterative refinement, leveraging execution feedback from KernelGYM to converge on optimized kernel implementations.
Key Capabilities & Training:
- Iterative Optimization: Designed for multi-turn refinement of kernel code based on performance and correctness feedback.
- Triton Kernel Generation: Specializes in producing optimized ModelNew kernel implementations from PyTorch reference tasks.
- Reinforcement Learning: Trained with a two-stage pipeline: cold-start Supervised Fine-Tuning (SFT) on hkust-nlp/drkernel-coldstart-8k, followed by multi-turn Reinforcement Learning (RL) with methods such as TRLOO, MRS, PR, and PRS on hkust-nlp/drkernel-rl-data.
- Execution Feedback: Utilizes KernelGYM for compilation, correctness, performance, and profiling feedback during training.
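The multi-turn refinement loop with execution feedback can be sketched as below. The helpers `generate_kernel` and `run_kernelgym` are illustrative stubs, not part of any released API; in practice they would wrap model inference and the KernelGYM evaluation service:

```python
# Sketch of a multi-turn kernel-refinement loop with execution feedback.
# `generate_kernel` and `run_kernelgym` are hypothetical stand-ins for
# DR.Kernel-14B inference and KernelGYM, respectively.

def generate_kernel(history):
    """Stub: in practice, call the model with the conversation history."""
    return "def kernel(): ..."  # candidate Triton kernel source

def run_kernelgym(kernel_src):
    """Stub: KernelGYM-style compilation/correctness/performance feedback."""
    return {"compiles": True, "correct": True, "speedup": 1.3}

def refine(task, max_turns=4, target_speedup=1.5):
    history = [{"role": "user", "content": task}]
    best = None
    for _ in range(max_turns):
        kernel_src = generate_kernel(history)
        feedback = run_kernelgym(kernel_src)
        if feedback["compiles"] and feedback["correct"]:
            if best is None or feedback["speedup"] > best[1]:
                best = (kernel_src, feedback["speedup"])
            if feedback["speedup"] >= target_speedup:
                break
        # Feed structured feedback back to the model for the next turn.
        history.append({"role": "assistant", "content": kernel_src})
        history.append({"role": "user", "content": f"Feedback: {feedback}"})
    return best

best = refine("Optimize the reference PyTorch op into a Triton kernel")
```

The loop stops early once a kernel is correct and fast enough; otherwise it keeps the best correct candidate seen so far, mirroring the iterative-refinement training setup described above.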
Intended Use Cases:
- Kernel Generation Research: Ideal for academic and industrial research into automated kernel optimization.
- Triton Kernel Optimization: Best suited for tasks requiring iterative optimization of Triton kernels with execution feedback.
- Agentic Code Refinement: Effective in multi-turn scenarios where code is refined based on execution-based rewards.
For optimal performance, users should adhere to the kernel-optimization prompt format used during training: provide a PyTorch reference architecture (a Model class) and expect an optimized ModelNew implementation in response.
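As a concrete illustration of that format, the snippet below assembles a prompt around a small PyTorch reference module. The instruction wording here is an assumption, not the released training template; only the convention that the reference is a `Model` class and the expected answer defines a `ModelNew` class comes from the model card:

```python
# Illustrative prompt construction for a kernel-optimization task.
# The reference architecture is given as a plain PyTorch `Model`;
# the model is expected to answer with an optimized `ModelNew`.
# NOTE: the instruction text below is a hypothetical paraphrase,
# not the exact template used during training.

reference_src = """\
import torch
import torch.nn as nn

class Model(nn.Module):
    def forward(self, x, y):
        return torch.relu(x + y)
"""

prompt = (
    "Optimize the following PyTorch reference architecture into a Triton "
    "kernel. Respond with a complete `ModelNew` class that produces "
    "identical outputs.\n\n"
    "```python\n" + reference_src + "```"
)
```

The response would then be parsed for a `ModelNew` definition and, in a full pipeline, handed to KernelGYM for correctness and performance checks.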