hkust-nlp/drkernel-14b

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:14BQuant:FP8Ctx Length:32kPublished:Feb 5, 2026Architecture:Transformer0.0K Warm

hkust-nlp/drkernel-14b is a 14.7 billion parameter Qwen3-based model developed by hkust-nlp, specifically designed for GPU kernel generation and optimization, particularly for Triton kernels. It excels at iterative optimization with execution feedback, distinguishing it from models focused solely on single-shot code generation. This model is primarily intended for research and development in Triton kernel optimization and multi-turn agentic code refinement.

Loading preview...

DR.Kernel-14B: Specialized for GPU Kernel Optimization

hkust-nlp/drkernel-14b is a 14.7 billion parameter model built upon the Qwen3-14B architecture, developed by hkust-nlp. Its core specialization lies in generating and iteratively optimizing GPU kernels, with a particular focus on Triton kernels within the DR.Kernel framework. Unlike general-purpose code generation models, DR.Kernel-14B is trained for multi-turn iterative refinement, leveraging execution feedback from KernelGYM to achieve optimized kernel implementations.

Key Capabilities & Training:

  • Iterative Optimization: Designed for multi-turn refinement of kernel code based on performance and correctness feedback.
  • Triton Kernel Generation: Specializes in producing optimized ModelNew kernel implementations from PyTorch reference tasks.
  • Reinforcement Learning: Trained using a two-stage pipeline involving cold-start Supervised Fine-Tuning (SFT) on hkust-nlp/drkernel-coldstart-8k and multi-turn Reinforcement Learning (RL) with methods like TRLOO, MRS, PR, and PRS on hkust-nlp/drkernel-rl-data.
  • Execution Feedback: Utilizes KernelGYM for compilation, correctness, performance, and profiling feedback during training.

Intended Use Cases:

  • Kernel Generation Research: Ideal for academic and industrial research into automated kernel optimization.
  • Triton Kernel Optimization: Best suited for tasks requiring iterative optimization of Triton kernels with execution feedback.
  • Agentic Code Refinement: Effective in multi-turn scenarios where code is refined based on execution-based rewards.

For optimal performance, users should adhere to the kernel-optimization prompt format used during training, which involves providing a PyTorch reference architecture and expecting an optimized ModelNew.