DR.Kernel-14B: Specialized for GPU Kernel Optimization
hkust-nlp/drkernel-14b is a 14.7-billion-parameter model built on the Qwen3-14B architecture, developed by hkust-nlp. Its core specialization is generating and iteratively optimizing GPU kernels, with a particular focus on Triton kernels within the DR.Kernel framework. Unlike general-purpose code generation models, DR.Kernel-14B is trained for multi-turn iterative refinement, leveraging execution feedback from KernelGYM to converge on optimized kernel implementations.
Key Capabilities & Training:
- Iterative Optimization: Designed for multi-turn refinement of kernel code based on performance and correctness feedback.
- Triton Kernel Generation: Specializes in producing optimized ModelNew kernel implementations from PyTorch reference tasks.
- Reinforcement Learning: Trained with a two-stage pipeline: cold-start Supervised Fine-Tuning (SFT) on hkust-nlp/drkernel-coldstart-8k, followed by multi-turn Reinforcement Learning (RL) with methods such as TRLOO, MRS, PR, and PRS on hkust-nlp/drkernel-rl-data.
- Execution Feedback: Utilizes KernelGYM for compilation, correctness, performance, and profiling feedback during training.
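The multi-turn refinement loop with execution feedback can be sketched as below. The helpers `generate_kernel` and `run_kernelgym` are illustrative stubs, not part of any released API; in practice they would wrap model inference and the KernelGYM evaluation service:

```python
# Sketch of a multi-turn kernel-refinement loop with execution feedback.
# `generate_kernel` and `run_kernelgym` are hypothetical stand-ins for
# DR.Kernel-14B inference and KernelGYM, respectively.

def generate_kernel(history):
    """Stub: in practice, call the model with the conversation history."""
    return "def kernel(): ..."  # candidate Triton kernel source

def run_kernelgym(kernel_src):
    """Stub: KernelGYM-style compilation/correctness/performance feedback."""
    return {"compiles": True, "correct": True, "speedup": 1.3}

def refine(task, max_turns=4, target_speedup=1.5):
    history = [{"role": "user", "content": task}]
    best = None
    for _ in range(max_turns):
        kernel_src = generate_kernel(history)
        feedback = run_kernelgym(kernel_src)
        if feedback["compiles"] and feedback["correct"]:
            if best is None or feedback["speedup"] > best[1]:
                best = (kernel_src, feedback["speedup"])
            if feedback["speedup"] >= target_speedup:
                break
        # Feed structured feedback back to the model for the next turn.
        history.append({"role": "assistant", "content": kernel_src})
        history.append({"role": "user", "content": f"Feedback: {feedback}"})
    return best

best = refine("Optimize the reference PyTorch op into a Triton kernel")
```

The loop stops early once a kernel is correct and fast enough; otherwise it keeps the best correct candidate seen so far, mirroring the iterative-refinement training setup described above.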
Intended Use Cases:
- Kernel Generation Research: Ideal for academic and industrial research into automated kernel optimization.
- Triton Kernel Optimization: Best suited for tasks requiring iterative optimization of Triton kernels with execution feedback.
- Agentic Code Refinement: Effective in multi-turn scenarios where code is refined based on execution-based rewards.
For optimal performance, users should adhere to the kernel-optimization prompt format used during training: provide a PyTorch reference architecture (a Model class) and expect an optimized ModelNew implementation in response.
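As a concrete illustration of that format, the snippet below assembles a prompt around a small PyTorch reference module. The instruction wording here is an assumption, not the released training template; only the convention that the reference is a `Model` class and the expected answer defines a `ModelNew` class comes from the model card:

```python
# Illustrative prompt construction for a kernel-optimization task.
# The reference architecture is given as a plain PyTorch `Model`;
# the model is expected to answer with an optimized `ModelNew`.
# NOTE: the instruction text below is a hypothetical paraphrase,
# not the exact template used during training.

reference_src = """\
import torch
import torch.nn as nn

class Model(nn.Module):
    def forward(self, x, y):
        return torch.relu(x + y)
"""

prompt = (
    "Optimize the following PyTorch reference architecture into a Triton "
    "kernel. Respond with a complete `ModelNew` class that produces "
    "identical outputs.\n\n"
    "```python\n" + reference_src + "```"
)
```

The response would then be parsed for a `ModelNew` definition and, in a full pipeline, handed to KernelGYM for correctness and performance checks.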