# DR.Kernel-8B: Specialized for GPU Kernel Optimization
hkust-nlp/drkernel-8b is an 8-billion-parameter model built on the Qwen3-8B architecture and specialized for generating and optimizing GPU kernels, particularly Triton kernels. Developed by hkust-nlp, it stands out because it is trained for iterative optimization with execution feedback from KernelGYM, rather than for single-shot code generation.
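As a minimal sketch of how such a model might be prompted, the snippet below wraps a PyTorch reference implementation in a kernel-generation request. The prompt wording and the commented-out generation call are illustrative assumptions, not the official DR.Kernel prompt format.

```python
# Illustrative sketch: prompting for a Triton kernel from a PyTorch
# reference. The prompt text is an assumption, not DR.Kernel's format.

PYTORCH_REFERENCE = '''
import torch

def forward(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    return x + y
'''

def build_kernel_prompt(reference_code: str) -> str:
    """Wrap a PyTorch reference in a kernel-generation request."""
    return (
        "Rewrite the following PyTorch code as an optimized Triton kernel.\n"
        "Return a complete, runnable Python module.\n\n"
        f"```python\n{reference_code.strip()}\n```"
    )

prompt = build_kernel_prompt(PYTORCH_REFERENCE)

# Actual generation (requires a GPU and the transformers library):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("hkust-nlp/drkernel-8b")
# model = AutoModelForCausalLM.from_pretrained(
#     "hkust-nlp/drkernel-8b", device_map="auto")
# inputs = tok.apply_chat_template(
#     [{"role": "user", "content": prompt}],
#     add_generation_prompt=True, return_tensors="pt").to(model.device)
# output = model.generate(inputs, max_new_tokens=2048)
```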
## Key Capabilities
- Triton Kernel Generation: Creates optimized Triton kernel implementations from PyTorch reference architectures.
- Iterative Refinement: Designed for multi-turn optimization, leveraging execution feedback to iteratively improve kernel performance and correctness.
- Agentic Code Refinement: Supports agentic workflows for code optimization under execution-based reward signals.
- Qwen3-8B Base: Benefits from the strong foundational capabilities of the Qwen3-8B model family.
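To make the iterative-refinement capability concrete, here is a sketch of turning KernelGYM-style execution results into a follow-up message for the next turn. The feedback fields and message wording are assumptions for illustration, not KernelGYM's actual schema.

```python
# Sketch: convert execution feedback into the next-turn refinement message.
# The ExecutionFeedback fields below are illustrative assumptions.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ExecutionFeedback:
    compiled: bool
    correct: bool
    speedup: Optional[float]  # measured vs. the PyTorch reference, if it ran
    error_log: str = ""

def feedback_to_message(fb: ExecutionFeedback) -> str:
    if not fb.compiled:
        return f"The kernel failed to compile:\n{fb.error_log}\nPlease fix it."
    if not fb.correct:
        return (f"The kernel compiled but produced wrong results:\n"
                f"{fb.error_log}\nPlease fix the logic.")
    return (f"The kernel is correct with a {fb.speedup:.2f}x speedup over the "
            "reference. Please optimize it further.")

msg = feedback_to_message(ExecutionFeedback(compiled=True, correct=True, speedup=1.8))
```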
## Good for
- Kernel Generation Research: Ideal for academic and industrial research into automated GPU kernel development.
- Triton Kernel Optimization: Specifically suited for tasks requiring the generation and fine-tuning of high-performance Triton kernels.
- Benchmarking: Useful for evaluating kernel generation and optimization techniques using frameworks like KernelGYM and KernelBench.
- Multi-turn Code Agents: Applicable in scenarios where code needs to be iteratively improved based on compilation, correctness, and performance feedback.
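The multi-turn agent setting above can be sketched as a simple generate-evaluate-refine loop. The `generate` and `evaluate` callables below are mocked stand-ins, not real KernelGYM or model APIs, and the prompt-update step is deliberately simplified.

```python
# Skeleton of a multi-turn kernel refinement loop: generate a candidate,
# execute it, fold the result into the next prompt, keep the best attempt.
# All components here are mocks for illustration.
from typing import Callable, Tuple

def refine_kernel(
    generate: Callable[[str], str],                 # prompt -> candidate code
    evaluate: Callable[[str], Tuple[bool, float]],  # code -> (correct, speedup)
    prompt: str,
    max_turns: int = 4,
) -> Tuple[str, float]:
    """Keep the best correct candidate seen across refinement turns."""
    best_code, best_speedup = "", 0.0
    for _ in range(max_turns):
        code = generate(prompt)
        correct, speedup = evaluate(code)
        if correct and speedup > best_speedup:
            best_code, best_speedup = code, speedup
        # Fold the outcome into the next-turn prompt (simplified).
        prompt += f"\n# previous attempt: correct={correct}, speedup={speedup:.2f}x"
    return best_code, best_speedup

# Mock demonstration: each turn "improves" the kernel a little.
attempts = iter([("v1", (False, 0.0)), ("v2", (True, 1.2)), ("v3", (True, 1.7))])
results = {}

def mock_generate(prompt: str) -> str:
    code, result = next(attempts)
    results[code] = result
    return code

def mock_evaluate(code: str) -> Tuple[bool, float]:
    return results[code]

best, speedup = refine_kernel(mock_generate, mock_evaluate,
                              "optimize add kernel", max_turns=3)
```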
This model is trained with the two-stage DR.Kernel pipeline: cold-start SFT on hkust-nlp/drkernel-coldstart-8k, followed by multi-turn RL (TRLOO + MRS + PR + PRS) on hkust-nlp/drkernel-rl-data, with validation on hkust-nlp/drkernel-validation-data.
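For intuition about the execution-based reward signal such an RL stage optimizes, here is a hedged sketch: zero reward for kernels that fail to compile or are incorrect, otherwise a reward scaled by the measured speedup. The exact shaping used in DR.Kernel's TRLOO + MRS + PR + PRS recipe is not reproduced here; this is purely an illustrative assumption.

```python
# Illustrative execution-based reward (assumption, not DR.Kernel's actual
# shaping): broken or incorrect kernels get 0; correct kernels get 1.0
# plus a bonus for beating the PyTorch reference.
def kernel_reward(compiled: bool, correct: bool, speedup: float) -> float:
    if not compiled or not correct:
        return 0.0
    return 1.0 + max(0.0, speedup - 1.0)

rewards = [kernel_reward(*r) for r in
           [(False, False, 0.0),  # compile failure
            (True, False, 0.0),   # wrong results
            (True, True, 0.9),    # correct but slower than reference
            (True, True, 2.5)]]   # correct, 2.5x speedup
```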