hkust-nlp/drkernel-8b

Text generation · Concurrency cost: 1 · Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Feb 5, 2026 · Architecture: Transformer

DR.Kernel-8B by hkust-nlp is an 8 billion parameter Qwen3-based model designed specifically for GPU kernel generation and optimization, particularly for Triton kernels. Unlike general-purpose code models, it is trained for iterative refinement with execution feedback from KernelGYM, enabling multi-turn optimization. Its primary use cases are kernel generation research, benchmarking, and agentic code refinement under execution-based rewards, with a focus on generating and optimizing `ModelNew` kernel implementations from PyTorch reference tasks.


DR.Kernel-8B: Specialized for GPU Kernel Optimization

hkust-nlp/drkernel-8b is an 8 billion parameter model built upon the Qwen3-8B architecture, uniquely specialized for generating and optimizing GPU kernels, particularly those using Triton. Developed by hkust-nlp, this model stands out by being trained for iterative optimization with execution feedback from KernelGYM, rather than just single-shot code generation.
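To make that concrete, here is a minimal inference sketch using the Hugging Face transformers library. The checkpoint ID comes from this page, but the prompt wording and chat-template usage are illustrative assumptions, not the model's documented interface.

```python
# Minimal inference sketch. Assumptions: the checkpoint loads with standard
# transformers APIs and exposes a Qwen3-style chat template; the prompt
# wording below is illustrative, not the model's documented template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hkust-nlp/drkernel-8b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

task = (
    "Rewrite the following PyTorch module as an optimized Triton kernel "
    "inside a class named ModelNew:\n\n"
    "import torch\n"
    "class Model(torch.nn.Module):\n"
    "    def forward(self, x):\n"
    "        return torch.relu(x) * 2.0\n"
)

inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": task}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```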

Key Capabilities

  • Triton Kernel Generation: Creates optimized Triton kernel implementations from PyTorch reference architectures (see the task-format sketch after this list).
  • Iterative Refinement: Designed for multi-turn optimization, leveraging execution feedback to iteratively improve kernel performance and correctness.
  • Agentic Code Refinement: Supports agentic workflows for code optimization under execution-based reward signals.
  • Qwen3-8B Base: Benefits from the strong foundational capabilities of the Qwen3-8B model family.
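The `ModelNew` convention referenced above is the KernelBench-style task format: the model receives a PyTorch reference `Model` and must emit a functionally equivalent, faster `ModelNew`. Below is a minimal sketch of such a pair; the specific operation and kernel are illustrative assumptions, not drawn from the DR.Kernel datasets.

```python
# Illustrative task format (a sketch; the exact KernelGYM/KernelBench task
# schema is an assumption here): a PyTorch reference `Model` paired with a
# Triton-backed `ModelNew` of the kind the model is asked to produce.
import torch
import triton
import triton.language as tl


class Model(torch.nn.Module):
    """Reference implementation the model receives as input."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x) * 2.0


@triton.jit
def _relu_scale_kernel(x_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, tl.maximum(x, 0.0) * 2.0, mask=mask)


class ModelNew(torch.nn.Module):
    """Triton implementation the model is expected to generate."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.contiguous()
        out = torch.empty_like(x)
        n = x.numel()
        grid = (triton.cdiv(n, 1024),)
        _relu_scale_kernel[grid](x, out, n, BLOCK=1024)
        return out
```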

Good for

  • Kernel Generation Research: Ideal for academic and industrial research into automated GPU kernel development.
  • Triton Kernel Optimization: Specifically suited for tasks requiring the generation and fine-tuning of high-performance Triton kernels.
  • Benchmarking: Useful for evaluating kernel generation and optimization techniques using frameworks like KernelGYM and KernelBench.
  • Multi-turn Code Agents: Applicable in scenarios where code needs to be iteratively improved based on compilation, correctness, and performance feedback, as in the loop sketch below.
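A minimal sketch of such a refinement loop, assuming hypothetical `generate` and `run_and_score` callables (KernelGYM's actual API is not documented here):

```python
# Multi-turn refinement loop sketch. Assumptions: `generate` wraps an LLM
# call and `run_and_score` wraps kernel execution; both are hypothetical
# placeholders, not the KernelGYM API.
from typing import Callable


def refine_kernel(
    task_prompt: str,
    generate: Callable[[str], str],  # prompt -> candidate kernel code
    run_and_score: Callable[[str], tuple[bool, float, str]],  # code -> (correct, speedup, log)
    max_turns: int = 4,
) -> str:
    prompt = task_prompt
    best_code, best_speedup = "", 0.0
    for _ in range(max_turns):
        code = generate(prompt)
        correct, speedup, log = run_and_score(code)
        if correct and speedup > best_speedup:
            best_code, best_speedup = code, speedup
        # Feed execution feedback back into the next turn's prompt.
        prompt = (
            f"{task_prompt}\n\nPrevious attempt:\n{code}\n\n"
            f"Execution feedback (correct={correct}, speedup={speedup:.2f}x):\n"
            f"{log}\nImprove the kernel."
        )
    return best_code
```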

This model is trained with a two-stage DR.Kernel pipeline: cold-start SFT on hkust-nlp/drkernel-coldstart-8k, followed by multi-turn RL (TRLOO + MRS + PR + PRS) on hkust-nlp/drkernel-rl-data, with validation on hkust-nlp/drkernel-validation-data.
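For reference, the snippet below shows how those datasets could be inspected with the Hugging Face datasets library, assuming they are hosted on the Hub under the IDs above (split names and schemas are not specified here):

```python
# Sketch: inspect the DR.Kernel training data. Assumption: the datasets are
# hosted on the Hugging Face Hub under these IDs; splits may differ.
from datasets import load_dataset

coldstart = load_dataset("hkust-nlp/drkernel-coldstart-8k")
rl_data = load_dataset("hkust-nlp/drkernel-rl-data")
validation = load_dataset("hkust-nlp/drkernel-validation-data")

print(coldstart)  # show available splits and columns
```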